91

Using river network structure to improve estimation of common temporal patterns

Gallacher, Kelly Marie January 2016 (has links)
Statistical models for data collected over space are widely available and commonly used. These models, however, usually assume that relationships between observations depend on the Euclidean distance between monitoring sites, whose locations are determined using two-dimensional coordinates, and that relationships are not direction dependent. One example where these assumptions fail is when data are collected on river networks. In this situation, the location of monitoring sites along a river network relative to other sites is as important as the location in two-dimensional space, since it can be expected that spatial patterns will depend on the direction of water flow and on the distance between monitoring sites measured along the river network. Euclidean distance therefore might no longer be the most appropriate distance metric to consider. This is further complicated where it might be necessary to consider both Euclidean distance and distance along the river network if the observed variable is influenced by the land in which the river network is embedded. The Environment Agency (EA), established in 1996, is the government agency responsible for monitoring and improving the water quality in rivers situated in England (and Wales until 2013). A key responsibility of the EA is to ensure that efforts are made to improve and maintain water quality standards in compliance with EU regulations such as the Water Framework Directive (WFD, European Parliament (2000)) and the Nitrates Directive (European Parliament, 1991). Environmental monitoring is costly and in many regions of the world funding for environmental monitoring is decreasing (Ferreyra et al., 2002). It is therefore important to develop statistical methods that can extract as much information as possible from existing or reduced monitoring networks. One way to do this is to identify common temporal patterns shared by many monitoring sites so that redundancy in the monitoring network could be reduced by removing non-informative sites exhibiting the same temporal patterns. In the case of river water quality, information about the shape of the river network, such as flow direction and connectivity of monitoring sites, could be incorporated into statistical techniques to improve statistical power and provide efficient inference without the increased cost of collecting more data. Reducing the volume of data required to estimate temporal trends would improve efficiency and provide cost savings to regulatory agencies. The overall aim of this thesis is to investigate how information about the spatial structure of river networks can be used to augment and improve the specific trends obtained when using a variety of statistical techniques to estimate temporal trends in water quality data. Novel studies are designed to investigate the effect of accounting for river network structure within existing statistical techniques and, where necessary, statistical methodology is developed to show how this might be achieved. Chapter 1 provides an introduction to water quality monitoring and a description of several statistical methods that might be used for this. A discussion of statistical problems commonly encountered when modelling spatiotemporal data is also included. Following this, Chapter 2 applies a dimension reduction technique to investigate temporal trends and seasonal patterns shared among catchment areas in England and Wales.
A novel comparison method is also developed to identify differences in the shape of temporal trends and seasonal patterns estimated using several different statistical methods, each of which incorporates spatial information in a different way. None of the statistical methods compared in Chapter 2 specifically account for features of spatial structure found in river networks: direction of water flow, relative influence of upstream monitoring sites on downstream sites, and stream distance. Chapter 3 therefore provides a detailed investigation of spatial covariance models that can be used to model the spatial relationships found in river networks, and compares them to standard spatial covariance models. Further investigation of the spatial covariance function is presented in Chapter 4, where a simulation study is used to assess how predictions from statistical models based on river network spatial covariance functions are affected by reducing the size of the monitoring network. A study is also developed to compare the predictive performance of statistical models based on a river network spatial covariance function to models based on spatial covariate information, but assuming spatial independence of monitoring sites. Chapters 3 and 4 therefore address the aim of assessing the improvement in information extracted from statistical models after the inclusion of information about river network structure. Following this, Chapter 5 combines the ideas of Chapters 2, 3 and 4 and proposes a novel statistical method where estimated common temporal patterns are adjusted for known spatial structure, identified in Chapters 3 and 4. Adjusting for known structure in the data means that spatial and temporal patterns independent of the river network structure can be more clearly identified, since they are no longer confounded with known structure. The final chapter of this thesis provides a summary of the statistical methods investigated and developed within this thesis, identifies some limitations of the work carried out and suggests opportunities for future research. An Appendix provides details of many of the data processing steps required to obtain information about the river network structure in an appropriate form.
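The abstract does not name the specific river network covariance models used; one family commonly applied to stream networks is the "tail-up" exponential model of Ver Hoef and Peterson, in which correlation decays with stream distance and is zero between sites that are not flow-connected. The sketch below contrasts such a covariance with a standard Euclidean exponential covariance; the distances, flow-connectivity pattern, spatial weights and parameter values are invented purely for illustration and are not taken from the thesis.

```python
import numpy as np

def euclidean_cov(coords, sigma2=1.0, range_=10.0):
    """Standard exponential covariance based on Euclidean distance."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / range_)

def tail_up_cov(stream_dist, flow_connected, weights, sigma2=1.0, range_=10.0):
    """Tail-up exponential covariance: correlation decays with stream
    distance and is zero for sites that are not flow-connected."""
    cov = sigma2 * np.exp(-stream_dist / range_) * weights
    return np.where(flow_connected, cov, 0.0)

# Toy example with three monitoring sites: site 0 upstream of site 1,
# site 2 on a different branch (not flow-connected to site 0).
coords = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])
stream_dist = np.array([[0.0, 5.0, 9.0],
                        [5.0, 0.0, 4.0],
                        [9.0, 4.0, 0.0]])
flow_connected = np.array([[True, True, False],
                           [True, True, True],
                           [False, True, True]])
# Spatial weights reflecting relative flow contribution (made up here).
weights = np.array([[1.0, 0.7, 0.0],
                    [0.7, 1.0, 0.8],
                    [0.0, 0.8, 1.0]])

print(euclidean_cov(coords))
print(tail_up_cov(stream_dist, flow_connected, weights))
```

The key difference the sketch highlights is that the Euclidean model assigns non-zero correlation to every pair of sites, whereas the river network model respects flow connectivity.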
92

Three essays on time series : spatio-temporal modelling, dimension reduction and change-point detection

Dou, Baojun January 2015 (has links)
Modelling high dimensional time series and non-stationary time series are two important aspects of time series analysis nowadays. The main objective of this thesis is to deal with these two problems. The first two parts deal with high dimensionality and the third part considers a change-point detection problem. In the first part, we consider a class of spatio-temporal models which extend popular econometric spatial autoregressive panel data models by allowing the scalar coefficients for each location (or panel) to differ from each other. The model is of the following form: y_t = D(λ_0) W y_t + D(λ_1) y_{t−1} + D(λ_2) W y_{t−1} + ε_t, (1) where y_t = (y_{1,t}, . . . , y_{p,t})^T represents the observations from p locations at time t, D(λ_k) = diag(λ_{k1}, . . . , λ_{kp}) where λ_{kj} is the unknown coefficient parameter for the j-th location, and W is the p × p spatial weight matrix which measures the dependence among different locations. All the elements on the main diagonal of W are zero. It is a common practice in spatial econometrics to assume W known. For example, we may let w_{ij} = 1/(1 + d_{ij}) for i ≠ j, where d_{ij} ≥ 0 is an appropriate distance between the i-th and the j-th location. It can simply be the geographical distance between the two locations or a distance reflecting the correlation or association between the variables at the two locations. In the above model, D(λ_0) captures the pure spatial effect, D(λ_1) captures the pure dynamic effect, and D(λ_2) captures the time-lagged spatial effect. We also assume that the error term ε_t = (ε_{1,t}, ε_{2,t}, . . . , ε_{p,t})^T in (1) satisfies the condition Cov(y_{t−1}, ε_t) = 0. When λ_{k1} = · · · = λ_{kp} for each k = 0, 1, 2, model (1) reduces to the model of Yu et al. (2008), in which there are only 3 unknown regressive coefficient parameters. In general the regression function in (1) contains 3p unknown parameters. To overcome the innate endogeneity, we propose a generalized Yule-Walker estimation method which applies least squares estimation to a Yule-Walker equation. The asymptotic theory is developed under the setting that both the sample size and the number of locations (or panels) tend to infinity, under a general setting for stationary and α-mixing processes which includes spatial autoregressive panel data models driven by i.i.d. innovations as special cases. The proposed methods are illustrated using both simulated and real data. In part 2, we consider a multivariate time series model which decomposes a vector process into a latent factor process and a white noise process. Let y_t = (y_{1,t}, · · · , y_{p,t})^T be an observable p × 1 vector time series process. The factor model decomposes y_t in the following form: y_t = A x_t + ε_t, (2) where x_t = (x_{1,t}, · · · , x_{r,t})^T is an r × 1 latent factor time series with unknown r ≤ p, and A = (a_1, a_2, · · · , a_r) is a p × r unknown constant matrix. ε_t is a white noise process with mean 0 and covariance matrix Σ_ε. The first part of (2) is a dynamic part and the serial dependence of y_t is driven by x_t. We achieve dimension reduction once r ≪ p, in the sense that the dynamics of y_t is driven by the much lower dimensional process x_t. Motivated by practical needs and the characteristics of high dimensional data, a sparsity assumption on the factor loading matrix is imposed. Different from the method of Lam, Yao and Bathia (2011), which is equivalent to an eigenanalysis of a non-negative definite matrix, we add a constraint to control the number of nonzero elements in each column of the factor loading matrix.
Our proposed sparse estimator is then the solution of a constrained optimization problem. The asymptotic theory is developed under the setting that both the sample size and the dimensionality tend to infinity. When the common factor is weak, in the sense that δ > 1/2 in Lam, Yao and Bathia (2011), the new sparse estimator may have a faster convergence rate. Numerically, we employ the generalized deflation method (Mackey (2009)) and the GSLDA method (Moghaddam et al. (2006)) to approximate the estimator. The tuning parameter is chosen by cross validation. The proposed method is illustrated with both simulated and real data examples. The third part addresses a change-point detection problem. We consider the following covariance structural break detection problem: Cov(y_t) I(t_{j−1} ≤ t < t_j) = Σ_{t_{j−1}}, j = 1, · · · , m + 1, where y_t is a p × 1 vector time series, Σ_{t_{j−1}} ≠ Σ_{t_j}, and {t_1, . . . , t_m} are the change points with 1 = t_0 < t_1 < · · · < t_{m+1} = n. In the literature, the number of change points m is usually assumed to be known and small, because a large m would involve a huge computational burden for parameter estimation. By reformulating the problem in a variable selection context, the group least absolute shrinkage and selection operator (LASSO) is proposed to estimate m and the locations of the change points {t_1, . . . , t_m}. Our method is model-free and can be applied widely to multivariate time series, such as GARCH and stochastic volatility models. It is shown that both m and the locations of the change points {t_1, . . . , t_m} can be consistently estimated from the data, and that the computation can be performed efficiently. An improved practical version that incorporates group LASSO and the stepwise regression variable selection technique is discussed. Simulation studies are conducted to assess the finite sample performance.
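As a concrete illustration of model (1), the sketch below simulates data from it by rewriting the model as y_t = (I − D(λ_0)W)^{-1}(D(λ_1)y_{t−1} + D(λ_2)W y_{t−1} + ε_t); the coefficient values, the toy distance-based weight matrix and the error scale are invented, and the generalised Yule-Walker estimator itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 10, 200                      # number of locations, number of time points

# Row-normalised spatial weight matrix with zero diagonal (assumed known).
dist = np.abs(np.arange(p)[:, None] - np.arange(p)[None, :])
W = 1.0 / (1.0 + dist)
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

# Location-specific coefficients D(lambda_k) = diag(lambda_k1, ..., lambda_kp).
lam0 = rng.uniform(0.1, 0.3, p)     # pure spatial effect
lam1 = rng.uniform(0.2, 0.4, p)     # pure dynamic effect
lam2 = rng.uniform(0.0, 0.2, p)     # time-lagged spatial effect

A = np.linalg.inv(np.eye(p) - np.diag(lam0) @ W)   # (I - D(lam0) W)^{-1}
y = np.zeros((n, p))
for t in range(1, n):
    eps = rng.normal(scale=0.5, size=p)
    y[t] = A @ (np.diag(lam1) @ y[t - 1] + np.diag(lam2) @ W @ y[t - 1] + eps)
```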
93

Extreme insurance and the dynamics of risk

Maynard, Trevor January 2016 (has links)
The aim of this thesis is to explore the question: can scientific models improve insurance pricing? Model outputs are often converted to forecasts and, in the context of insurance, the supplementary questions ‘are forecasts skilful?’ and ‘are forecasts useful?’ are examined. Skill score comparison experiments are developed allowing several scores in common use to be ranked. One score is shown to perform well; several others are shown to have systematic failings, with the conclusion that these should not be used by insurers. A new skill score property, ‘Feasibility’, is proposed which highlights a key shortcoming of some scores in common use. Variables from a well-known dynamical system are used as a proxy for an insurable index. A new method relating the system and its models is presented using skill scores to find their score-optimal piecewise linear relationship. The index is priced using both traditional techniques and new methods that use the score-optimal relationship. One new method is very successful in that it produces lower prices on average, is more profitable and leads to a lower probability of insurer failure. In this context the forecasts are both skilful and useful. The efficacy of forecast use is further explored by considering hurricane insurance. Here forecasts are shown to be useful only if very simple adjustments to pricing are made. A novel agent-based model of a two-company insurance industry, containing many key features of the real world, is presented, enabling the impact of regulation and competition to be assessed. Several common practices are shown to reduce expected company lifetime.
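The abstract does not name the specific skill scores compared; purely as a generic illustration of how such a comparison is set up, the sketch below evaluates two widely used scores for probabilistic forecasts of a binary event, the Brier score and the ignorance (logarithmic) score, on simulated forecasts. The simulated event probabilities and the two forecasters are invented.

```python
import numpy as np

def brier_score(p, y):
    """Mean squared difference between forecast probability and outcome."""
    return np.mean((p - y) ** 2)

def ignorance_score(p, y, eps=1e-12):
    """Mean negative log2 probability assigned to the observed outcome."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log2(p) + (1 - y) * np.log2(1 - p))

rng = np.random.default_rng(1)
p_true = rng.beta(2, 5, size=2000)                 # event probability varies by case
y = rng.binomial(1, p_true)                        # observed binary outcomes
informative = p_true                               # forecaster who knows each probability
climatology = np.full_like(p_true, p_true.mean())  # constant base-rate forecast

for name, p in [("informative", informative), ("climatology", climatology)]:
    print(f"{name}: Brier={brier_score(p, y):.4f}, "
          f"Ignorance={ignorance_score(p, y):.4f}")
```

Both scores are proper, so the informative forecaster scores better on average; the thesis's ranking experiments probe scores that do not behave this reliably.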
94

Evaluating mode differences in longitudinal data : moving to a mixed mode paradigm of survey methodology

Cernat, Alexandru January 2015 (has links)
Collecting and combining data using multiple modes of interview (e.g., face-to-face, telephone, Web) is becoming common practice in survey agencies. This is also true for longitudinal studies, a special type of survey that applies questionnaires repeatedly to the same respondents. In this PhD I investigate if and how collecting information using different modes can impact data quality in panel studies. Chapters 2 and 3 investigate how a sequential telephone - face-to-face mixed mode design can bias reliability, validity and estimates of change compared to a single mode. In order to achieve this goal I have used an experimental design from the Understanding Society Innovation Panel. The analyses have shown that there are only small differences in reliability and validity between the two modes, but estimates of change might be overestimated in the mixed mode design. Chapter 4 investigates the measurement differences between face-to-face, telephone and Web on three scales: depression, physical activity and religiosity. We use a quasi-experimental (cross-over) design in the Health and Retirement Study. The results indicate systematic differences between interviewer modes and Web. We propose social desirability and recency as possible explanations. In Chapter 5 we use the Understanding Society Innovation Panel to investigate whether the extra contact by email leads to an increased propensity to participate in a sequential Web - face-to-face design. Using the experimental nature of our data we show that the extra contact by email in the mixed mode survey does not increase participation likelihood. One of the main difficulties in the research of (mixed) mode designs is separating the effects of selection and measurement of the modes. Chapter 6 tackles this issue by proposing equivalence testing, a statistical approach to control for measurement differences across groups, as a front-door approach to disentangle these two. A simulation study shows that this approach works and highlights the bias when the two main assumptions do not hold.
95

Coupling and the policy improvement algorithm for controlled diffusion processes

Širaj, Dejan January 2015 (has links)
The thesis deals with the mirror and synchronous couplings of geometric Brownian motions, the policy improvement (or iteration) algorithm in completely continuous settings, and an application in which the latter is applied to the former. First we investigate whether the mirror and synchronous couplings of Brownian motions minimise and maximise, respectively, the coupling time of the corresponding geometric Brownian motions. We prove (via Bellman's principle) that this is indeed the case in the infinite horizon and ergodic average problems, but not necessarily in the finite horizon and exponential efficiency problems, for which we characterise when the two couplings are suboptimal. Then we describe the policy improvement algorithm for controlled diffusion processes in the framework of the discounted infinite horizon problem, both in one and several dimensions. Under some assumptions on the data of the problem, we prove that the algorithm yields a sequence of Markov policies whose accumulation point is an optimal policy, and that the corresponding payoff functions converge monotonically to the value function. We use no discretisation procedures at any stage. We show that a large class of data satisfies the assumptions, and an example implemented in Matlab demonstrates that the convergence is numerically fast. Next we study the policy improvement algorithm for the continuous finite horizon problem. We obtain results analogous to those for the infinite horizon problem. Finally we apply the algorithm to a certain sequence of data to approximate the value function of the (partially unsolved) finite horizon problem for geometric Brownian motions.
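The thesis works directly with controlled diffusions and uses no discretisation; purely to illustrate the policy improvement loop itself (evaluate the current Markov policy, then act greedily with respect to its payoff function), the sketch below runs the algorithm on a small finite-state, finite-action discounted problem with made-up dynamics. It is not the continuous-state construction developed in the thesis.

```python
import numpy as np

# Toy discounted problem: n_s states, n_a actions, made-up transition
# probabilities P[a, s, s'] and rewards R[a, s]; discount factor beta.
rng = np.random.default_rng(2)
n_s, n_a, beta = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))   # transition kernels
R = rng.normal(size=(n_a, n_s))                    # one-step rewards

policy = np.zeros(n_s, dtype=int)                  # initial Markov policy
for _ in range(100):
    # Policy evaluation: solve (I - beta * P_pi) v = r_pi for the payoff v.
    P_pi = P[policy, np.arange(n_s)]
    r_pi = R[policy, np.arange(n_s)]
    v = np.linalg.solve(np.eye(n_s) - beta * P_pi, r_pi)
    # Policy improvement: act greedily with respect to v.
    q = R + beta * P @ v                           # q[a, s]
    new_policy = q.argmax(axis=0)
    if np.array_equal(new_policy, policy):         # no improvement possible
        break
    policy = new_policy

print("optimal policy:", policy)
print("value function:", v)
```

The payoff functions produced by successive policies increase monotonically, which is the finite-state analogue of the convergence result proved in the thesis for diffusions.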
96

Investigating the performance of multilevel cross-classified and multiple membership logistic models : with applications to interviewer effects on nonresponse

Vassallo, Rebecca January 2014 (has links)
This thesis focuses on the modelling of interviewer effects on nonresponse using cross-classified and multiple membership multilevel logistic models, and investigates the properties of such models under various survey conditions. The first paper reviews the use of cross-classified and multiple membership models to account for both interviewer and area effects and for the interviewers of different waves. An extension to incorporate both wave interviewer effects and area effects is presented. The mathematical details, assumptions and limitations of the models are considered. The different models conceptualised are then fitted to a dataset. This application extends the focus of the first paper from a purely methodological one to an applied study with substantive research questions. The study aims to identify interviewer characteristics that influence nonresponse behaviour, assess the relative importance of previous and current wave interviewers on current wave nonresponse, and explore whether respondents react favourably to interviewers with similar characteristics. The second and third papers investigate the properties of cross-classified and multiple membership multilevel models respectively under various survey conditions. The second study looks at the effects of different interviewer case assignment schemes, total sample sizes, group sizes (interviewer caseload), number of groups (number of interviewers), overall rates of response, and the variance partitioning coefficient on the properties of the estimators and the power of the Wald test. The study aims to provide practical recommendations for future study designs by identifying the smallest total sample size, interviewer pool, and the most geographically restrictive and cost-effective interviewer case allocation required to adequately distinguish between area and interviewer effects. The third paper includes a sensitivity analysis which looks at how accurately the Deviance Information Criterion identifies the best weighting scheme for different true multiple membership weights, interview allocation profiles, and total sample sizes. This sensitivity analysis indicates how well the relative importance of the previous and current wave interviewers can be estimated in multiple membership models under different survey conditions. Moreover, the quality of parameter estimates under models with correctly specified weights, models with incorrectly specified weights, and models with weights based on the Deviance Information Criterion is also investigated.
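To make the multiple membership structure concrete, the sketch below simulates a response propensity in which each respondent's linear predictor receives a weighted combination of the random effects of the previous- and current-wave interviewers. The weights (0.3 and 0.7), the single covariate and the variance values are invented for illustration; the thesis estimates such models in a Bayesian multilevel framework rather than simulating them.

```python
import numpy as np

rng = np.random.default_rng(3)
n_resp, n_int = 1000, 50

# Interviewer random effects with standard deviation sigma_u (simulated here).
sigma_u = 0.5
u = rng.normal(0, sigma_u, n_int)

# Each respondent is a "member" of two interviewers: previous and current
# wave. The multiple membership weights set their relative importance.
prev_int = rng.integers(0, n_int, n_resp)
curr_int = rng.integers(0, n_int, n_resp)
w_prev, w_curr = 0.3, 0.7            # assumed weights, sum to one

x = rng.normal(size=n_resp)          # one respondent-level covariate
beta0, beta1 = 0.5, -0.4
eta = beta0 + beta1 * x + w_prev * u[prev_int] + w_curr * u[curr_int]
p_response = 1.0 / (1.0 + np.exp(-eta))   # response propensity
y = rng.binomial(1, p_response)           # observed response indicator
```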
97

Predicting hypotensive episodes in the traumatic brain injury domain

Donald, Rob January 2014 (has links)
The domain with which this research is concerned is traumatic brain injury and models which attempt to predict hypotensive (low blood pressure) events occurring in a hospital intensive care unit environment. The models process anonymised, clinical, minute-by-minute, physiological data from the BrainIT consortium. The research reviews three predictive modelling techniques: classic time series analysis; hidden Markov models; and classifier models, which are the main focus of this thesis. The data preparation part of this project is extensive and six applications have been developed: an event list generator, used to process a given event definition; a data set generation tool, which produces a series of base data sets that can be used to train machine learning models; a training and test set generation application, which produces randomly drawn training and test data sets; an application used to build and assess a series of logistic regression models; an application to test the statistical models on unseen data, which uses anonymised real clinical data from intensive care unit bedside monitors; and finally, an application that implements a proposed clinical warning protocol, which attempts to assess a model's performance in terms of usefulness to a clinical team. These applications are being made available under a public domain licence to enable further research (see Appendix A for details). Six logistic regression models and two Bayesian neural network models are examined using the physiological signals heart rate and arterial blood pressure, along with the demographic variables of age and gender. Model performance is assessed using the standard ROC technique to give the AUC metric. An alternative performance metric, the H score, is also investigated. Using unseen clinical data, two of the models are assessed in a manner which mimics the ICU environment. This approach shows that models may perform better than would be suggested by standard assessment metrics. The results of the modelling experiments are compared with a recent similar project in the healthcare domain and show that logistic regression models could form the basis of a practical early warning system for use in a neuro-intensive care unit.
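As a rough illustration of the logistic-regression-plus-AUC assessment described above (not the thesis's actual models or the BrainIT data), the sketch below fits a logistic regression to simulated heart rate, arterial pressure, age and gender features and reports the AUC on held-out data. All data values and coefficients are fabricated for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulated stand-ins for summarised physiological and demographic features.
rng = np.random.default_rng(4)
n = 5000
X = np.column_stack([
    rng.normal(80, 12, n),        # heart rate (bpm)
    rng.normal(90, 15, n),        # mean arterial pressure (mmHg)
    rng.normal(45, 18, n),        # age (years)
    rng.binomial(1, 0.7, n),      # gender indicator
])
# Make the event more likely when arterial pressure is already low.
logit = -2.0 - 0.06 * (X[:, 1] - 90) + 0.01 * (X[:, 0] - 80)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # hypotensive event indicator

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC on held-out data: {auc:.3f}")
```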
98

Validating and extending the two-moment capital asset pricing model for financial time series

Neslihanoglu, Serdar January 2014 (has links)
This thesis contributes to the ongoing discussion about the financial and statistical modelling of returns on financial stock markets. It develops the asset pricing model concept, which has received continuous attention for almost 50 years in the area of finance, as a method by which to identify the stochastic behaviour of financial data when making investment decisions, such as portfolio choices, and determining market risk. The best known and most widely used asset pricing model detailed in the finance literature is the Two-Moment Capital Asset Pricing Model (CAPM) (consistent with the Linear Market Model), which was developed by Sharpe-Lintner-Mossin in the 1960s to explore systematic risk in a mean-variance framework and is the benchmark model for this thesis. However, this model has now been criticised as misleading and insufficient as a tool for characterising returns in financial stock markets. This is partly a consequence of the presence of non-normally distributed returns and non-linear relationships between asset and market returns. The inadequacies of the Two-Moment CAPM are qualified in this thesis, and extensions are proposed that improve on both model fit and forecasting ability. To validate and extend the benchmark Linear Market Model, the empirical work presented in this thesis centres around three related extensions. The first extension compares the Linear Market Model's modelling and forecasting abilities with those of the time-varying Linear Market Model (consistent with the conditional Two-Moment CAPM) for 19 Turkish industry sector portfolios. Two statistical modelling techniques are compared: a class of GARCH-type models, which allow for non-constant variance in stock market returns, and state space models, which allow the systematic covariance risk to change linearly over time in the time-varying Linear Market Model. The state space modelling is shown to outperform the GARCH-type modelling. The second extension concentrates on comparing the performance of the Linear Market Model with models for higher order moments, including polynomial extensions and a Generalised Additive Model (GAM). In addition, time-varying versions of the Linear Market Model and the polynomial extensions, in the form of state space models, are considered. All these models are applied to 18 global markets during three different time periods: the entire period from July 2002 to July 2012, from July 2002 to just before the October 2008 financial crisis, and from after the October 2008 financial crisis to July 2012. Although the more complex unconditional models are shown to improve slightly on the Linear Market Model, the state space models again improve substantially on all the unconditional models. The final extension focuses on comparing the performance of four possible multivariate state space forms of the time-varying Linear Market Model, using data on the same 18 global markets and utilising correlations between markets. This approach is shown to improve further on the performance of the univariate state space models. The thesis concludes by drawing together three related themes: the inappropriateness of the Linear Market Model, the extent to which multivariate modelling improves on the univariate market model, and the state of the world's stock markets.
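A minimal sketch of the time-varying Linear Market Model idea is given below: a single asset's beta follows a random walk and is tracked with a scalar Kalman filter. The state space forms used in the thesis are richer (multivariate, with both intercept and slope evolving); the noise variances and simulated returns here are invented for illustration.

```python
import numpy as np

def kalman_tv_beta(r_asset, r_market, q=1e-4, r=1e-3, beta0=1.0, p0=1.0):
    """Scalar Kalman filter for r_t = beta_t * m_t + e_t with a random-walk
    beta_t; q and r are the state and observation noise variances."""
    n = len(r_asset)
    beta, P = beta0, p0
    betas = np.empty(n)
    for t in range(n):
        P = P + q                              # predict: random-walk state
        m = r_market[t]
        S = m * P * m + r                      # innovation variance
        K = P * m / S                          # Kalman gain
        beta = beta + K * (r_asset[t] - m * beta)
        P = (1 - K * m) * P
        betas[t] = beta
    return betas

# Toy data: the true beta drifts from 0.8 to 1.4 over the sample.
rng = np.random.default_rng(5)
n = 500
r_m = rng.normal(0, 0.01, n)                   # market returns
true_beta = np.linspace(0.8, 1.4, n)
r_a = true_beta * r_m + rng.normal(0, 0.005, n)
estimated_beta = kalman_tv_beta(r_a, r_m)
print(estimated_beta[-5:])                     # filter tracks the drifting beta
```

A constant-beta Linear Market Model fitted by ordinary least squares would return a single averaged beta here, which is the kind of misspecification the time-varying formulation is designed to avoid.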
99

Bayesian mixture models for count data

Chanialidis, Charalampos January 2015 (has links)
Regression models for count data are usually based on the Poisson distribution. This thesis is concerned with Bayesian inference in more flexible models for count data. Two classes of models and algorithms are presented and studied in this thesis. The first employs a generalisation of the Poisson distribution called the COM-Poisson distribution, which can represent both overdispersed and underdispersed data. We also propose a density regression technique for count data which, albeit centred around the Poisson distribution, can represent arbitrary discrete distributions. The key contributions of this thesis are MCMC-based methods for posterior inference in these models. One key challenge in COM-Poisson-based models is the fact that the normalisation constant of the COM-Poisson distribution is not known in closed form. We propose two exact MCMC algorithms which address this problem. One is based on the idea of retrospective sampling: we first sample the uniform random variable used to decide on the acceptance (or rejection) of the proposed new state of the unknown parameter, and then evaluate only bounds for the acceptance probability, in the hope that the acceptance probability need not be known exactly in order to decide whether to accept or reject the newly proposed value. This strategy is based on an efficient scheme for computing lower and upper bounds for the normalisation constant. This procedure can be applied to a number of discrete distributions, including the COM-Poisson distribution. The other MCMC algorithm proposed is based on an algorithm known as the exchange algorithm. The latter requires sampling from the COM-Poisson distribution, and we describe how this can be done efficiently using rejection sampling. We also present simulation studies which show the advantages of using the COM-Poisson regression model compared to the alternative models commonly used in the literature (Poisson and negative binomial). Three real world applications are presented: the number of emergency hospital admissions in Scotland in 2010, the number of papers published by Ph.D. students, and fertility data from the second German Socio-Economic Panel. COM-Poisson distributions are also the cornerstone of the proposed density regression technique based on Dirichlet process mixture models. Density regression can be thought of as a competitor to quantile regression. Quantile regression estimates the quantiles of the conditional distribution of the response variable given the covariates. This is especially useful when the dispersion changes across the covariates. Instead of estimating the conditional mean, quantile regression estimates the conditional quantile function across different quantiles. As a result, quantile regression models both location and shape shifts of the conditional distribution. This allows for a better understanding of how the covariates affect the conditional distribution of the response variable. Almost all quantile regression techniques deal with a continuous response. Quantile regression models for count data have so far received little attention. A technique that has been suggested is adding uniform random noise ('jittering'), thus overcoming the problem that, for a discrete distribution, the conditional quantile function is not a continuous function of the parameters of interest. Even though this enables us to estimate the conditional quantiles of the response variable, it has disadvantages.
For small values of the response variable Y, the added noise can have a large influence on the estimated quantiles. In addition, the problem of 'crossing quantiles' still exists for the jittering method. We eliminate all the aforementioned problems by estimating the density of the data, rather than the quantiles. Simulation studies show that the proposed approach performs better than the already established jittering method. To illustrate the new method we analyse fertility data from the second German Socio-Economic Panel.
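For concreteness, the sketch below evaluates the COM-Poisson log pmf, proportional to λ^y / (y!)^ν, by truncating the series for the normalisation constant Z(λ, ν) = Σ_j λ^j / (j!)^ν once the terms become negligible. This is only a naive numerical illustration of why the constant is awkward, not the retrospective-sampling bounds or the exchange algorithm developed in the thesis.

```python
import numpy as np
from scipy.special import gammaln

def com_poisson_logpmf(y, lam, nu, tol=1e-12, max_terms=10_000):
    """Log pmf of the COM-Poisson distribution, with the normalising
    constant Z(lam, nu) approximated by truncating its series once the
    terms fall below `tol` relative to the largest term."""
    j = np.arange(max_terms)
    log_terms = j * np.log(lam) - nu * gammaln(j + 1)
    keep = log_terms > (log_terms.max() + np.log(tol))   # drop negligible terms
    logZ = np.logaddexp.reduce(log_terms[keep])
    return y * np.log(lam) - nu * gammaln(y + 1) - logZ

# nu < 1 gives overdispersion, nu > 1 underdispersion, nu = 1 is Poisson.
y = np.arange(10)
print(np.exp(com_poisson_logpmf(y, lam=3.0, nu=0.7)))   # overdispersed
print(np.exp(com_poisson_logpmf(y, lam=3.0, nu=1.5)))   # underdispersed
```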
100

A Bayesian hierarchical model of compositional data with zeros : classification and evidence evaluation of forensic glass

Napier, Gary January 2014 (has links)
A Bayesian hierarchical model is proposed for modelling compositional data containing large concentrations of zeros. Two data transformations were used and compared: the commonly used additive log-ratio (alr) transformation for compositional data, and the square root of the compositional ratios. For these data the square root transformation was found to stabilise the variability better, and it also had no issues dealing with the large concentrations of zeros. To deal with the zeros, two different approaches have been implemented: the data augmentation approach and the composite model approach. The data augmentation approach treats any zero values as rounded zeros, i.e. traces of components below limits of detection, and updates those zero values with non-zero values. This is better than the simple approach of adding constant values to zeros, as it reduces any artificial correlation by updating the zeros as part of the modelling procedure. However, due to the small detection limit it does not necessarily alleviate the problems of having a point mass very close to zero. The composite model approach treats any zero components as being absent from a composition. This is done by splitting the data into subsets according to the presence or absence of certain components to produce different data configurations that are then modelled separately. The models are applied to a database consisting of the elemental configurations of forensic glass fragments with many levels of variability and of various use types. The main purposes of the model are (i) to derive expressions for the posterior predictive probabilities of newly observed glass fragments to infer their use type (classification) and (ii) to compute the evidential value of glass fragments under two complementary propositions about their source (forensic evidence evaluation). Simulation studies using cross-validation are carried out to assess both model approaches, with both performing well at classifying glass fragments of the use types bulb, headlamp and container, but less well when classifying car and building windows. The composite model approach marginally outperforms the data augmentation approach at the classification task; both approaches have the edge over support vector machines (SVM). Both model approaches also perform well when evaluating the evidential value of glass fragments, with false negative and false positive error rates below 5%. The results from glass classification and evidence evaluation are an improvement over existing methods. Assessment of the models as part of the evidence evaluation simulation study also leads to a restriction being placed upon the reported strength of the value of this type of evidence. To prevent strong support in favour of the wrong proposition it is recommended that this glass evidence should provide, at most, moderately strong support in favour of a proposition. The classification and evidence evaluation procedures are implemented into an online web application, which outputs the corresponding results for a given set of elemental composition measurements. The web application contributes a quick and easy-to-use tool for forensic scientists who deal with this type of forensic evidence in real-life casework.
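A minimal sketch of the two transformations is given below, assuming the final component of the composition is used as the divisor in both the additive log-ratio and the square-root-of-ratios transform (the abstract does not state the divisor, so this is an assumption). The toy composition is invented; the example simply shows why zero components break the alr transform but not the square-root version.

```python
import numpy as np

def alr(x):
    """Additive log-ratio transform, using the last component as divisor.
    Undefined (−inf) when a numerator component is zero."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

def sqrt_ratio(x):
    """Square root of the compositional ratios (same divisor assumed).
    Zero components in the numerator simply map to zero."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x[..., :-1] / x[..., -1:])

# Toy glass composition (proportions of elemental concentrations) with a zero.
comp = np.array([0.70, 0.20, 0.00, 0.10])
print(sqrt_ratio(comp))   # finite values; the zero maps to zero
print(alr(comp))          # -inf for the zero component
```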
