51

Evaluation of sampling and monitoring designs for water quality

Haggarty, Ruth Alison January 2012 (has links)
Assessing water quality is of crucial importance to both society and the environment. Deterioration in water quality through issues such as eutrophication presents substantial risk to human health, plant and animal life, and can have detrimental effects on the local economy. Long-term data records across multiple sites can be used to investigate water quality and risk factors statistically; however, identification of underlying changes can only be successful if a sufficient quantity of data is available. As vast amounts of resources are required for the implementation and maintenance of a monitoring network, it is not logistically or financially possible to monitor all water environments continuously. This raises the question of the optimal design for long-term monitoring networks capable of capturing underlying changes. Two of the main design considerations are clearly where to sample and how frequently to sample. The principal aim of this thesis is to use statistical analysis to investigate frequently used environmental monitoring networks, developing new methodology where appropriate, so that the design and implementation of future networks can be made as effective and cost-efficient as possible. Using data provided by the Scottish Environment Protection Agency (SEPA), several Scottish lakes and rivers and a range of determinands are considered in order to explore water quality monitoring in Scotland. Chapter 1 provides an introduction to environmental monitoring and discusses both existing statistical techniques and the challenges commonly encountered in the analysis of environmental data. Following this, Chapter 2 presents a simulation study designed and implemented to evaluate the nature and statistical power of commonly used environmental sampling and monitoring designs for surface waters. The aim is to answer questions regarding how many samples the chemical classification of standing waters should be based on, and how appropriate the data currently available in Scotland are for detecting trends and seasonality. The simulation study was constructed to investigate the ability to detect the different underlying features of the data under several different sampling conditions. After this assessment of how often sampling is required to detect change, the remainder of the thesis addresses some of the questions associated with where the optimal sampling locations are. The European Union Water Framework Directive (WFD) was introduced in 2003 to set compliance standards for all water bodies across Europe, with the aim of preventing deterioration and ensuring all sites reach 'good' status by 2015. One of the features of the WFD is that water bodies can be grouped together, and the classification of all members of the group is then based on the classification of a single representative site. The potential misclassification of sites means that one of the key areas of interest is how well the existing groups used by SEPA for classification capture differences between the sites in terms of several chemical determinands. This is explored in Chapter 3, where a functional data analysis approach is taken to investigate some of the features of the existing groupings. An investigation of the effect of temporal autocorrelation on our ability to distinguish groups of sites from one another is also presented here.
It is also of interest to explore whether fewer, or indeed more, groups would be optimal in order to accurately represent the trends and variability in the water quality parameters. Different statistical approaches for grouping standing waters are presented in Chapter 4, where the question of how many groups is statistically optimal is also addressed. As in Chapter 3, these approaches for grouping sites are based on functional data in order to include the temporal dynamics of the variable of interest within any analysis of the group structure obtained. Both hierarchical and model-based functional clustering are considered here. The idea of functional clustering is also extended to the multivariate setting, enabling information from several determinands of interest to be used in the formation of groups. This is particularly important given that the WFD classification encompasses a range of different determinands. In addition to the investigation of standing waters, an entirely different type of water quality monitoring network is considered in Chapter 5. While standing waters are assumed to be spatially independent of one another, there are several situations where this assumption is not appropriate and where spatial correlation between locations needs to be accounted for. Further developments of the functional clustering methods explored in Chapter 4 are presented here in order to obtain groups of stations that are not only similar in terms of mean levels and temporal patterns of the determinand of interest, but which are also spatially homogeneous. The river network data explored in Chapter 5 introduce a set of new challenges for functional clustering that go beyond the inclusion of Euclidean-distance-based spatial correlation. Existing methodology for estimating spatial correlation is combined with functional clustering approaches and developed to be suitable for application to sites which lie along a river network. The final chapter of this thesis provides a summary of the work presented, together with a discussion of limitations and suggestions for future directions.
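The functional-clustering idea described in this abstract can be illustrated with a minimal sketch (not the thesis's own implementation): each site's water-quality series is summarised by harmonic-regression basis coefficients, and sites are then grouped by hierarchical clustering on those coefficients. The site names, determinand values and all settings below are hypothetical, not SEPA data.

```python
# Minimal sketch of functional clustering of monitoring sites:
# summarise each site's seasonal series by harmonic-regression
# coefficients, then cluster sites on those coefficients.
# All sites, values and settings are illustrative, not SEPA data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
t = np.arange(120) / 12.0                      # 10 years of monthly samples
sites = {f"site_{i}": (5 + i % 3) + 2 * np.sin(2 * np.pi * t + i)
         + rng.normal(0, 0.5, t.size) for i in range(8)}

# Harmonic (annual cycle) design matrix: intercept, sin, cos
X = np.column_stack([np.ones_like(t),
                     np.sin(2 * np.pi * t),
                     np.cos(2 * np.pi * t)])

# Basis coefficients per site = least-squares fit of the harmonic model
coefs = np.array([np.linalg.lstsq(X, y, rcond=None)[0]
                  for y in sites.values()])

# Hierarchical clustering of sites in coefficient space
Z = linkage(coefs, method="ward")
groups = fcluster(Z, t=3, criterion="maxclust")
print(dict(zip(sites.keys(), groups)))
```

A model-based or spatially constrained variant, as developed in Chapters 4 and 5 of the thesis, would replace the Ward linkage with a probabilistic clustering of the basis coefficients.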
52

Latent variable models for mixed manifest variables

Moustaki, Irini January 1996 (has links)
Latent variable models are widely used in the social sciences, where interest is centred on entities such as attitudes, beliefs or abilities for which there exist no direct measuring instruments. Latent modelling tries to extract these entities, here described as latent (unobserved) variables, from measurements on related manifest (observed) variables. Methodology already exists for fitting a latent variable model to manifest data that is either categorical (latent trait and latent class analysis) or continuous (factor analysis and latent profile analysis). In this thesis a latent trait and a latent class model are presented for analysing the relationships among a set of mixed manifest variables using one or more latent variables. The set of manifest variables contains metric (continuous or discrete) and binary items. The latent dimension is continuous for the latent trait model and discrete for the latent class model. Scoring methods for allocating individuals on the identified latent dimensions based on their responses to the mixed manifest variables are discussed. Item nonresponse is also discussed for attitude scales with a mixture of binary and metric variables using the latent trait model. The estimation and scoring methods for the latent trait model have been generalized to conditional distributions of the observed variables given the vector of latent variables other than the normal and the Bernoulli within the exponential family. To illustrate the use of the mixed model, four data sets have been analyzed. Two of the data sets contain five memory questions, the first on Thatcher's resignation and the second on the Hillsborough football disaster; these five questions were included in BMRBI's August 1993 face-to-face omnibus survey. The third and fourth data sets are from the 1990 and 1991 British Social Attitudes surveys; the questions analyzed are from the sexual attitudes section and the environment section, respectively.
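A hedged sketch of the kind of model described above, with one continuous latent variable underlying a mix of binary and metric items: the marginal likelihood integrates the latent variable out numerically by Gauss-Hermite quadrature. The item structure and all parameter values are illustrative assumptions, not the thesis's estimates.

```python
# Sketch of a one-factor latent trait model with mixed items:
# binary items follow a logistic (Bernoulli) response model,
# metric items a linear-normal model, given a N(0,1) latent z.
# The marginal log-likelihood integrates z out by Gauss-Hermite
# quadrature.  All parameter values are illustrative.
import numpy as np
from numpy.polynomial.hermite import hermgauss

nodes, weights = hermgauss(30)            # rule for integrals against exp(-x^2)
z = np.sqrt(2.0) * nodes                  # change of variable so z ~ N(0, 1)
w = weights / np.sqrt(np.pi)

def loglik(y_bin, y_met, a_bin, b_bin, a_met, b_met, sigma):
    """Marginal log-likelihood of one respondent.
    y_bin: 0/1 responses; y_met: continuous responses."""
    # Bernoulli part: P(y = 1 | z) = logistic(a + b z), per quadrature node
    eta = a_bin[:, None] + b_bin[:, None] * z[None, :]
    p = 1.0 / (1.0 + np.exp(-eta))
    log_bern = (y_bin[:, None] * np.log(p)
                + (1 - y_bin[:, None]) * np.log(1 - p)).sum(axis=0)
    # Normal part: y ~ N(a + b z, sigma^2)
    mu = a_met[:, None] + b_met[:, None] * z[None, :]
    log_norm = (-0.5 * np.log(2 * np.pi * sigma**2)
                - 0.5 * ((y_met[:, None] - mu) / sigma) ** 2).sum(axis=0)
    return np.log(np.sum(w * np.exp(log_bern + log_norm)))

# Illustrative respondent with two binary and two metric items
print(loglik(np.array([1, 0]), np.array([0.3, -1.2]),
             a_bin=np.array([0.0, 0.5]), b_bin=np.array([1.0, 0.8]),
             a_met=np.array([0.0, 0.0]), b_met=np.array([1.0, 0.6]),
             sigma=1.0))
```

Maximising this marginal log-likelihood over the item parameters is the essence of latent trait estimation; the thesis generalises the conditional distributions beyond the normal and Bernoulli cases used here.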
53

A statistical analysis of low birthweight in Glasgow

Murray, Barbara A. January 1999 (has links)
The percentage of singleton livebirths resulting in low birthweight deliveries has remained constant over the last 20 years, with between 6 and 10% of singleton pregnancies resulting in such a delivery. Low birthweight infants have been shown to develop medical problems in infancy and childhood, such as visual impairment, lower IQs and neuromotor problems, and as such it is important to identify those pregnancies that may result in low birthweight infants. This thesis considers factors that may be related to low birthweight, and uses these factors in the construction of a model to predict the probability of a woman delivering a low birthweight infant in order to identify high-risk mothers. One factor that may be thought of as related to low birthweight is deprivation. In this thesis a new deprivation measure is proposed which updates previous work in the area by using the 1991 small area census data to create a continuous deprivation measure, based on postcode area of residence, within the Greater Glasgow Health Board. This new measure of deprivation is included in the model referred to above. As there are many possible risk factors involved in modelling the probability of delivering a low birthweight infant, multiple comparisons arise in the production of the model, and it is important to produce a model that incorporates most of the relevant factors and relatively few of the unimportant ones. The first-order Bonferroni bound is one method used to correct for multiple comparisons by giving an upper bound on the actual p-value. This thesis considers the second-order Bonferroni bound, which gives a lower bound on the p-value and, when used in conjunction with the first-order bound, gives a better correction method than the first-order bound alone. These two bounds are then extended to logistic regression models.
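A small worked sketch of the two bounds discussed above, assuming (purely for illustration) that the individual tests are independent so that pairwise intersection probabilities are products of the marginal p-values; the thesis itself works with the bounds inside logistic regression models.

```python
# First- and second-order Bonferroni bounds on P(at least one of the
# events A_1..A_k occurs), applied here to the family-wise error of k
# hypothesis tests with marginal p-values p_i.  Pairwise intersection
# probabilities are computed under an independence assumption, purely
# for illustration.
from itertools import combinations

def bonferroni_bounds(pvals):
    upper = min(1.0, sum(pvals))                      # first-order (upper) bound
    pairwise = sum(pi * pj for pi, pj in combinations(pvals, 2))
    lower = max(0.0, sum(pvals) - pairwise)           # second-order (lower) bound
    return lower, upper

lo, up = bonferroni_bounds([0.01, 0.02, 0.03, 0.04])
print(f"{lo:.4f} <= P(any false positive) <= {up:.4f}")
```

Used together, the two bounds bracket the true family-wise error, which is the sense in which the pair gives a better correction than the first-order bound alone.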
54

The statistical analysis of exercise test data : a critical review

McConnachie, Alex January 2003 (has links)
Exercise tests have played a prominent role in the evaluation of therapies currently used for the management of patients with angina, such as nitrates, beta-blockers, and calcium antagonists. Such evaluations have shown dramatic improvements in exercise tolerance, most commonly measured by the time spent exercising until the occurrence of anginal pain or ECG signs of ischaemia, and often amongst patients with severe disease. However, the statistical methods used have generally been based on Normal theory, such as the t-test, or non-parametric equivalents, such as the Wilcoxon rank sum test. Such methods make no allowance for the fact that ischaemic endpoints may not occur in all patients, particularly when patients are under active treatment or have less severe symptoms. In the current situation, where there are several therapeutic options of proven clinical effectiveness, new treatments must be evaluated in opposition to, or in addition to, existing therapies. It is thus of particular importance that the statistician responsible for an analysis of exercise test data should use appropriate and efficient techniques, since the benefits of new treatments may be small. Since exercise times may be censored, in that the event of interest need not occur, it has been recognised that methods for the analysis of survival data are the most appropriate for analyses of exercise test data. Using data from the TIBET Study, a large clinical trial of two anti-anginal therapies administered singly or in combination, this thesis examines in detail the appropriateness of the Cox proportional hazards model, the most popular method for survival regression in the medical literature, for this type of data. It then considers alternatives to this model, and addresses the implications of some common features of exercise test data, in particular the presence of interval censoring and the possibility of multiple exercise tests being conducted on the same patient, using data from the TIBET Study and through simulation studies. Finally, using real data examples, two methods that appear to have received little or no attention with respect to exercise test data are explored, namely competing risks and repeated measures analyses.
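As a hedged illustration of the survival-analysis framing described above, the sketch below fits a Cox proportional hazards model to simulated, right-censored exercise times by maximising the Breslow partial log-likelihood directly. The treatment indicator, censoring mechanism and all values are simulated assumptions; this is not the TIBET analysis itself.

```python
# Sketch: Cox proportional hazards for right-censored exercise times,
# fitted by maximising the Breslow partial log-likelihood with scipy.
# Data are simulated; this is not the TIBET analysis itself.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 200
treat = rng.integers(0, 2, n).astype(float)      # 1 = active treatment
true_beta = -0.5                                  # treatment prolongs exercise time
time_event = rng.exponential(scale=np.exp(-true_beta * treat))
time_cens = rng.exponential(scale=2.0, size=n)    # censoring times
time = np.minimum(time_event, time_cens)
event = (time_event <= time_cens).astype(float)   # 1 if ischaemic endpoint observed

def neg_partial_loglik(beta, t, d, x):
    eta = beta[0] * x
    order = np.argsort(-t)                        # risk sets via reverse-time accumulation
    eta_o, d_o = eta[order], d[order]
    log_risk = np.logaddexp.accumulate(eta_o)     # log sum_{j: t_j >= t_i} exp(eta_j)
    return -np.sum(d_o * (eta_o - log_risk))

fit = minimize(neg_partial_loglik, x0=np.array([0.0]),
               args=(time, event, treat), method="BFGS")
print("estimated log-hazard ratio for treatment:", fit.x[0])
```

The interval-censoring and repeated-test issues examined in the thesis require extensions beyond this basic right-censored fit.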
55

Optimal cutpoint determination via design theory for regression models

Nguyen, The M. January 2006 (has links)
The main focus of this thesis is how to optimally choose a set or sets of cutpoints (in a categorized survey question) which are offered to respondents. In the case of several sets, a further issue is how to allocate sampled subjects to these sets (design points). Applications include Contingent Valuation (CV) studies (surveys on a population's willingness to pay for a service or public good) and market research studies which might include, for example, a question on individual incomes.
56

Bayesian analysis of finite mixture distributions using the allocation sampler

Fearnside, Alastair T. January 2007 (has links)
Finite mixture distributions are receiving more and more attention from statisticians in many different fields of research because they are a very flexible class of models. They are typically used for density estimation or to model population heterogeneity. One can think of a finite mixture distribution as grouping the observations into components from which they are assumed to have arisen; in certain settings these groups have a physical interpretation. Interest in these distributions has been boosted recently by the ever-increasing computer power available to researchers to carry out the computationally intensive tasks required in their analysis. In order to fit a finite mixture distribution taking a Bayesian approach, a posterior distribution has to be evaluated. When the number of components in the model is assumed known, this posterior distribution can be sampled from using methods such as Data Augmentation or Gibbs sampling (Tanner and Wong (1987) and Gelfand and Smith (1990)) and the Metropolis-Hastings algorithm (Hastings (1970)). However, the number of components in the model can also be considered unknown and an object of inference. Richardson and Green (1997) and Stephens (2000a) both describe Bayesian methods to sample across models with different numbers of components, enabling an estimate of the posterior distribution of the number of components to be evaluated. Richardson and Green (1997) define a reversible jump Markov chain Monte Carlo (RJMCMC) sampler, while Stephens (2000a) uses a Markov birth-death process approach to sample from the posterior distribution. In this thesis a Markov chain Monte Carlo method, named the allocation sampler, is presented. This sampler differs from the RJMCMC method reported in Richardson and Green (1997) in that its state space is simplified by the assumption that the components' parameters and weights can be analytically integrated out of the model. This in turn has the advantage that only minimal changes are required to the sampler for mixtures of components from other parametric families. This thesis illustrates the allocation sampler's performance on both simulated and real data sets. Chapter 1 provides a background to finite mixture distributions and gives an overview of some inferential techniques that have already been used to analyse these distributions. Chapter 2 sets out the Bayesian model framework that is used throughout this thesis and defines all the required distributional results. Chapter 3 describes the allocation sampler. Chapter 4 tests the performance of the allocation sampler using simulated datasets from a collection of 15 different known mixture distributions. Chapter 5 illustrates the allocation sampler with real datasets from a number of different research fields. Chapter 6 summarises the research in the thesis and outlines areas of possible future research.
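A minimal sketch in the spirit of the allocation sampler: for a Gaussian mixture with a fixed number of components, a known component variance and conjugate priors, the component means and weights are integrated out analytically, so the Markov chain updates only the allocation vector. This is an illustrative collapsed Gibbs sampler under those simplifying assumptions, not the thesis's algorithm, which also moves between numbers of components.

```python
# Collapsed Gibbs sketch in the spirit of the allocation sampler:
# component weights (Dirichlet prior) and means (Normal prior) are
# integrated out, so the Markov chain lives on the allocation vector
# only.  K is fixed and the component variance is known, which is a
# simplification relative to the thesis.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(3, 1, 40)])
n, K = x.size, 2
sigma2, mu0, tau2, alpha = 1.0, 0.0, 10.0, 1.0   # known var, prior mean/var, Dirichlet mass

z = rng.integers(0, K, n)                         # initial allocations

def predictive_logpdf(xi, members):
    """log p(x_i | other points currently allocated to the component)."""
    nk, sk = members.size, members.sum()
    post_var = 1.0 / (1.0 / tau2 + nk / sigma2)
    post_mean = post_var * (mu0 / tau2 + sk / sigma2)
    var = post_var + sigma2                       # predictive variance
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (xi - post_mean) ** 2 / var

for sweep in range(200):
    for i in range(n):
        logp = np.empty(K)
        for k in range(K):
            others = x[(z == k) & (np.arange(n) != i)]
            logp[k] = np.log(others.size + alpha / K) + predictive_logpdf(x[i], others)
        p = np.exp(logp - logp.max())
        z[i] = rng.choice(K, p=p / p.sum())

print("component sizes after sampling:", np.bincount(z, minlength=K))
```

Because the continuous parameters never appear in the chain, switching to another parametric family only changes the predictive density, which mirrors the flexibility the abstract claims for the allocation sampler.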
57

Theory and applications of delayed censoring models in survival analysis

Heydari, Fariborz January 1997 (has links)
The objective of this thesis is to develop new statistical models for the analysis of censored survival data, particularly for the study of recidivism data, such as the reoffence data used in the analysis here. This has been an area of great interest in criminology in recent years. There is a growing literature on survival analysis in criminology, where interest centres on the time from an offender's conviction, or release from prison, to the first reconviction or reimprisonment. In deciding whether to release a prisoner on parole, the Parole Board is provided with a statistical score which estimates the chance that the prisoner will reoffend within the period of time that he or she would otherwise be in prison. This score is based on a survival analysis of data on a sample of releases from long-term prison sentences. To capture most reoffences which occur within 2 years of release, follow-up must continue for at least 3 years to allow for the delay between offence and conviction. We reanalyse the data using a model which explicitly allows for this delay, which we refer to as the 'delayed censoring model'. The new analysis can be applied to data with a substantially shorter length of follow-up, which means that risk scores can be constructed from more up-to-date data and at less cost. It is models of this kind that we shall be concerned with in this thesis, and this is the principal motivation of the work. The statistical models that this thesis provides bring in a number of new ideas which are undoubtedly useful both at a theoretical level and in applications. Other major divisions of the work include: (i) assessing the possibility of an association between the delay and reoffence times by studying truncated distributions fitted to these data, using parametric, semi-parametric and nonparametric models; with the nonparametric approach we have developed a 'backward regression model' which is similar to the Cox model; (ii) a delayed censoring modification to the Cox model, together with a more general semi-parametric model for all the data, including both observed and censored cases, in which the delay and reoffence times are allowed to be correlated; we refer to this as the 'generalized weighted hazards model'; (iii) finally, a comparison of the results from applying all these models to the data. Although the parametric models give a good fit to the data, the semi-parametric and nonparametric models give a slightly better fit, as expected.
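The delayed censoring mechanism can be illustrated with a small simulation: a reoffence at time T is only recorded as a reconviction within follow-up F if T plus the offence-to-conviction delay D also falls within F, so short follow-up understates the raw 2-year reoffence rate unless the delay is modelled. All distributions and parameter values below are hypothetical, not the data analysed in the thesis.

```python
# Simulation sketch of the delayed-censoring mechanism: a reoffence
# at time T is only recorded as a reconviction within follow-up F if
# T + D <= F, where D is the offence-to-conviction delay.  All
# distributions and parameter values are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
T = rng.weibull(1.2, n) * 2.5          # time from release to reoffence (years)
D = rng.gamma(2.0, 0.25, n)            # delay from offence to conviction (years)

true_2yr_rate = np.mean(T <= 2.0)
for follow_up in (2.0, 3.0, 4.0):
    observed = np.mean((T <= 2.0) & (T + D <= follow_up))
    print(f"follow-up {follow_up:.0f}y: observed 2-year reconviction rate "
          f"{observed:.3f} (true reoffence rate {true_2yr_rate:.3f})")
```

A delayed censoring model recovers the true reoffence rate from the shorter follow-up by modelling D explicitly rather than waiting for convictions to accumulate.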
58

Statistical methods for sparse image time series of remote-sensing lake environmental measurements

Gong, Mengyi January 2017 (has links)
Remote-sensing technology is widely used in Earth observation, from everyday weather forecasting to long-term monitoring of the air, sea and land. The remarkable coverage and resolution of remote-sensing data are extremely beneficial to the investigation of environmental problems, such as the state and function of lakes under climate change. However, the attractive features of remote-sensing data bring new challenges to statistical analysis. The wide coverage and high resolution mean that data are usually of large volume, and the orbit track of the satellite and the occasional obscuring of the instruments due to atmospheric factors can result in substantial numbers of missing observations. Applying conventional statistical methods to this type of data can be ineffective and computationally intensive due to its volume and dimensionality, and modifications to existing methods are often required in order to incorporate the missingness. There is a great need for novel statistical approaches to tackle these challenges. This thesis aims to investigate and develop statistical approaches that can be used in the analysis of sparse remote-sensing image time series of environmental data. Specifically, three aspects of the data are considered: (a) the high dimensionality, which is associated with the volume and the dimension of the data; (b) the sparsity, in the sense of high missing percentages; and (c) the spatial/temporal structures, including the patterns and the correlations. Initially, methods for temporal and spatial modelling are explored and implemented with care, e.g. harmonic regression and bivariate spline regression with residual correlation structures. Recognizing the drawbacks of these methods, functional data analysis is employed as the general approach in this thesis. Specifically, functional principal component analysis (FPCA) is used to achieve the goal of dimension reduction. Bivariate basis functions are proposed to transform the satellite image data, which typically consist of thousands or millions of pixels, into functional data with low-dimensional representations. This approach has the advantage of identifying spatial variation patterns through the principal component (PC) loadings, i.e. eigenfunctions. To overcome the high missing percentages that might invalidate the standard implementation of FPCA, the mixed model FPCA (MM-FPCA) is investigated in Chapter 3. By estimating the PCs using a mixed effects model, the influence of sparsity can be accounted for appropriately. Data imputation can be obtained from the fitted model using the (truncated) Karhunen-Loève expansion. The method's applicability to sparse image series is examined through a simulation study. To incorporate temporal dependence into the MM-FPCA, a novel spatio-temporal model consisting of a state space component and an FPCA component is proposed in Chapter 4. The model, referred to as SS-FPCA in the thesis, is developed based on the dynamic spatio-temporal model framework. The SS-FPCA exploits a flexible hierarchical design with (a) a data model consisting of a time-varying mean function and a random component for the common spatial variation patterns formulated as the FPCA, (b) a process model specifying the type of temporal dynamic of the mean function and (c) a parameter model ensuring the identifiability of the model components. A 2-cycle alternating expectation-conditional maximization (AECM) algorithm is proposed to estimate the SS-FPCA model.
The AECM algorithm allows different data augmentations and parameter combinations in different cycles within an iteration, which in this case results in analytical solutions for all the MLEs of the model parameters. The algorithm uses the Kalman filter/smoother to update the system states according to the data model and the process model. Model investigations are carried out in Chapter 5, including a simulation study on a 1-dimensional space to assess the performance of the model and the algorithm. This is accompanied by a brief summary of the asymptotic results for EM-type algorithms, some of which can be used to approximate the standard errors of the model estimates. Applications of the MM-FPCA and SS-FPCA to the remote-sensing lake surface water temperature and chlorophyll data of Lake Victoria (obtained from the European Space Agency's Envisat mission) are presented at the end of Chapters 3 and 5. Remarks on the implications and limitations of these two methods are provided in Chapter 6, along with potential future extensions of both methods. The Appendices provide some additional theorems, computation and derivation details of the methods investigated in the thesis.
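A hedged sketch of the FPCA building block used in this abstract: on a complete (non-sparse) image time series, spatial principal components can be obtained from the singular value decomposition of the centred data, and a truncated Karhunen-Loève expansion reconstructs the field. The mixed-model machinery that handles high missingness (MM-FPCA) and the state-space temporal dynamics (SS-FPCA) are beyond this sketch, and the data are simulated.

```python
# Sketch of FPCA on an image time series via the SVD, with a
# truncated Karhunen-Loeve reconstruction.  Handles complete data
# only; the mixed-model treatment of missingness (MM-FPCA) and the
# state-space temporal dynamics (SS-FPCA) are beyond this sketch.
import numpy as np

rng = np.random.default_rng(3)
n_times, ny, nx = 48, 20, 30                 # monthly images on a 20 x 30 grid
_, xx = np.mgrid[0:ny, 0:nx]
pattern = np.sin(2 * np.pi * xx / nx)        # one dominant spatial mode
scores = np.sin(2 * np.pi * np.arange(n_times) / 12)   # seasonal score series
data = scores[:, None, None] * pattern + rng.normal(0, 0.2, (n_times, ny, nx))

X = data.reshape(n_times, -1)                # rows = time points, cols = pixels
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)

k = 2                                        # keep the leading components
pc_scores = U[:, :k] * s[:k]                 # temporal scores
eigenimages = Vt[:k].reshape(k, ny, nx)      # spatial variation patterns (eigenfunctions)
reconstruction = (mean + pc_scores @ Vt[:k]).reshape(n_times, ny, nx)

explained = (s[:k] ** 2).sum() / (s ** 2).sum()
rmse = np.sqrt(np.mean((reconstruction - data) ** 2))
print(f"variance explained by {k} components: {explained:.2%}, RMSE {rmse:.3f}")
```

When large fractions of pixels are missing, this direct SVD is no longer valid, which is exactly the gap the MM-FPCA of Chapter 3 is designed to fill.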
59

Sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution

Davies, Vinny January 2016 (has links)
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus, where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype and the distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods, namely mixed-effects models based on forward variable selection or l1 regularisation, on both synthetic and viral datasets. In addition we test a number of different versions of the SABRE method, comparing conjugate and semi-conjugate prior specifications as well as an alternative to the spike and slab prior: the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses about other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. This thesis also provides an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further account of the structure of the FMDV and Influenza virus datasets through the latent variable model and improves the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate that biWAIC performs as well as two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, the eSABRE method offers a computational improvement, allowing it to be used on these datasets.
The results of the eSABRE method show that it can be used in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as predicting a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.
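To make the SABRE-style model structure concrete, the sketch below simulates from a simplified spike-and-slab regression: binary inclusion indicators decide which residue covariates affect the response, coefficients come from the slab for included covariates, and a single random effect plus noise completes the linear predictor. This is a generative illustration of the prior structure only, not the SABRE inference machinery or its real FMDV/Influenza data; all dimensions and hyperparameters are assumptions.

```python
# Generative sketch of a simplified spike-and-slab hierarchical model
# in the spirit of SABRE: inclusion indicators gamma_j pick out the
# "antigenic" covariates, a random effect captures grouping in the
# data-collection process, and the response is Gaussian.  Dimensions,
# hyperparameters and data are all hypothetical.
import numpy as np

rng = np.random.default_rng(11)
n, p, n_groups = 300, 40, 6
pi_incl, slab_sd, re_sd, noise_sd = 0.1, 1.0, 0.5, 0.3

X = rng.integers(0, 2, (n, p)).astype(float)     # residue-change indicators
group = rng.integers(0, n_groups, n)             # e.g. experiment batch (hypothetical)

gamma = rng.random(p) < pi_incl                  # spike-and-slab inclusion indicators
beta = np.where(gamma, rng.normal(0, slab_sd, p), 0.0)
u = rng.normal(0, re_sd, n_groups)               # random effect per group
y = X @ beta + u[group] + rng.normal(0, noise_sd, n)

print("truly relevant covariates:", np.flatnonzero(gamma))
print("first few responses:", np.round(y[:5], 2))
```

Posterior inference in SABRE reverses this generative process, recovering the inclusion indicators from the observed responses via MCMC; the eSABRE extension inserts a latent variable between the linear predictor and the measurements.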
60

Methods for change-point detection with additional interpretability

Schröder, Anna Louise January 2016 (has links)
The main purpose of this dissertation is to introduce and critically assess some novel statistical methods for change-point detection that help better understand the nature of processes underlying observable time series. First, we advocate the use of change-point detection for local trend estimation in financial return data and propose a new approach developed to capture the oscillatory behaviour of financial returns around piecewise-constant trend functions. At the core of the method is a data-adaptive, hierarchically ordered basis of Unbalanced Haar vectors which decomposes the piecewise-constant trend underlying observed daily returns into a binary-tree structure of one-step constant functions. We illustrate how this framework can provide a new perspective for the interpretation of change points in financial returns. Moreover, the approach yields a family of forecasting operators for financial return series which can be adjusted flexibly depending on the forecast horizon or the loss function. Second, we discuss change-point detection under model misspecification, focusing in particular on normally distributed data with changing mean and variance. We argue that ignoring the presence of changes in mean or variance when testing for changes in, respectively, variance or mean can negatively affect the application of statistical methods. After illustrating the difficulties arising from this kind of model misspecification, we propose a new method to address them using sequential testing on intervals with varying length, and show in a simulation study how this approach compares to competitors in mixed-change situations. The third contribution of this thesis is a data-adaptive procedure to evaluate EEG data, which can improve the understanding of an epileptic seizure recording. This change-point detection method characterizes the evolution of frequency-specific energy as measured on the human scalp. It provides new insights into this high-dimensional, high-frequency data and has attractive computational and scalability features. In addition to contrasting our method with existing approaches, we analyse and interpret the method's output in the application to a seizure data set.
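The piecewise-constant trend idea can be illustrated with a standard binary-segmentation change-point detector based on CUSUM statistics, a simpler relative of the Unbalanced Haar decomposition described above. The threshold and simulated data are illustrative assumptions, not the dissertation's own procedure or results.

```python
# Binary segmentation with CUSUM statistics for changes in mean, a
# standard, simpler relative of the Unbalanced Haar decomposition
# described above.  Threshold and simulated data are illustrative.
import numpy as np

def cusum(x):
    """CUSUM contrast |C_b| for every candidate break b in x."""
    n = x.size
    b = np.arange(1, n)
    left = np.cumsum(x)[:-1]                 # sums of x[:b]
    total = x.sum()
    return (np.sqrt((n - b) / (n * b)) * left
            - np.sqrt(b / (n * (n - b))) * (total - left))

def binary_segmentation(x, threshold, offset=0, min_len=5):
    """Recursively split x where the CUSUM statistic exceeds the threshold."""
    if x.size < 2 * min_len:
        return []
    stats = np.abs(cusum(x))
    b = int(np.argmax(stats)) + 1
    if stats[b - 1] < threshold:
        return []
    return (binary_segmentation(x[:b], threshold, offset, min_len)
            + [offset + b]
            + binary_segmentation(x[b:], threshold, offset + b, min_len))

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 80), rng.normal(-1, 1, 120)])
print("estimated change points:", binary_segmentation(x, threshold=4.0))
```

The Unbalanced Haar approach organises such contrasts into a hierarchically ordered basis, which is what gives the binary-tree decomposition of the trend and its forecasting operators.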
