
Identification of causal effects using the 1995 earthquake in Japan : studies of education and health

Aoki, Yu January 2012 (has links)
This thesis aims to identify causal effects using a natural experimental approach. We focus on the Great Hanshin-Awaji Earthquake in midwestern Japan as a source of exogenous variation in the variables of interest. Chapter 1 explores the causal effect of schooling on juvenile delinquency using variation in schooling caused by policy interventions in specific municipalities after the earthquake. Using the instrumental variable estimator to address endogeneity problems arising from simultaneity and unobserved heterogeneity, we find that schooling reduces juvenile delinquency, although some of our estimates are imprecise, with large standard errors. The results indicate that a one-percentage-point increase in the high school participation rate reduces the number of juvenile arrests by approximately 1.1 per 1,000 youths. Estimates of the social benefits show that it is less expensive to reach a target level of benefit by improving schooling than by strengthening police forces. Chapter 2 studies the causal effect of volunteer work on the mortality of the elderly. After the earthquake, levels of volunteering increased considerably in municipalities hit by the earthquake, while other municipalities did not experience such a sharp increase. This exogenous shift in levels of volunteering is exploited to address the endogeneity problem associated with estimating the effects of volunteering. Specifically, unobserved heterogeneity across municipalities that affects both mortality and the level of volunteering, such as the quality of local health care services, may bias estimates of the effect of volunteering. The results indicate that volunteering has no significant effect on mortality amongst people in their 50s and 60s, while it significantly reduces mortality amongst people in their 70s and 80s or older. Evaluated at the mean, the estimate implies that the life of approximately one person aged 80 or older (out of 186 persons) is saved in a given year when the number of volunteers increases by 100 (out of 1,911 persons).
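The identification strategy in Chapter 1 is instrumental variables: earthquake-driven policy interventions provide exogenous variation in schooling. As a rough illustration of that estimator only, not the thesis's actual data or specification, a minimal two-stage least squares sketch in Python; all variable names and numbers are hypothetical.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """Basic 2SLS: regress X on instruments Z, then y on the fitted X.

    y : (n,) outcome (e.g. a juvenile arrest rate)
    X : (n, k) endogenous regressor plus constant/controls
    Z : (n, m) instruments plus the same exogenous controls, m >= k
    """
    # First stage: project X onto the column space of Z
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: OLS of y on the fitted values
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta

# Toy illustration with simulated data (purely hypothetical numbers)
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                        # instrument: earthquake-driven policy exposure
u = rng.normal(size=n)                        # unobserved confounder
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous schooling measure
y = -1.1 * x + 0.5 * u + rng.normal(size=n)   # delinquency outcome
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
print(two_stage_least_squares(y, X, Z))       # second coefficient should be near -1.1
```

Because the confounder u enters both equations, plain OLS of y on x would be biased; projecting x onto the instrument first removes that correlation.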

Aspects of generative and discriminative classifiers

Xue, Jinghao January 2008 (has links)
Meanwhile, we suggest that the so-called output-dependent HMMs could be represented in a state-dependent manner, and vice versa, essentially by application of Bayes' theorem. Finally, we present discriminative approaches to histogram-based image thresholding, in which the optimal threshold is derived from the maximum likelihood based on the conditional distribution $p(y|x)$ of $y$, the class indicator of a grey level $x$, given $x$. The discriminative approaches can be regarded as discriminative extensions of the traditional generative approaches to thresholding, such as Otsu's method (Otsu, 1979) and Kittler and Illingworth's minimum error thresholding (MET) (Kittler and Illingworth, 1986). As illustrations, we develop discriminative versions of Otsu's method and MET by using discriminant functions corresponding to the original methods to represent $p(y|x)$. These two discriminative thresholding approaches are compared with their original counterparts on selecting thresholds for a variety of histograms of mixture distributions. Results show that the discriminative Otsu method consistently provides relatively good performance. Although its parameter estimation is computationally more demanding than in the original method, its robustness and model simplicity justify the discriminative Otsu method in scenarios where the risk of model mis-specification is high and the computational burden is not a concern.
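For orientation, here is a minimal sketch of the generative baseline that the discriminative methods extend: Otsu's method picks the grey level that maximises the between-class variance of the histogram. This is the classical algorithm, not the thesis's discriminative estimator, and the example histogram is synthetic.

```python
import numpy as np

def otsu_threshold(hist):
    """Otsu's method: choose the grey level that maximises the
    between-class variance of a histogram `hist` (counts per grey level)."""
    hist = np.asarray(hist, dtype=float)
    p = hist / hist.sum()                      # histogram -> probabilities
    levels = np.arange(len(p))
    omega = np.cumsum(p)                       # class-0 probability up to each level
    mu = np.cumsum(p * levels)                 # first moment up to each level
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)           # ignore degenerate splits
    return int(np.argmax(sigma_b))

# Synthetic example: a bimodal histogram from two Gaussian-like populations
rng = np.random.default_rng(1)
grey = np.concatenate([rng.normal(80, 10, 5000), rng.normal(170, 15, 5000)])
hist, _ = np.histogram(np.clip(grey, 0, 255).astype(int), bins=256, range=(0, 256))
print(otsu_threshold(hist))   # should fall between the two modes
```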

Measuring the reproducibility of and comparability between physiological and psychological responses in exercise testing

Zare, Shahram January 1997 (has links)
Chapter 1 gives a brief background to Exercise Testing and its importance, as well as a literature review of relevant topics including reproducibility, comparability, components of variance and the estimation of common correlation; the latter two are essential building blocks for the estimation of Comparability. Chapter 2 deals with the estimation of measurement reproducibility of data from mixed effects models involving two variance components. Two approaches, one based on sums of squares and the other on Profile Likelihood, are used for the separate cases of balanced and unbalanced data. This is carried out in two distinct contexts, one for simple replication and the other assuming an order effect in the replications. Application of the approaches to Exercise Testing data shows that while point estimates from both approaches are often identical, interval estimates from the Profile Likelihood approach tend to be narrower. Chapter 3 involves a simulation study to investigate and assess the performance of the two approaches. Data are simulated from a variety of underlying configurations and the performances then compared according to three statistical criteria. The results of this study again favour the Profile Likelihood approach. The estimation of Comparability between two variables is the other aspect of the thesis, put forward in chapter 4, where, first of all, the estimation of a common correlation coefficient from a population of correlation coefficients is considered. Five different methods for point and interval estimation of a common correlation coefficient are introduced. An illustrative example using data from an Exercise Testing procedure is used to compare the performances of the methods. Further investigation of the performance of the five methods was carried out by means of a simulation study across a variety of underlying configurations. The overall results suggest the 'Fisher method' as the best method for point and interval estimation of the common correlation. Finally, chapter 6 outlines the conclusions from the previous chapters and suggests some ideas for further work.
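The abstract does not spell out which variant of the 'Fisher method' is meant, but the standard approach pools group correlations on the Fisher z-scale, weighting each by n_i - 3, and back-transforms the point estimate and interval. A hedged sketch of that idea follows; the group correlations and sample sizes are hypothetical.

```python
import numpy as np
from scipy import stats

def pooled_correlation_fisher(rs, ns, conf=0.95):
    """Pool correlation coefficients from several groups via Fisher's
    z-transform, weighting each by (n_i - 3), and back-transform a
    confidence interval for the common correlation."""
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    z = np.arctanh(rs)                 # Fisher z-transform of each correlation
    w = ns - 3.0                       # approximate inverse variances of the z_i
    z_bar = np.sum(w * z) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    half = stats.norm.ppf(0.5 + conf / 2) * se
    return np.tanh(z_bar), (np.tanh(z_bar - half), np.tanh(z_bar + half))

# Hypothetical example: correlations between a physiological and a
# psychological response measured in four groups of subjects
r_hat, ci = pooled_correlation_fisher([0.62, 0.55, 0.70, 0.58], [20, 25, 18, 30])
print(r_hat, ci)
```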

Statistical issues in modelling the ancestry from Y-chromosome and surname data

Sharif, Maarya January 2012 (has links)
A considerable industry has grown up around genealogical inference from genetic testing, supplementing more traditional genealogical techniques but with very limited quantification of uncertainty. In many societies Y-chromosomes are co-inherited with surnames and as such passed down from father to son. This thesis seeks to explore what this correlation can say about ancestry. In particular it is concerned with estimation of the time to the most recent common paternal ancestor (TMRCA) for pairs of males who are not known to be directly related but share the same surname, based on the repeat number at short tandem repeat (STR) markers on their Y-chromosomes. We develop a model for TMRCA estimation based on the difference in repeat numbers in pairs of male haplotypes, using a Bayesian framework and Markov chain Monte Carlo techniques such as the adaptive Metropolis-Hastings algorithm. The model incorporates the process of STR discovery and the calibration of mutation rates, which can differ across STRs. In simulation studies, we find that the estimates of TMRCA are rather robust to the ascertainment process and the way in which it is modelled. However, they are affected by the site-specific mutation rates at the typed STRs. Indeed, sequencing the fastest mutating STRs yields a lower error in the estimated TMRCA than sequencing random STRs. In the British context, we extend our model to include additional information such as the haplogroup status (as determined from single nucleotide polymorphisms, SNPs) of the pair of males, as well as the frequency and origin of the surname. In general, the effect of this is to reduce estimates of the TMRCA for pairs of males with an older TMRCA, typically outwith the period of surname establishment (about 500-700 years ago). In the genealogical context, incorporating surname frequency (within the prior distribution) results in lower estimates of TMRCA for pairs of males who appear to have diverged from a common male ancestor since the period of surname establishment. In addition, we include uncertainty in the years-per-generation conversion factor in our model.
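To make the inferential setup concrete, here is a deliberately simplified toy version, not the thesis's model: it ignores STR ascertainment, haplogroups and surname information, approximates the symmetric single-step mutation model by taking the repeat difference at STR j after T generations to be Normal(0, 2*mu_j*T), places an exponential prior on T, and samples with a plain random-walk Metropolis step on log T. All mutation rates and repeat differences below are hypothetical.

```python
import numpy as np

def log_posterior(log_T, diffs, mus, prior_mean=1000.0):
    """Toy log-posterior for the TMRCA T (in generations) of two males,
    given repeat-number differences `diffs` at STRs with mutation rates `mus`.
    Gaussian approximation: d_j ~ N(0, 2 * mu_j * T); exponential prior on T."""
    T = np.exp(log_T)
    var = 2.0 * mus * T
    log_lik = np.sum(-0.5 * np.log(2 * np.pi * var) - diffs**2 / (2 * var))
    log_prior = -T / prior_mean            # Exponential(mean=prior_mean), up to a constant
    return log_lik + log_prior + log_T     # + log|dT/dlogT| Jacobian for sampling on log T

def metropolis_tmrca(diffs, mus, n_iter=20000, step=0.3, seed=2):
    rng = np.random.default_rng(seed)
    log_T = np.log(500.0)                  # arbitrary starting value
    samples = []
    for _ in range(n_iter):
        prop = log_T + step * rng.normal()                 # random-walk proposal on log T
        if np.log(rng.uniform()) < log_posterior(prop, diffs, mus) - log_posterior(log_T, diffs, mus):
            log_T = prop
        samples.append(np.exp(log_T))
    return np.array(samples[n_iter // 2:])                 # discard burn-in

# Hypothetical pair: differences at 10 STRs, per-generation mutation rate ~2e-3
diffs = np.array([0, 1, -1, 0, 2, 0, 1, 0, -1, 1], dtype=float)
mus = np.full(10, 2e-3)
draws = metropolis_tmrca(diffs, mus)
print(np.median(draws), np.percentile(draws, [2.5, 97.5]))
```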

Modelling via normalisation for parametric and nonparametric inference

Kolossiatis, Michalis January 2009 (has links)
Bayesian nonparametric modelling has recently attracted a lot of attention, mainly due to the advancement of various simulation techniques, especially Markov chain Monte Carlo (MCMC) methods. In this thesis I propose some Bayesian nonparametric models for grouped data, which make use of dependent random probability measures. These probability measures are constructed by normalising infinitely divisible probability measures and exhibit nice theoretical properties. Implementation of these models is also easy, using mainly MCMC methods. An additional step in these algorithms is also proposed in order to improve mixing. The proposed models are applied to both simulated and real-life data, and the posterior inference for the parameters of interest is investigated, as well as the effect of the corresponding simulation algorithms. A new, n-dimensional distribution on the unit simplex, which contains many known distributions as special cases, is also proposed. The univariate version of this distribution is used as the underlying distribution for modelling binomial probabilities. Using simulated and real data, it is shown that this proposed model is particularly successful in modelling overdispersed count data.
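The normalisation idea can be illustrated with the simplest (canonical) case rather than the dependent measures for grouped data developed in the thesis: restricting a gamma process with base measure alpha*H to a finite partition and normalising the independent gamma masses gives a Dirichlet-distributed random probability vector, i.e. a Dirichlet process evaluated on that partition. A minimal sketch with illustrative values:

```python
import numpy as np

def normalised_gamma_measure(alpha, base_probs, rng):
    """Random probability measure on a finite partition obtained by
    normalising independent Gamma(alpha * H(A_i), 1) masses.
    The resulting vector is Dirichlet(alpha * H(A_1), ..., alpha * H(A_k))."""
    masses = rng.gamma(shape=alpha * np.asarray(base_probs), scale=1.0)
    return masses / masses.sum()

rng = np.random.default_rng(3)
base = np.array([0.2, 0.3, 0.1, 0.4])      # base measure H on four cells (illustrative)
alpha = 5.0                                # concentration parameter

draws = np.array([normalised_gamma_measure(alpha, base, rng) for _ in range(5000)])
print(draws.mean(axis=0))   # close to the base measure H
print(draws.std(axis=0))    # spread shrinks as alpha grows
```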

A comparison of data envelopment analysis and stochastic frontiers as methods for assessing the efficiencies of organisational units

Read, Laura Elizabeth January 1998 (has links)
This thesis gives an overall view of the two most commonly used approaches for measuring the relative efficiencies of organisational units. The two approaches, data envelopment analysis (DEA) and stochastic frontiers (SF), are supposedly estimating the same underlying efficiency values, but the natures of the two methods are very different. This can lead to different estimates for some, or all, of the units in an analysis. By identifying the nature of these differences this work shows that it is possible to gain some insight into the nature of the underlying data and to say more confidently which of the two estimates is closer to the true efficiency for individual units. In order to investigate the differences between the methods across different facets of the technology, two important dimensions are chosen. Firstly, differences across scale size are investigated. It is shown how it is possible to define a measure of scale size in both the single-output case and the multiple-input, multiple-output case. This measure of scale size can then be used to split the technology into regions of differing scale size, enabling, for example, tests for the true nature of returns to scale in DEA. The measure of scale size developed in multiple dimensions necessitates a method for estimating a homothetic, constant returns to scale function. Differences between the approaches across input mix are also investigated. These differences may highlight the abilities of the methods to correctly identify the elasticity of substitution between the inputs. The results of the comparisons between the methods are summarised. This summary gives possible reasons for differences which may be found between the results of the two approaches, and an indication of how the estimates are likely to relate to the true efficiency values. An algorithm is then developed that uses a comparison of the results from the two methods to help identify the better estimates.
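As a reference point for the DEA side of the comparison, here is a minimal sketch of the standard input-oriented, constant-returns-to-scale envelopment problem, solved as one linear programme per unit with scipy. This is the generic textbook formulation, not the thesis's specific models, and the input-output data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input_efficiency(X, Y):
    """Input-oriented, constant-returns-to-scale DEA efficiencies.

    X : (m, n) inputs  (m inputs, n units)
    Y : (s, n) outputs (s outputs, n units)
    Returns efficiency scores theta in (0, 1] for each unit.
    """
    m, n = X.shape
    s, _ = Y.shape
    scores = np.empty(n)
    for k in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.concatenate([[1.0], np.zeros(n)])   # minimise theta
        # Inputs:  sum_j lambda_j * x_ij - theta * x_ik <= 0
        A_in = np.hstack([-X[:, [k]], X])
        # Outputs: -sum_j lambda_j * y_rj <= -y_rk  (outputs at least y_rk)
        A_out = np.hstack([np.zeros((s, 1)), -Y])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.concatenate([np.zeros(m), -Y[:, k]])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
        scores[k] = res.fun
    return scores

# Five hypothetical units, two inputs, one output
X = np.array([[2.0, 3.0, 6.0, 4.0, 5.0],
              [3.0, 2.0, 6.0, 8.0, 4.0]])
Y = np.array([[1.0, 1.0, 2.0, 1.0, 1.0]])
print(dea_ccr_input_efficiency(X, Y))
```

A stochastic frontier, by contrast, would posit a parametric production function with a composed error (noise plus one-sided inefficiency), which is why the two sets of scores can diverge.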

Monotone local linear estimation of transducer functions

Hughes, David January 2014 (has links)
Local polynomial regression has received a great deal of attention in the past. It is a highly adaptable regression method when the true response model is not known. However, estimates obtained in this way are not guaranteed to be monotone. In some situations the response is known to depend monotonically upon some variables. Various methods have been suggested for constraining nonparametric local polynomial regression to be monotone. The earliest of these is known as the Pool Adjacent Violators algorithm (PAVA) and was first suggested by Brunk (1958). Kappenman (1987) suggested that a non-parametric estimate could be made monotone by simply increasing the bandwidth used until the estimate was monotone. Dette et al. (2006) have suggested a monotonicity constraint which they call the DNP method. Their method involves calculating a density estimate of the unconstrained regression estimate, and using this to calculate an estimate of the inverse of the regression function. Fan, Heckman and Wand (1995) generalized local polynomial regression to quasi-likelihood based settings. Obviously such estimates are not guaranteed to be monotone, whilst in many practical situations monotonicity of response is required. In this thesis I discuss how the above-mentioned monotonicity constraint methods can be adapted to the quasi-likelihood setting. I am particularly interested in the estimation of monotone psychometric functions and, more generally, biological transducer functions, for which the response is often known to follow a distribution which belongs to the exponential family. I consider some of the key theoretical properties of the monotonised local linear estimators in the quasi-likelihood setting. I establish asymptotic expressions for the bias and variance of my adaptation of the DNP method (called the LDNP method) and show that this estimate is asymptotically normally distributed and first-order equivalent to competing methods. I demonstrate that this adaptation overcomes some of the problems with using the DNP method in likelihood based settings. I also investigate the choice of second bandwidth for use in the density estimation step. I compare the LDNP method, the PAVA method and the bandwidth method by means of a simulation study. I investigate a variety of response models, including binary, Poisson and exponential. In each study I calculate monotone estimates of the response curve using each method and compare their bias, variance, MSE and MISE. I also apply these methods to the analysis of data from various hearing and vision studies. I show some of the deficiencies of using local polynomial estimates, as opposed to local likelihood estimates.
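Of the constraint methods mentioned, PAVA is the simplest to state: adjacent fitted values that violate monotonicity are repeatedly pooled into weighted averages until the sequence is non-decreasing. A minimal sketch follows, applied to a synthetic noisy psychometric-style curve rather than to the thesis's local-likelihood fits.

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators: weighted least-squares isotonic
    (non-decreasing) fit to the sequence y."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Each block holds [value, weight, number of pooled points]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, v) for v, _, n in blocks])

# Synthetic example: a noisy, roughly increasing psychometric-style response
rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 40)
y = 1 / (1 + np.exp(-2 * x)) + rng.normal(scale=0.08, size=x.size)
print(pava(y))   # monotone (non-decreasing) version of the noisy curve
```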

Asymmetry and other distributional properties in medical research data

Partlett, Christopher January 2015 (has links)
The central theme of this thesis is to investigate the use of non-parametric methods for making inferences about a random sample with an unknown distribution function. The overarching aim is the development of new methods to make inferences regarding the nature of the unknown distribution to enhance medical research. Initially, the focus is exclusively on the asymmetry of a random variable. In particular, a recently proposed measure of asymmetry provides the foundation for the proposal and development of a new test for symmetry. The test and measure are applied to a number of medical research settings, including randomised trials. Moreover, guidance is provided on its implementation, with particular emphasis on the problem of small sample estimation. This investigation is then generalised to examine asymmetry across multiple studies. In particular, meta-analysis methods are used to synthesise information about the amount of asymmetry in several studies. Further, a detailed simulation study is carried out to investigate the impact of asymmetry on linear models and meta-analyses of randomised trials, in terms of the accuracy of the treatment effect estimate and the coverage of confidence and prediction intervals. Finally, the scope of the investigation is widened to encompass the problem of comparing and synthesising information about the probability density function and cumulative distribution function, based on samples from multiple studies. The meta-analysis of the smooth distribution function estimate is then applied to propose new methods for conducting meta-analyses of diagnostic test accuracy, which have a number of merits compared to the existing methodology.
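The abstract does not specify the asymmetry measure on which the test is built, so the sketch below only illustrates the generic shape of such a procedure, not the thesis's test: a simple (mean - median)/sd statistic whose null distribution is generated by resampling from the symmetrised sample, which is symmetric about the median by construction.

```python
import numpy as np

def symmetry_test(x, n_boot=5000, seed=5):
    """Toy resampling test of symmetry about the median.

    Statistic: (mean - median) / standard deviation.  The null
    distribution is generated by resampling from the symmetrised
    sample {x_i, 2*median - x_i}."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    stat = lambda s: (s.mean() - np.median(s)) / s.std(ddof=1)
    observed = stat(x)
    symmetrised = np.concatenate([x, 2 * np.median(x) - x])
    null = np.array([stat(rng.choice(symmetrised, size=n, replace=True))
                     for _ in range(n_boot)])
    p_value = np.mean(np.abs(null) >= np.abs(observed))
    return observed, p_value

# Synthetic example: a right-skewed outcome, as often seen in medical data
rng = np.random.default_rng(6)
sample = rng.lognormal(mean=0.0, sigma=0.6, size=200)
print(symmetry_test(sample))   # small p-value indicates asymmetry
```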

The design of dynamic and nonlinear models in cash flow prediction

Pang, Yang January 2015 (has links)
This thesis is concerned with designing a novel model for cash flow prediction. Cash flow and earnings are both important measures of a firm’s profit. The extant literature has discussed different models that have been applied to cash flow prediction. However, previous studies have not attempted to address the dynamics in the cash flow model parameters, which are potentially nonlinear processes. This thesis proposes a grey-box model to capture the nonlinearity and dynamics of the cash flow model parameters. The parameters are modelled as a black box, which adopts a Padé approximant as the functional form and two exogenous variables as input variables that are considered to have explanatory power for the parameter process. In addition, this thesis employs a Bayesian forecasting model in an attempt to capture the parameter dynamics of the cash flow modelling process. The Bayesian model has the advantage of being applicable when only a limited number of observations is available. Compared with the grey-box model, the Bayesian model places a linear restriction on the parameter dynamics. A prior is required for the implementation of the Bayesian model, and this thesis uses the results of a random parameter model as the prior. In addition, panel data estimation methods are also applied to see whether they can outperform the pooled regression that is widely applied in the extant literature. Four datasets, all in panel form, are employed to examine the performance of the various models in predicting cash flow. This work studies the pattern of net operating cash flow (or the cash flow to asset ratio) over time for the different datasets. Out-of-sample comparison is conducted among the applied models, and two measures of performance are selected to compare the practical predictive power of the models. The designed grey-box model has promising and encouraging performance in all the datasets, especially for U.S. listed firms. However, the Bayesian model does not appear to be superior to the simple benchmark models in making practical predictions. Similarly, the panel data models also cannot beat pooled regression. The traditional discounted cash flow model for equity valuation is then combined with the cash flow prediction models developed in this thesis to obtain theoretical equity values based on the predicted cash flows. The reported results show that simpler models, such as the random walk model, are closer to market expectations of future cash flows, because they give a better fit to market share prices under the new discounting model. The results also suggest that the new valuation models could have investment value. This thesis makes contributions in both theoretical and practical respects. Through the derivation of various models, it is found that cash flow prediction models exhibit potential nonlinearity and dynamic behaviour, so it is crucial to capture this nonlinearity with appropriate tools. In addition, this thesis builds a framework that can be used to analyse problems of a similar kind, such as panel data prediction. The models are derived at a theoretical level and then applied to empirical data. The promising results suggest that, in practice, the models developed in this work could provide useful guidance for decision makers.
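The abstract gives neither the order of the Padé approximant nor the identity of the two exogenous inputs, so the sketch below is only a toy version of the grey-box idea: a [1/1] Padé-type rational function mapping a single hypothetical exogenous driver to a time-varying model parameter, fitted by nonlinear least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

def pade_1_1(u, a0, a1, b1):
    """[1/1] Pade-type rational map from an exogenous input u to a
    time-varying model parameter: (a0 + a1*u) / (1 + b1*u)."""
    return (a0 + a1 * u) / (1.0 + b1 * u)

# Simulate a hypothetical exogenous driver and a parameter that responds
# to it nonlinearly (purely illustrative numbers)
rng = np.random.default_rng(7)
u = np.linspace(0.0, 2.0, 80)
true_param = (0.3 + 1.2 * u) / (1.0 + 0.9 * u)
observed = true_param + rng.normal(scale=0.03, size=u.size)

coef, _ = curve_fit(pade_1_1, u, observed, p0=[0.1, 1.0, 0.5])
print(coef)                      # should recover roughly (0.3, 1.2, 0.9)
fitted = pade_1_1(u, *coef)      # nonlinear parameter path implied by the driver
```

The appeal of the rational form is that it can bend and saturate in ways a low-order polynomial cannot, while still having only a handful of coefficients to estimate.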

Model selection and model averaging in the presence of missing values

Gopal Pillay, Khuneswari January 2015 (has links)
Model averaging has been proposed as an alternative to model selection, intended to overcome the underestimation of standard errors that is a consequence of model selection. Model selection and model averaging become more complicated in the presence of missing data. Three model selection approaches (RR, STACK and M-STACK) and model averaging using three model-building strategies (non-overlapping variable sets, and inclusive and restrictive strategies) were explored for combining results from multiply-imputed data sets, using a Monte Carlo simulation study on some simple linear and generalized linear models. Imputation was carried out using chained equations (via the "norm" method in the R package MICE). The simulation results showed that the STACK method performs better than RR and M-STACK in terms of model selection and prediction, whereas model averaging performs slightly better than STACK in terms of prediction. The inclusive and restrictive strategies perform better in terms of prediction, but non-overlapping variable sets performs better for model selection. STACK and model averaging using all three model-building strategies were then applied to combine the results from a multiply-imputed data set from the Gateshead Millennium Study (GMS). The performance of STACK and model averaging was compared using the mean square error of prediction (MSE(P)) in a 10% cross-validation test. The results showed that STACK using an inclusive strategy provided better prediction than model averaging. This coincides with the results obtained through a simulation study mimicking the GMS data. In addition, the inclusive strategy for building imputation and prediction models was better than the non-overlapping variable sets and restrictive strategies. The presence of highly correlated covariates and response is believed to have led to better prediction in this particular context. Model averaging using non-overlapping variable sets performs better only if an auxiliary variable is available. However, STACK using an inclusive strategy performs well when there is no auxiliary variable available. Therefore, it is advisable to use STACK with an inclusive model-building strategy and highly correlated covariates (where available) to make predictions in the presence of missing data. Alternatively, model averaging with non-overlapping variable sets can be used if an auxiliary variable is available.
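A hedged sketch of the stacking idea behind STACK: create M completed copies of the data, stack them into one long data set, and fit a single regression with each record weighted by 1/M. For brevity the imputation step below is a naive draw around the observed column mean; it merely stands in for proper chained-equations imputation such as MICE's "norm" method, and none of the data are from the GMS.

```python
import numpy as np

def stack_impute_fit(X, y, n_imputations=5, seed=8):
    """Sketch of the STACK approach: create M completed copies of X,
    stack them, and fit one weighted least-squares model with weights 1/M.

    Imputation here is a naive draw around the column mean; it stands in
    for proper chained-equations imputation (e.g. MICE's "norm" method)."""
    rng = np.random.default_rng(seed)
    stacked_X, stacked_y = [], []
    for _ in range(n_imputations):
        Xi = X.copy()
        for j in range(X.shape[1]):
            miss = np.isnan(X[:, j])
            obs = X[~miss, j]
            Xi[miss, j] = rng.normal(obs.mean(), obs.std(ddof=1), size=miss.sum())
        stacked_X.append(np.column_stack([np.ones(len(y)), Xi]))
        stacked_y.append(y)
    Xs, ys = np.vstack(stacked_X), np.concatenate(stacked_y)
    w = np.full(len(ys), 1.0 / n_imputations)   # each original record counted once in total
    # Weighted least squares via the normal equations
    W = np.diag(w)
    beta = np.linalg.solve(Xs.T @ W @ Xs, Xs.T @ W @ ys)
    return beta

# Toy data with values missing at random in one covariate
rng = np.random.default_rng(9)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
X[rng.random(n) < 0.2, 1] = np.nan
print(stack_impute_fit(X, y))   # coefficients roughly (1, 2, -1)
```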
