111
Monotone local linear estimation of transducer functions / Hughes, David. January 2014
Local polynomial regression has received a great deal of attention in the past. It is a highly adaptable regression method when the true response model is not known. However, estimates obtained in this way are not guaranteed to be monotone. In some situations the response is known to depend monotonically upon some variables. Various methods have been suggested for constraining nonparametric local polynomial regression to be monotone. The earliest of these is known as the Pool Adjacent Violators algorithm (PAVA) and was first suggested by Brunk (1958). Kappenman (1987) suggested that a nonparametric estimate could be made monotone by simply increasing the bandwidth used until the estimate became monotone. Dette et al. (2006) have suggested a monotonicity constraint which they call the DNP method. Their method involves calculating a density estimate of the unconstrained regression estimate and using this to calculate an estimate of the inverse of the regression function. Fan, Heckman and Wand (1995) generalized local polynomial regression to quasi-likelihood based settings. Such estimates are again not guaranteed to be monotone, whilst in many practical situations monotonicity of the response is required. In this thesis I discuss how the above-mentioned monotonicity constraint methods can be adapted to the quasi-likelihood setting. I am particularly interested in the estimation of monotone psychometric functions and, more generally, biological transducer functions, for which the response is often known to follow a distribution belonging to the exponential family. I consider some of the key theoretical properties of the monotonised local linear estimators in the quasi-likelihood setting. I establish asymptotic expressions for the bias and variance of my adaptation of the DNP method (called the LDNP method) and show that this estimate is asymptotically normally distributed and first-order equivalent to competing methods. I demonstrate that this adaptation overcomes some of the problems with using the DNP method in likelihood-based settings. I also investigate the choice of the second bandwidth used in the density estimation step. I compare the LDNP method, the PAVA method and the bandwidth method by means of a simulation study, investigating a variety of response models, including binary, Poisson and exponential. In each study I calculate monotone estimates of the response curve using each method and compare their bias, variance, MSE and MISE. I also apply these methods to the analysis of data from various hearing and vision studies, and show some of the deficiencies of using local polynomial estimates, as opposed to local likelihood estimates.
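As a minimal illustration of the pool-adjacent-violators idea referred to above, the sketch below computes the least-squares non-decreasing fit to a sequence. The function name, the equal observation weights and the example data are illustrative assumptions; the sketch ignores the local-linear and quasi-likelihood aspects developed in the thesis.

import numpy as np

def pava_increasing(y):
    # Pool Adjacent Violators: least-squares non-decreasing fit to y (equal weights assumed).
    y = np.asarray(y, dtype=float)
    means, weights = [], []
    for value in y:
        means.append(float(value))
        weights.append(1.0)
        # Merge adjacent blocks while the newest block mean violates monotonicity.
        while len(means) > 1 and means[-1] < means[-2]:
            w = weights[-1] + weights[-2]
            m = (means[-1] * weights[-1] + means[-2] * weights[-2]) / w
            means[-2:] = [m]
            weights[-2:] = [w]
    # Expand each pooled block mean over the observations it covers.
    return np.repeat(means, np.array(weights, dtype=int))

# Example: a noisy, roughly increasing sequence made monotone.
print(pava_increasing([0.10, 0.30, 0.25, 0.50, 0.45, 0.80]))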
112
Asymmetry and other distributional properties in medical research data / Partlett, Christopher. January 2015
The central theme of this thesis is to investigate the use of non-parametric methods for making inferences about a random sample with an unknown distribution function. The overarching aim is the development of new methods to make inferences regarding the nature of the unknown distribution to enhance medical research. Initially,the focus is exclusively on the asymmetry of a random variable. In particular, a recently proposed measure of asymmetry provides the foundation for the proposal and development of a new test for symmetry. The potential applications of the test and measure are applied to a number of medical research settings including randomised trials. Moreover, guidance is provided on its implementation, with particular emphasis on the problem of small sample estimation. This investigation is then generalised to examine asymmetry across multiple studies. In particular, meta-analysis methods are used to synthesise information about the amount of asymmetry in several studies. Further, a detailed simulation study is carried out to investigate the impact of asymmetry on linear models and meta-analyses of randomised trials, in terms of the accuracy of the treatment effect estimate and the coverage of confidence and prediction intervals. Finally, the scope of the investigation is widened to encompass the problem of comparing and synthesising information about the probability density function and cumulative distribution function, based on samples from multiple studies. The meta-analysis of the smooth distribution function estimate is then applied to propose new methods for conducting meta-analyses of diagnostic test accuracy, which have a number of merits compared to the existing methodology.
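To make the idea of a nonparametric symmetry test concrete, the sketch below implements a simple sign-flip test based on the standardised mean-median gap. The statistic, the centring at the sample median and the function name are illustrative assumptions and are not the recently proposed asymmetry measure studied in the thesis; the sign-flip null is only approximate because the true centre of symmetry is estimated.

import numpy as np

def symmetry_test(x, n_flips=2000, seed=0):
    # Approximate sign-flip test of symmetry about the sample median.
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    centred = x - np.median(x)

    def stat(z):
        # Standardised mean-median gap: an illustrative measure of asymmetry.
        return (np.mean(z) - np.median(z)) / np.std(z, ddof=1)

    observed = stat(centred)
    # Under symmetry, randomly flipping signs of the centred data leaves the law unchanged.
    null = np.array([stat(centred * rng.choice([-1.0, 1.0], size=x.size))
                     for _ in range(n_flips)])
    p_value = np.mean(np.abs(null) >= np.abs(observed))
    return observed, p_value

# Example on a right-skewed sample: the test should tend to reject symmetry.
stat_obs, p = symmetry_test(np.random.default_rng(1).exponential(size=200))
print(stat_obs, p)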
113
The design of dynamic and nonlinear models in cash flow prediction / Pang, Yang. January 2015
This thesis is concerned with designing a novel model for cash flow prediction. Cash flow and earnings are both important measures of a firm's profit. The extant literature has discussed different models that have been applied to cash flow prediction. However, previous studies have not attempted to address the dynamics of the cash flow model parameters, which are potentially nonlinear processes. This thesis proposes a grey-box model to capture the nonlinearity and dynamics of the cash flow model parameters. The parameters are modelled as a black box that adopts a Padé approximant as its functional form, with two exogenous variables as inputs that are considered to have explanatory power for the parameter process. In addition, this thesis employs a Bayesian forecasting model in an attempt to capture the parameter dynamics of the cash flow modelling process. The Bayesian model has the advantage of being applicable when only a limited number of observations is available; compared with the grey-box model, however, it places a linear restriction on the parameter dynamics. A prior is required to implement the Bayesian model, and this thesis uses the results of a random parameter model as the prior. Panel data estimation methods are also applied to see whether they can outperform the pooled regression that is widely used in the extant literature. Four datasets, all in panel form, are employed to examine the performance of the various models in predicting cash flow, and the pattern of net operating cash flow (or the cash flow to asset ratio) over time is studied for each dataset. An out-of-sample comparison is conducted among the applied models, with two measures of performance selected to compare their practical predictive power. The designed grey-box model performs promisingly in all the datasets, especially for U.S. listed firms. However, the Bayesian model does not appear to be superior to the simple benchmark models in practical prediction, and the panel data models likewise cannot beat pooled regression. The traditional discounted cash flow model for equity valuation is then combined with the cash flow prediction models developed in this thesis to obtain theoretical equity values based on the predicted cash flows. The results show that simpler models, such as the random walk model, are closer to market expectations of future cash flows, because they give a better fit to market share prices under the new discounting model. These results suggest that the new valuation models could have investment value. This thesis makes contributions in both theoretical and practical respects. Through the derivation of the various models, it is found that cash flow prediction models exhibit potential nonlinearity and dynamic features, so it is crucial to capture this nonlinearity with appropriate tools. In addition, the thesis builds a framework that can be used to analyse problems of a similar kind, such as panel data prediction. The models are derived at a theoretical level and then applied to empirical data, and the promising results suggest that in practice they could provide useful guidance for decision makers.
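The grey-box idea of letting a model parameter follow a rational (Padé-type) function of exogenous inputs can be sketched as follows. The [1/1]-style functional form, the two inputs u and v, the simulated parameter path and the use of scipy's curve_fit are illustrative assumptions, not the exact specification estimated in the thesis.

import numpy as np
from scipy.optimize import curve_fit

def pade_parameter(X, a0, a1, a2, b1, b2):
    # Rational (Pade-type) map from two exogenous inputs (u, v) to a model parameter.
    u, v = X
    return (a0 + a1 * u + a2 * v) / (1.0 + b1 * u + b2 * v)

# Illustrative data: a time-varying coefficient driven by two covariates plus noise.
rng = np.random.default_rng(0)
u, v = rng.normal(size=200), rng.normal(size=200)
theta = (0.5 + 0.3 * u - 0.2 * v) / (1.0 + 0.1 * u + 0.05 * v) + rng.normal(scale=0.02, size=200)

coefs, _ = curve_fit(pade_parameter, (u, v), theta, p0=np.zeros(5))
theta_hat = pade_parameter((u, v), *coefs)   # fitted parameter path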
114
Model selection and model averaging in the presence of missing values / Gopal Pillay, Khuneswari. January 2015
Model averaging has been proposed as an alternative to model selection, intended to overcome the underestimation of standard errors that is a consequence of model selection. Both model selection and model averaging become more complicated in the presence of missing data. Three different model selection approaches (RR, STACK and M-STACK) and model averaging using three model-building strategies (non-overlapping variable sets, inclusive and restrictive strategies) were explored for combining results from multiply-imputed data sets, using a Monte Carlo simulation study on some simple linear and generalized linear models. Imputation was carried out using chained equations (via the "norm" method in the R package MICE). The simulation results showed that the STACK method performs better than RR and M-STACK in terms of model selection and prediction, whereas model averaging performs slightly better than STACK in terms of prediction. The inclusive and restrictive strategies perform better in terms of prediction, but the non-overlapping variable sets strategy performs better for model selection. STACK and model averaging using all three model-building strategies were then applied to combine the results from a multiply-imputed data set from the Gateshead Millennium Study (GMS). The performance of STACK and model averaging was compared using the mean square error of prediction (MSE(P)) in a 10% cross-validation test. The results showed that STACK with an inclusive strategy provided better prediction than model averaging, which coincides with the results obtained from a simulation study mimicking the GMS data. In addition, the inclusive strategy for building imputation and prediction models was better than the non-overlapping variable sets and restrictive strategies. The presence of covariates highly correlated with the response is believed to have led to the better prediction in this particular context. Model averaging using non-overlapping variable sets performs better only if an auxiliary variable is available, whereas STACK with an inclusive strategy performs well when no auxiliary variable is available. Therefore, it is advisable to use STACK with an inclusive model-building strategy and highly correlated covariates (where available) to make predictions in the presence of missing data. Alternatively, model averaging with non-overlapping variable sets can be used if an auxiliary variable is available.
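A minimal sketch of the stacking idea follows, under the simplifying assumptions of a plain linear model and equal weights of 1/M per completed copy: the M imputed data sets are concatenated and a single weighted regression is fitted so that no observation is over-counted. The function name and the simulated data are illustrative; unequal weights (for example, reflecting the fraction of missing information) would be fitted in exactly the same way.

import numpy as np

def stack_and_fit(imputed_Xs, imputed_ys):
    # Fit one weighted least-squares model to M stacked, multiply-imputed copies of the data.
    M = len(imputed_Xs)
    X = np.vstack([np.column_stack([np.ones(len(Xm)), Xm]) for Xm in imputed_Xs])
    y = np.concatenate(imputed_ys)
    w = np.full(len(y), 1.0 / M)          # each completed copy down-weighted by 1/M
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(X * sw, y * sw.ravel(), rcond=None)
    return beta

# Example with M = 3 completed copies (values purely illustrative, not GMS data).
rng = np.random.default_rng(2)
Xs = [rng.normal(size=(50, 2)) for _ in range(3)]
ys = [Xm @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=50) for Xm in Xs]
print(stack_and_fit(Xs, ys))              # intercept and two slope estimates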
115
Survival modelling in mathematical and medical statistics / Hua, Hairui. January 2015
An essential aspect of survival analysis is the estimation and prediction of survival probabilities for individuals, and for this purpose the mathematical modelling of the hazard rate function is a fundamental issue. This thesis focuses on the novel estimation and application of hazard rate functions in mathematical and medical research. In mathematical research we focus on the development of a semiparametric kernel-based estimate of the hazard rate function and an L1-error optimal kernel hazard rate estimate. In medical research we concentrate on the development and validation of survival models using individual participant data (IPD) from multiple studies. We also consider how to fit survival models that predict individual response to treatment, given IPD from multiple trials.
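A basic kernel hazard-rate estimate of the kind this work builds on can be sketched by smoothing the Nelson-Aalen increments with a Gaussian kernel. The kernel choice, the bandwidth, the no-ties assumption and the simulated censoring below are illustrative assumptions, not the semiparametric or L1-optimal estimators developed in the thesis.

import numpy as np

def kernel_hazard(times, events, grid, bandwidth):
    # Kernel-smoothed Nelson-Aalen hazard estimate with a Gaussian kernel.
    # times: observed times; events: 1 = event observed, 0 = censored (no ties assumed).
    order = np.argsort(times)
    t_sorted = np.asarray(times, dtype=float)[order]
    d_sorted = np.asarray(events, dtype=float)[order]
    n = len(t_sorted)
    at_risk = n - np.arange(n)                 # number still at risk at each ordered time
    increments = d_sorted / at_risk            # Nelson-Aalen jump sizes
    diffs = (grid[:, None] - t_sorted[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kernel @ increments

# Example: exponential event times with light censoring (illustrative data only).
rng = np.random.default_rng(3)
t, c = rng.exponential(size=300), rng.exponential(scale=3.0, size=300)
obs, delta = np.minimum(t, c), (t <= c).astype(float)
grid = np.linspace(0.05, 2.0, 50)
hazard = kernel_hazard(obs, delta, grid, bandwidth=0.3)   # should hover near 1 for Exp(1) data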
116
Thinking God : the mysticism of rabbi Zadok of Lublin / Brill, Alan. January 1900
Revised text of a dissertation. / Bibliography: p. 416-462. Index.
117
To p, or not to p? : quantifying inferential decision errors to assess whether significance truly is significant / Abdey, James Spencer. January 2009
Empirical testing is centred on p-values. These summary statistics are used to assess the plausibility of a null hypothesis, and therein lies a flaw in their interpretation. Central to this research is accounting for the behaviour of p-values, through density functions, under the alternative hypothesis, H1. These densities are determined by a combination of the sample size and the parametric specification of H1. Here, several new contributions are presented to reflect p-value behaviour. By considering the likelihood of both hypotheses in parallel, it is possible to optimise the decision-making process. A framework for simultaneously testing the null and alternative hypotheses is outlined for various testing scenarios. To facilitate efficient empirical conclusions, a new set of critical value tables is presented which requires only the conventional p-value, hence avoiding the need for additional computation when applying this joint testing in practice. Simple and composite forms of H1 are considered. Recognising the conflict between different schools of thought with respect to hypothesis testing, a unified approach that consolidates the advantages of each is offered. Again exploiting p-value distributions under various forms of H1, a revised conditioning statistic for conditional frequentist testing is developed, from which new p-value curves and surfaces are produced to further ease decision making. Finally, attention turns to multiple hypothesis testing. Estimation of multiple testing error rates is discussed, and a new estimator for the proportion of true null hypotheses, when simultaneously testing several independent hypotheses, is presented. Under certain conditions it is shown that this estimator is superior to an established estimator.
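As a concrete instance of a p-value density under H1, consider a one-sided z-test with standardised effect delta = sqrt(n) * theta / sigma: the p-value then has density g(p) = phi(Phi^{-1}(1 - p) - delta) / phi(Phi^{-1}(1 - p)), which reduces to the uniform density when delta = 0. The sketch below evaluates this density; the z-test setting is only one illustrative special case of the testing scenarios covered in the thesis.

import numpy as np
from scipy.stats import norm

def p_value_density(p, delta):
    # Density of the one-sided z-test p-value under H1 with standardised effect delta.
    # delta = sqrt(n) * theta / sigma; delta = 0 recovers the uniform density under H0.
    z = norm.ppf(1.0 - p)
    return norm.pdf(z - delta) / norm.pdf(z)

# Example: with delta = 2 most of the p-value mass lies near zero.
p_grid = np.linspace(0.001, 0.999, 200)
density = p_value_density(p_grid, delta=2.0)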
118
Robust asset allocation under model ambiguity / Tobelem-Foldvari, Sandrine. January 2010
A decision maker, when facing a decision problem, often considers several models to represent the outcomes of the decision variable considered. More often than not, the decision maker does not fully trust any of those models and hence displays ambiguity, or model-uncertainty, aversion. In this PhD thesis, focus is given to the specific case of the asset allocation problem under ambiguity faced by financial investors. The aim is not to find an optimal solution for the investor, but rather to develop a general methodology that can be applied in particular to the asset allocation problem and allows the investor to find a tractable, easy-to-compute solution to this problem, taking ambiguity into account. This PhD thesis is structured as follows. First, some classical and widely used models to represent asset returns are presented. It is shown that the performance of asset portfolios built using those single models is very volatile, and that no model performs consistently better than the others over the period considered. This provides empirical evidence that no model can be fully trusted over the long run and that several models are needed to achieve the best possible asset allocation; classical portfolio theory must therefore be adapted to take ambiguity, or model uncertainty, into account. Many authors have attempted to include ambiguity aversion in the asset allocation problem, and a review of the literature outlines the main models proposed. However, those models often lack flexibility and tractability: the search for an optimal solution to the asset allocation problem under ambiguity aversion is often difficult to apply in practice to problems of the large dimensions faced by modern financial investors. This motivates a novel methodology that is easily applicable, robust, flexible and tractable. The Ambiguity Robust Adjustment (ARA) methodology is presented theoretically and then tested on a large empirical data set, with several forms of the ARA considered and tested. Empirical evidence demonstrates that the ARA methodology greatly improves portfolio performance. Through the specific illustration of the asset allocation problem in finance, this PhD thesis proposes a new general methodology that will hopefully help decision makers to solve numerous different problems under ambiguity.
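The general principle of blending several candidate models' portfolios rather than fully trusting any single one can be sketched as follows. The softmax confidence weights, the shrinkage towards the equally weighted portfolio and the function name are generic illustrative choices, not the Ambiguity Robust Adjustment formula developed in the thesis.

import numpy as np

def blend_portfolios(model_weights, scores, ambiguity=0.5):
    # Combine K candidate portfolios (rows of model_weights, each row summing to 1).
    # scores: relative confidence in each model; ambiguity in [0, 1]:
    # 0 trusts the confidence-weighted blend fully, 1 falls back to the 1/N portfolio.
    model_weights = np.asarray(model_weights, dtype=float)
    scores = np.asarray(scores, dtype=float)
    pi = np.exp(scores - np.max(scores))
    pi /= pi.sum()                                   # softmax over model confidence
    blended = pi @ model_weights                     # confidence-weighted portfolio
    equal = np.full(model_weights.shape[1], 1.0 / model_weights.shape[1])
    return (1.0 - ambiguity) * blended + ambiguity * equal

# Example: three candidate models proposing weights over four assets (numbers illustrative).
W = [[0.40, 0.30, 0.20, 0.10],
     [0.10, 0.40, 0.40, 0.10],
     [0.25, 0.25, 0.25, 0.25]]
portfolio = blend_portfolios(W, scores=[0.2, 0.5, 0.1], ambiguity=0.3)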
119
Sparse modelling and estimation for nonstationary time series and high-dimensional data / Cho, Haeran. January 2010
Sparse modelling has attracted great attention as an efficient way of handling statistical problems in high dimensions. This thesis considers sparse modelling and estimation in a selection of problems, including breakpoint detection in nonstationary time series, nonparametric regression using piecewise constant functions and variable selection in high-dimensional linear regression. We first propose a method for detecting breakpoints in the second-order structure of piecewise stationary time series, assuming that those structural breakpoints are sufficiently scattered over time. Our choice of time series model is the locally stationary wavelet process (Nason et al., 2000), under which the entire second-order structure of a time series is described by wavelet-based local periodogram sequences. As the initial stage of breakpoint detection, we apply a binary segmentation procedure to the wavelet periodogram sequences at each scale separately, followed by within-scale and across-scales post-processing steps. We show that the combined methodology achieves consistent estimation of the breakpoints in terms of their total number and locations, and investigate its practical performance using both simulated and real data. Next, we study the problem of nonparametric regression by means of piecewise constant functions, which are known to be flexible in approximating a wide range of function spaces. Among the many approaches developed for this purpose, we focus on comparing two well-performing techniques, the taut string (Davies & Kovac, 2001) and the Unbalanced Haar (Fryzlewicz, 2007) methods. While the multiscale nature of the latter is easily observed, it is not so obvious that the former can also be interpreted as multiscale. We provide a unified multiscale representation for both methods, which offers an insight into the relationship between them as well as suggesting some lessons that each method can learn from the other. Lastly, we consider one of the most widely studied applications of sparse modelling and estimation: variable selection in high-dimensional linear regression. The high dimensionality of the data brings in many complications, including (possibly spurious) non-negligible correlations among the variables, which may result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response which adaptively takes into account high correlations among the variables. A key ingredient of the proposed tilting procedure is the hard-thresholding of the sample correlations of the design matrix, which enables a data-driven switch between the use of marginal correlation and tilted correlation for each variable. We study the conditions under which this measure can discriminate between relevant and irrelevant variables, and thus be used as a tool for variable selection. In order to exploit these theoretical properties of tilted correlation, we construct an iterative variable screening algorithm and examine its practical performance in a comparative simulation study.
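The binary segmentation step can be illustrated on a generic univariate sequence with changes in mean (rather than the wavelet periodogram sequences used in the thesis): the CUSUM statistic locates the strongest single split, and the procedure recurses on each side while the statistic exceeds a threshold. The threshold and simulated data below are hand-picked, illustrative choices.

import numpy as np

def cusum(x):
    # Absolute CUSUM statistic at every interior split point of x.
    n = len(x)
    k = np.arange(1, n)
    left = np.cumsum(x)[:-1]
    total = np.sum(x)
    return np.abs(np.sqrt((n - k) / (n * k)) * left
                  - np.sqrt(k / (n * (n - k))) * (total - left))

def binary_segmentation(x, threshold, start=0):
    # Recursively locate change points in the mean of x where the CUSUM exceeds threshold.
    if len(x) < 2:
        return []
    stat = cusum(x)
    b = int(np.argmax(stat))
    if stat[b] < threshold:
        return []
    split = b + 1
    return (binary_segmentation(x[:split], threshold, start)
            + [start + split]
            + binary_segmentation(x[split:], threshold, start + split))

# Example: two shifts in mean; expect estimated breakpoints near 100 and 200.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100), rng.normal(0.5, 1.0, 100)])
print(binary_segmentation(x, threshold=4.0))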
120
A Bayesian approach to modelling mortality, with applications to insurance / Cairns, George Lindsay. January 2013
The purpose of this research was to use Bayesian statistics to develop flexible mortality models that could be used to forecast human mortality rates. Several models were developed as extensions of existing mortality models, in particular the Lee-Carter mortality model and the age-period-cohort model, by including some of the following features: age-period and age-cohort interactions, random effects on mortality, measurement errors in population counts and smoothing of the mortality rate surface. One expects mortality rates to change in a relatively smooth manner between neighbouring ages, neighbouring years or neighbouring cohorts, while the inclusion of random effects in some of the models captures additional fluctuations in these effects. This smoothing is incorporated in the models by ensuring that the age, period and cohort parameters form relatively smooth sequences, which is achieved through the choice of the prior distribution of the parameters. Three different smoothing priors were employed: a random walk, a random walk on the first differences of the parameters, and an autoregressive model of order one on the first differences of the parameters; in any one model only one form of smoothing was used. The choice of smoothing prior not only imposes different patterns of smoothing on the parameters but is also seen to be very influential when making mortality forecasts. The mortality models were fitted, using Bayesian methods, to population data for males and females from England and Wales. The fits of the models were analysed and compared using analysis of residuals, posterior predictive intervals for both in-sample and out-of-sample data, and the Deviance Information Criterion. The models fitted the data better than both the Lee-Carter model and the age-period-cohort model. From the analysis undertaken, the preferred model for male and female death counts, based on the Deviance Information Criterion score, was a Poisson model whose mean, for any given age and calendar year, equals the number of lives exposed to the risk of dying at that age in that calendar year multiplied by a mortality parameter. The logit of this mortality parameter was a function of age, year (period) and cohort, with additional interactions between the age and period parameters and between the age and cohort parameters. The form of parameter smoothing that suited the males was an autoregressive model of order one on the first differences of the parameters, and that for the females was a random walk. Moreover, it was found useful to add Gaussian random effects to account for overdispersion caused by unobserved heterogeneity in the population mortality. The research concluded with the application of a selection of these models to forecasting period and cohort life expectancies, as well as the numbers of centenarians, for males and females in England and Wales. In addition, the thesis illustrated how Bayesian mortality models can be used to assess the impact on longevity risk of the new European Union solvency regulations for insurers (Solvency II). This research underlines the important role that Bayesian stochastic mortality models can play in assessing longevity risk.
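In standard (assumed) actuarial notation, with D_{x,t} the death count and E_{x,t} the exposure at age x in calendar year t, the preferred model described above can be written as

\[
D_{x,t} \sim \mathrm{Poisson}\bigl(E_{x,t}\, m_{x,t}\bigr), \qquad
\operatorname{logit}(m_{x,t}) = \alpha_x + \kappa_t + \gamma_{t-x}
  + \beta^{(1)}_x \kappa_t + \beta^{(2)}_x \gamma_{t-x} + \varepsilon_{x,t}, \qquad
\varepsilon_{x,t} \sim N(0, \sigma^2),
\]

where the multiplicative (Lee-Carter-style) form of the age-period and age-cohort interactions is one plausible parametrisation rather than the thesis's exact specification, \(\varepsilon_{x,t}\) is the Gaussian random effect capturing overdispersion, and each of the sequences \(\alpha_x\), \(\kappa_t\), \(\gamma_{t-x}\) receives one of the three smoothing priors listed above (a random walk, a random walk on first differences, or an AR(1) on first differences).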