691 |
Statistical analysis of grouped data / Crafford, Gretel (01 July 2008)
The maximum likelihood (ML) estimation procedure of Matthews and Crowther (1995: A maximum likelihood estimation procedure when modelling in terms of constraints. South African Statistical Journal, 29, 29-51) is utilized to fit a continuous distribution to a grouped data set. This grouped data set may be a single frequency distribution or various frequency distributions that arise from a cross classification of several factors in a multifactor design. It will also be shown how to fit a bivariate normal distribution to a two-way contingency table where the two underlying continuous variables are jointly normally distributed. This thesis is organized in three different parts, each playing a vital role in explaining the analysis of grouped data with the ML estimation procedure of Matthews and Crowther. In Part I the ML estimation procedure of Matthews and Crowther is formulated. This procedure plays an integral role and is implemented in all three parts of the thesis. In Part I the exponential distribution is fitted to a grouped data set to explain the technique. Two different formulations of the constraints are employed in the ML estimation procedure and provide identical results. The justification of the method is further motivated by a simulation study. Similar to the exponential distribution, the estimation of the normal distribution is also explained in detail. Part I is summarized in Chapter 5, where a general method is outlined to fit continuous distributions to a grouped data set. Distributions such as the Weibull, the log-logistic and the Pareto distributions can be fitted very effectively by formulating the vector of constraints in terms of a linear model. In Part II it is explained how to model a grouped response variable in a multifactor design. This multifactor design arises from a cross classification of the various factors or independent variables to be analysed. The cross classification of the factors results in a total of T cells, each containing a frequency distribution. Distribution fitting is done simultaneously to each of the T cells of the multifactor design. Distribution fitting is also done under the additional constraints that the parameters of the underlying continuous distributions satisfy a certain structure or design. The effect of the factors on the grouped response variable may be evaluated from this fitted design. Applications of a single-factor and a two-factor model are considered to demonstrate the versatility of the technique. A two-way contingency table where the two variables have an underlying bivariate normal distribution is considered in Part III. The estimation of the bivariate normal distribution reveals the complete underlying continuous structure between the two variables. The ML estimate of the correlation coefficient ρ is used to great effect to describe the relationship between the two variables. Apart from an application, a simulation study is also provided to support the proposed method. / Thesis (PhD (Mathematical Statistics))--University of Pretoria, 2007. / Statistics / unrestricted
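As a minimal sketch of the grouped-data likelihood underlying Part I, the snippet below fits an exponential distribution to a hypothetical grouped frequency table by maximizing the multinomial log-likelihood of the cell counts directly; it is not the constrained ML procedure of Matthews and Crowther, and the class boundaries and counts are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical grouped data: class boundaries (last class open-ended) and counts
edges = np.array([0.0, 1.0, 2.0, 3.0, 5.0, np.inf])
counts = np.array([120, 70, 40, 35, 15])

def neg_log_likelihood(lam):
    # cell probabilities of an exponential(rate=lam) distribution over the classes
    cdf = 1.0 - np.exp(-lam * edges)
    cell_probs = np.diff(cdf)
    return -np.sum(counts * np.log(cell_probs))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print("ML estimate of the exponential rate:", result.x)
```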
|
692 |
Stochastic Modeling and Statistical Analysis / Wu, Ling (01 April 2010)
The objective of the present study is to investigate option pricing and forecasting problems in finance. This is achieved by developing stochastic models in the framework of the classical modeling approach.
In this study, by utilizing stock price data, we examine the correctness of the existing Geometric Brownian Motion (GBM) model under standard statistical tests. Having recognized these problems, we attempted to demonstrate the development of modified linear models under different data partitioning processes, with or without jumps. Empirical comparisons between the constructed and GBM models are outlined.
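A hedged sketch of one such check: under GBM, log-returns over equal time intervals are i.i.d. normal, so standard normality tests can be applied to them. The price series below is invented, and the specific tests used in the thesis may differ.

```python
import numpy as np
from scipy import stats

# Hypothetical daily closing prices; in practice a much longer real series is used
prices = np.array([100.0, 101.2, 100.7, 102.3, 103.1, 102.0, 104.5, 105.2, 104.8, 106.0])
log_returns = np.diff(np.log(prices))

# Under GBM, log-returns over equal intervals are i.i.d. normal
print(stats.shapiro(log_returns))       # Shapiro-Wilk test of normality
print(stats.jarque_bera(log_returns))   # skewness/kurtosis based test
```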
By analyzing the residual errors, we observed the nonlinearity in the data set. In order to incorporate this nonlinearity, we further employed the classical model building approach to develop nonlinear stochastic models. Based on the nature of the problems and the knowledge of existing nonlinear models, three different nonlinear stochastic models are proposed. Furthermore, under different data partitioning processes with equal and unequal intervals, a few modified nonlinear models are developed. Again, empirical comparisons between the constructed nonlinear stochastic and GBM models in the context of three data sets are outlined.
Stochastic dynamic models are also used to predict the future dynamic state of processes. This is achieved by modifying the nonlinear stochastic models from constant to time-varying coefficients, and then time series models are constructed. Using these constructed time series models, the prediction and comparison problems with the existing time series models are analyzed in the context of three data sets. The study shows that the nonlinear stochastic model 2 with time-varying coefficients is robust with respect to different data sets.
We derive the option pricing formula in the context of the three nonlinear stochastic models with time-varying coefficients. Option pricing formulas in the framework of hybrid systems, namely the hybrid GBM (HGBM) and hybrid nonlinear stochastic models, are also initiated.
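For reference, the constant-coefficient GBM baseline against which such formulas are compared is the classical Black-Scholes price; the sketch below computes a European call under that baseline with purely illustrative parameters (the thesis's nonlinear and time-varying formulas are not reproduced here).

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """European call price under constant-coefficient GBM (Black-Scholes)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Illustrative parameters only
print(bs_call(S=100.0, K=105.0, T=0.5, r=0.02, sigma=0.25))
```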
Finally, based on our initial investigation of the significance of the presented nonlinear stochastic models for forecasting and option pricing problems, we propose to continue and further explore this work in the context of the nonlinear stochastic hybrid modeling approach.
|
693 |
Evaluation of statistical cloud parameterizations / Brück, Heiner Matthias (06 October 2016)
This work is motivated by the question of how much complexity is appropriate for a cloud parameterization used in general circulation models (GCMs).
To approach this question, cloud parameterizations across the complexity range are explored using general circulation models and theoretical Monte-Carlo simulations. Their results are compared with high-resolution satellite observations and simulations that resolve the GCM subgrid-scale variability explicitly.
A process-orientated evaluation is facilitated by GCM forecast simulations which reproduce the synoptic state.
For this purpose, novel methods were developed to
a) conceptually relate the underlying saturation-deficit probability density function (PDF) to its saturated, cloudy part (a minimal sketch of this relation follows the list),
b) analytically compute the vertically integrated liquid water path (LWP) variability,
c) diagnose the relevant PDF moments from cloud parameterizations,
d) derive high-resolution LWP from satellite observations,
and e) deduce the LWP statistics by aggregating the LWP onto boxes equivalent to the GCM grid size. On this basis, this work shows that it is possible to evaluate the subgrid-scale variability of cloud parameterizations in terms of cloud variables.
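As a minimal sketch of item a), the snippet below diagnoses cloud fraction and mean condensate from a single-Gaussian saturation-deficit PDF, i.e. the classical unimodal assumption; the double-Gaussian schemes evaluated in this work use more elaborate closures, and the input values are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def gaussian_cloud_diagnostics(s_mean, s_sigma):
    """Cloud fraction and mean condensate for a Gaussian saturation-deficit PDF.

    s_mean  : grid-box mean saturation deficit (positive = supersaturated), kg/kg
    s_sigma : subgrid standard deviation of the saturation deficit, kg/kg
    """
    q1 = s_mean / s_sigma                  # normalized saturation deficit
    cloud_fraction = norm.cdf(q1)          # saturated (cloudy) part of the PDF
    mean_condensate = s_mean * cloud_fraction + s_sigma * norm.pdf(q1)
    return cloud_fraction, mean_condensate

print(gaussian_cloud_diagnostics(s_mean=-2e-4, s_sigma=5e-4))
```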
Differences among the PDF types increase with complexity; in particular, the more advanced cloud parameterizations can make use of their double-Gaussian PDF in conditions where cumulus convection forms a separate mode with respect to the remainder of the grid box. Therefore, it is concluded that the difference between unimodal and bimodal PDFs is more important than the shape within each mode.
However, the simulations and their evaluation reveal that the advanced parameterizations do not take full advantage of their abilities: their statistical relationships are broadly similar to those of less complex PDF shapes, while the results from observations and cloud-resolving simulations indicate even more complex distributions.
Therefore, this work suggests that the use of less complex PDF shapes might yield a better trade-off.
With increasing model resolution, the initial weaknesses of simpler, e.g. unimodal, PDFs will diminish. While cloud schemes for coarsely resolved models need to parameterize multiple cloud regimes per grid box, the higher spatial resolution of future GCMs will separate these regimes better, so that the unimodal approximation improves.
|
694 |
Test Validity and Statistical Analysis / Sargsyan, Alex (17 September 2018)
No description available.
|
695 |
Statistics of cycles in large networks / Klemm, Konstantin; Stadler, Peter F. (06 February 2019)
The occurrence of self-avoiding closed paths (cycles) in networks is studied under varying rules of wiring. As a main result, we find that the dependence between network size and typical cycle length is algebraic, ⟨h⟩ ∝ N^α, with distinct values of α for different wiring rules. The Barabási-Albert model has α = 1. Different preferential and nonpreferential attachment rules and the growing Internet graph yield α < 1. Computation of the statistics of cycles at arbitrary length is made possible by the introduction of an efficient sampling algorithm.
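As a rough, hedged illustration of the scaling question (not the efficient sampling algorithm introduced in this work), the sketch below grows Barabási-Albert graphs of increasing size with networkx and uses the mean length of a cycle basis as a crude proxy for typical cycle length.

```python
import numpy as np
import networkx as nx

# Crude proxy only: cycle-basis lengths, not a uniform sample over all cycles
for n in [200, 400, 800, 1600]:
    G = nx.barabasi_albert_graph(n, m=2, seed=42)   # Barabasi-Albert wiring rule
    lengths = [len(cycle) for cycle in nx.cycle_basis(G)]
    print(n, np.mean(lengths))
```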
|
696 |
Rank statistics of forecast ensembles / Siegert, Stefan (21 December 2012)
Ensembles are today routinely applied to estimate uncertainty in numerical predictions of complex systems such as the weather. Instead of initializing a single numerical forecast, using only the best guess of the present state as initial conditions, a collection (an ensemble) of forecasts whose members start from slightly different initial conditions is calculated. By varying the initial conditions within their error bars, the sensitivity of the resulting forecasts to these measurement errors can be accounted for. The ensemble approach can also be applied to estimate forecast errors that are due to insufficiently known model parameters by varying these parameters between ensemble members.
An important (and difficult) question in ensemble weather forecasting is how well an ensemble of forecasts reproduces the actual forecast uncertainty. A widely used criterion to assess the quality of forecast ensembles is statistical consistency, which demands that the ensemble members and the corresponding measurement (the "verification") behave like independent random draws from the same underlying probability distribution. Since this forecast distribution is generally unknown, such an analysis is nontrivial. An established criterion to assess statistical consistency of a historical archive of scalar ensembles and verifications is uniformity of the verification rank: if the verification falls between the (k-1)-st and k-th largest ensemble member, it is said to have rank k. Statistical consistency implies that the average frequency of occurrence should be the same for each rank.
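The rank computation and the uniformity check can be conveyed with a short, hedged sketch on synthetic data that is statistically consistent by construction; the thesis's analyses of operational forecast archives are of course more involved.

```python
import numpy as np
from scipy.stats import chisquare

def verification_ranks(ensembles, verifications):
    """Rank of each verification among its K ensemble members (values 1 .. K+1)."""
    return 1 + np.sum(ensembles < verifications[:, None], axis=1)

rng = np.random.default_rng(0)
K, n = 10, 5000
ens = rng.normal(size=(n, K))        # ensemble members ...
obs = rng.normal(size=n)             # ... and verifications from the same distribution
ranks = verification_ranks(ens, obs)

counts = np.bincount(ranks, minlength=K + 2)[1:]   # occupation of ranks 1 .. K+1
print(chisquare(counts))                           # flat rank histogram expected here
print(counts[0] + counts[-1], 2 * n / (K + 1))     # outlier count vs. 2/(K+1) base rate
```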
A central result of the present thesis is that, in a statistically consistent K-member ensemble, the (K+1)-dimensional vector of rank probabilities is a random vector that is uniformly distributed on the K-dimensional probability simplex. This behavior is universal for all possible forecast distributions. It thus provides a way to describe forecast ensembles in a nonparametric way, without making any assumptions about the statistical behavior of the ensemble data. The physical details of the forecast model are eliminated, and the notion of statistical consistency is captured in an elementary way. Two applications of this result to ensemble analysis are presented.
Ensemble stratification, the partitioning of an archive of ensemble forecasts into subsets using a discriminating criterion, is considered in the light of the above result. It is shown that certain stratification criteria can make the individual subsets of ensembles appear statistically inconsistent, even though the unstratified ensemble is statistically consistent. This effect is explained by considering statistical fluctuations of rank probabilities. A new hypothesis test is developed to assess statistical consistency of stratified ensembles while taking these potentially misleading stratification effects into account.
The distribution of rank probabilities is further used to study the predictability of outliers, which are defined as events where the verification falls outside the range of the ensemble, being either smaller than the smallest, or larger than the largest ensemble member. It is shown that these events are better predictable than by a naive benchmark prediction, which unconditionally issues the average outlier frequency of 2/(K+1) as a forecast. Predictability of outlier events, quantified in terms of probabilistic skill scores and receiver operating characteristics (ROC), is shown to be universal in a hypothetical forecast ensemble. An empirical study shows that in an operational temperature forecast ensemble, outliers are likewise predictable, and that the corresponding predictability measures agree with the analytically calculated ones.
|
697 |
Tests of statistical normality / Freeman, Daniel H. (January 1970)
Thesis (M.A.)--Boston University / The purpose of this paper is to discuss several tests of statistical normality. By normality it is meant that a simple random sample is drawn from a population with a normal or Gaussian distribution. There are a number of such tests in existence. For example, the chi-squared (CS), √b1, and b2 tests are reasonably well known. Others, such as Geary's a, are not as popular. The discussion of tests of normality has been quite thorough in the various journals. However, these tests have not been brought together with a complete discussion, including examples and comparisons.
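A hedged sketch of the moment-based statistics named above, using scipy on a synthetic sample (Geary's a and the grouped chi-squared test are not shown here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)                 # synthetic sample under test

sqrt_b1 = stats.skew(x)                  # sample skewness, sqrt(b1)
b2 = stats.kurtosis(x, fisher=False)     # sample kurtosis, b2 (equals 3 under normality)
print(sqrt_b1, b2)

print(stats.skewtest(x))                 # test based on sqrt(b1)
print(stats.kurtosistest(x))             # test based on b2
print(stats.normaltest(x))               # omnibus test combining both
```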
The discussion will proceed in four parts. The effect of non-normality will be briefly reviewed. Several examples are indicated here, such as the t-statistic and prediction intervals where non-normality alters the significance level. Some of the tests where non-normality is not too harmful will also be indicated. [TRUNCATED] / 2031-01-01
|
698 |
On the Statistics of Trustworthiness Prediction / Hauke, Sascha (14 January 2015)
Trust and trustworthiness facilitate interactions between human beings worldwide, every day. They enable the formation of friendships, the making of profits and the adoption of new technologies, making life not only more pleasant but also furthering societal development. Trust, for lack of a better word, is good. When human beings trust, they rely on the trusted party to be trustworthy, that is, literally worthy of the trust that is being placed in it. If it turns out that the trusted party is unworthy of the trust placed in it, the truster has misplaced its trust, has relied unwarrantedly, and is liable to experience possibly unpleasant consequences. Human social evolution has equipped us with tools for determining another's trustworthiness through experience, cues and observations, with which we aim to minimise the risk of misplacing our trust.
Social adaptation, however, is a slow process, and the cues that are helpful in real, physical environments, where we can observe and hear our interlocutors, are less helpful in interactions conducted over data networks with other humans or computers, or even between two computers. This presents a challenge in a world where the virtual and the physical intermesh increasingly, a challenge that computational trust models seek to address by applying computational, evidence-based methods to estimate trustworthiness.
In this thesis, the state-of-the-art in evidence-based trust models is extended and improved upon, in particular with regard to their statistical modelling. The statistics behind (Bayesian) trustworthiness estimation will receive special attention, their extension bringing about improvements in trustworthiness estimation that encompass the following aspects: (i.) statistically well-founded estimators for binomial and multinomial models of trust that can accurately estimate the trustworthiness of another party and those that can express the inherent uncertainty of the trustworthiness estimate in a statistically meaningful way, (ii.) better integration of recommendations by third parties using advanced methods for determining the reliability of the received recommendations, (iii.) improved responsiveness to changes in the behaviour of trusted parties, and (iv.) increasing the generalisability of trust-relevant information over a set of trusted parties. Novel estimators, methods for combining recommendations and other trust-relevant information, change detectors, as well as a mapping for integrating stereotype-based trustworthiness estimates, are bundled in an improved Bayesian trust model, Multinomial CertainTrust.
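The flavour of aspect (i.) can be conveyed with a minimal, hedged Dirichlet-multinomial sketch; the actual Multinomial CertainTrust estimators, priors and certainty measure differ from this generic formulation, and the outcome counts below are invented.

```python
import numpy as np

def dirichlet_trust_estimate(outcome_counts, prior=None):
    """Posterior mean over outcome categories for a multinomial trust model.

    outcome_counts : observed evidence per outcome (e.g. [good, neutral, bad])
    prior          : Dirichlet prior pseudo-counts (non-informative by default)
    """
    counts = np.asarray(outcome_counts, dtype=float)
    alpha0 = np.ones_like(counts) if prior is None else np.asarray(prior, dtype=float)
    alpha = alpha0 + counts
    expectation = alpha / alpha.sum()            # expected outcome probabilities
    # One simple certainty notion: evidence mass relative to prior mass (saturates at 1)
    certainty = counts.sum() / (counts.sum() + alpha0.sum())
    return expectation, certainty

print(dirichlet_trust_estimate([8, 1, 1]))       # mostly positive evidence
print(dirichlet_trust_estimate([0, 0, 0]))       # no evidence: prior mean, certainty 0
```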
Specific scientific contributions are structured into three distinct categories:
1. A Model for Trustworthiness Estimation: The statistics of trustworthiness estimation are investigated in order to design a fully multinomial trustworthiness estimation model. Leveraging the assumptions behind the Bayesian estimation of binomial and multinomial proportions, accurate trustworthiness and certainty estimators are presented, and the integration of subjectivity via informative and non-informative Bayesian priors is discussed.
2. Methods for Trustworthiness Information Processing: Methods for facilitating trust propagation and accounting for concept drift in the behaviour of the trusted parties are introduced. All methods are applicable, by design, to both the binomial case and the multinomial case of trustworthiness estimation.
3. Further extensions for trustworthiness estimation: Two methods for addressing the potential lack of direct experiences with a new trustee in feedback-based trust models are presented. For one, the dedicated modelling of particular roles and the trust delegation between them is shown to be principally possible as an extension to existing feedback-based trust models. For another, a more general approach to feature-based generalisation using model-free, supervised machine learners is introduced.
The general properties of the trustworthiness and certainty estimators are derived formally from the basic assumptions underlying binomial and multinomial estimation problems, harnessing fundamentals of Bayesian statistics. Desired properties of the introduced certainty estimators, first postulated by Wang & Singh, are shown to hold through formal argument. The general soundness and applicability of the proposed certainty estimators are founded on the statistical properties of interval estimation techniques discussed in the related statistics work and formally and rigorously shown there.
The core estimation system and additional methods, in their entirety constituting the Multinomial CertainTrust model, are implemented in R, along with competing methods from the related work, specifically for determining recommender trustworthiness and coping with changing behaviour through ageing. The performance of the novel methods introduced in this thesis was tested against established methods from the related work in simulations.
Methods for hardcoding indicators of trustworthiness were implemented within a multi-agent framework and shown to be functional in an agent-based simulation. Furthermore, supervised machine learners were tested for their applicability by collecting a real-world data set of reputation data from a hotel booking site and evaluating their capabilities against this data set. The hotel data set exhibits properties, such as a high imbalance in the ratings, that appear typical of data generated by reputation systems and are also present in other data sets.
|
699 |
Statistical problems in pasture research / Robinson, P (22 November 2016)
No description available.
|
700 |
INTROSTAT (Statistics textbook) / Underhill, Les; Bradfield, Dave (January 2013)
IntroSTAT was designed to meet the needs of students, primarily those in business, commerce and management, for a course in applied statistics. IntroSTAT is designed as a lecture-book; one of its aims is to maximize the time spent explaining concepts and working through examples. The book is commonly used as part of first-year courses in statistics.
|