691

Statistical analysis of grouped data

Crafford, Gretel 01 July 2008 (has links)
The maximum likelihood (ML) estimation procedure of Matthews and Crowther (1995: A maximum likelihood estimation procedure when modelling in terms of constraints. South African Statistical Journal, 29, 29-51) is utilized to fit a continuous distribution to a grouped data set. This grouped data set may be a single frequency distribution or various frequency distributions that arise from a cross classification of several factors in a multifactor design. It will also be shown how to fit a bivariate normal distribution to a two-way contingency table where the two underlying continuous variables are jointly normally distributed. This thesis is organized in three parts, each playing a vital role in explaining the analysis of grouped data with the ML estimation procedure of Matthews and Crowther. In Part I the ML estimation procedure of Matthews and Crowther is formulated. This procedure plays an integral role and is implemented in all three parts of the thesis. In Part I the exponential distribution is fitted to a grouped data set to explain the technique. Two different formulations of the constraints are employed in the ML estimation procedure and provide identical results. The justification of the method is further motivated by a simulation study. Similar to the exponential distribution, the estimation of the normal distribution is also explained in detail. Part I is summarized in Chapter 5, where a general method is outlined to fit continuous distributions to a grouped data set. Distributions such as the Weibull, the log-logistic and the Pareto distributions can be fitted very effectively by formulating the vector of constraints in terms of a linear model. In Part II it is explained how to model a grouped response variable in a multifactor design. This multifactor design arises from a cross classification of the various factors or independent variables to be analysed. The cross classification of the factors results in a total of T cells, each containing a frequency distribution. Distribution fitting is done simultaneously in each of the T cells of the multifactor design. Distribution fitting is also done under the additional constraints that the parameters of the underlying continuous distributions satisfy a certain structure or design. The effect of the factors on the grouped response variable may be evaluated from this fitted design. Applications of a single-factor and a two-factor model are considered to demonstrate the versatility of the technique. A two-way contingency table where the two variables have an underlying bivariate normal distribution is considered in Part III. The estimation of the bivariate normal distribution reveals the complete underlying continuous structure between the two variables. The ML estimate of the correlation coefficient ρ is used to great effect to describe the relationship between the two variables. Apart from an application, a simulation study is also provided to support the proposed method. / Thesis (PhD (Mathematical Statistics))--University of Pretoria, 2007. / Statistics / unrestricted
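The core idea the abstract describes, fitting a continuous distribution to grouped (binned) frequencies by maximum likelihood, can be illustrated with a generic multinomial-likelihood sketch. The bin edges, counts, and use of scipy below are illustrative assumptions; the sketch does not reproduce the constraint-based formulation of Matthews and Crowther, only the underlying grouped-data fit.

```python
# A minimal sketch of grouped-data maximum likelihood: fit an exponential
# distribution to a binned frequency table by maximising the multinomial
# log-likelihood of the bin probabilities. Bin edges and counts are made up.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

edges = np.array([0.0, 1.0, 2.0, 4.0, 8.0, np.inf])   # hypothetical class limits
counts = np.array([120, 75, 60, 30, 15])               # hypothetical frequencies

def neg_log_lik(rate):
    # Probability mass of each interval under an exponential with this rate.
    cdf = expon.cdf(edges, scale=1.0 / rate)
    p = np.diff(cdf)
    return -np.sum(counts * np.log(p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print("ML estimate of the exponential rate:", res.x)
```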
692

Stochastic Modeling and Statistical Analysis

Wu, Ling 01 April 2010 (has links)
The objective of the present study is to investigate option pricing and forecasting problems in finance. This is achieved by developing stochastic models in the framework of the classical modeling approach. In this study, by utilizing stock price data, we examine the correctness of the existing Geometric Brownian Motion (GBM) model under standard statistical tests. Recognizing these problems, we demonstrate the development of modified linear models under different data partitioning processes, with or without jumps. Empirical comparisons between the constructed and GBM models are outlined. By analyzing the residual errors, we observed nonlinearity in the data set. In order to incorporate this nonlinearity, we further employed the classical model building approach to develop nonlinear stochastic models. Based on the nature of the problems and the knowledge of existing nonlinear models, three different nonlinear stochastic models are proposed. Furthermore, under different data partitioning processes with equal and unequal intervals, a few modified nonlinear models are developed. Again, empirical comparisons between the constructed nonlinear stochastic and GBM models in the context of three data sets are outlined. Stochastic dynamic models are also used to predict the future dynamic state of processes. This is achieved by modifying the nonlinear stochastic models from constant to time-varying coefficients, and then time series models are constructed. Using these constructed time series models, the prediction and comparison problems with the existing time series models are analyzed in the context of three data sets. The study shows that nonlinear stochastic model 2 with time-varying coefficients is robust with respect to different data sets. We derive the option pricing formula in the context of three nonlinear stochastic models with time-varying coefficients. Option pricing formulas in the framework of hybrid systems, namely the Hybrid GBM (HGBM) and hybrid nonlinear stochastic models, are also initiated. Finally, based on our initial investigation of the significance of the presented nonlinear stochastic models in forecasting and option pricing problems, we propose to continue and further explore our study in the context of the nonlinear stochastic hybrid modeling approach.
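One standard check behind "examining the correctness of the GBM model under statistical tests" is that, under GBM, log-returns are i.i.d. normal. The sketch below is a minimal illustration of that idea on a simulated price path; the drift, volatility, and the Shapiro-Wilk test are assumptions, not the thesis's specific test battery or data.

```python
# A minimal sketch: under geometric Brownian motion, log-returns are i.i.d.
# normal, so a normality test on observed log-returns probes the GBM assumption.
# The price path is simulated purely for illustration; the thesis uses real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, dt, n = 0.05, 0.2, 1.0 / 252, 1000            # assumed drift/volatility
log_returns = rng.normal((mu - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt), n)
prices = 100 * np.exp(np.cumsum(log_returns))             # synthetic price path

observed = np.diff(np.log(prices))                        # recover log-returns
stat, p_value = stats.shapiro(observed)                   # test normality
print(f"Shapiro-Wilk p-value: {p_value:.3f}")             # small p => GBM assumption suspect
```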
693

Evaluation of statistical cloud parameterizations

Brück, Heiner Matthias 06 October 2016 (has links)
This work is motivated by the question: how much complexity is appropriate for a cloud parameterization used in general circulation models (GCMs)? To approach this question, cloud parameterizations across the complexity range are explored using general circulation models and theoretical Monte-Carlo simulations. Their results are compared with high-resolution satellite observations and simulations that resolve the GCM subgrid-scale variability explicitly. A process-orientated evaluation is facilitated by GCM forecast simulations which reproduce the synoptic state. For this purpose, novel methods were developed to a) conceptually relate the underlying saturation deficit probability density function (PDF) to its saturated cloudy part, b) analytically compute the vertically integrated liquid water path (LWP) variability, c) diagnose the relevant PDF moments from cloud parameterizations, d) derive high-resolution LWP from satellite observations and e) deduce the LWP statistics by aggregating the LWP onto boxes equivalent to the GCM grid size. On this basis, this work shows that it is possible to evaluate the subgrid-scale variability of cloud parameterizations in terms of cloud variables. Differences among the PDF types increase with complexity; in particular, the more advanced cloud parameterizations can make use of their double Gaussian PDF in conditions where cumulus convection forms a separate mode with respect to the remainder of the grid box. Therefore, it is concluded that the difference between unimodal and bimodal PDFs is more important than the shape within each mode. However, the simulations and their evaluation reveal that the advanced parameterizations do not take full advantage of their abilities and that their statistical relationships are broadly similar to those of less complex PDF shapes, while the results from observations and cloud-resolving simulations indicate even more complex distributions. Therefore, this work suggests that the use of less complex PDF shapes might yield a better trade-off. With increasing model resolution, the initial weaknesses of simpler, e.g. unimodal, PDFs will diminish. While cloud schemes for coarse-resolution models need to parameterize multiple cloud regimes per grid box, the higher spatial resolution of future GCMs will separate them better, so that the unimodal approximation improves.
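As a concrete example of the simplest PDF type discussed above, a unimodal Gaussian saturation-excess PDF lets cloud fraction and mean condensate be diagnosed analytically from two subgrid moments. The sketch below is a generic illustration of such a statistical cloud scheme with made-up numbers; it is not the specific parameterization or evaluation machinery of the thesis.

```python
# A minimal sketch of a unimodal statistical cloud scheme: assume the subgrid
# saturation excess s is Gaussian with grid-box mean s_mean and standard
# deviation s_std, and diagnose cloud fraction and mean condensate as the
# saturated (s > 0) part of the PDF. Numbers are illustrative only.
from scipy.stats import norm

def gaussian_cloud_scheme(s_mean, s_std):
    q = s_mean / s_std
    cloud_fraction = norm.sf(-q)                              # P(s > 0)
    # Mean condensate: E[s * 1(s > 0)] for a Gaussian saturation excess
    condensate = s_mean * norm.sf(-q) + s_std * norm.pdf(q)
    return cloud_fraction, condensate

cf, qc = gaussian_cloud_scheme(s_mean=-0.1e-3, s_std=0.4e-3)  # kg/kg, hypothetical
print(f"cloud fraction = {cf:.2f}, mean condensate = {qc:.2e} kg/kg")
```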
694

Test Validity and Statistical Analysis

Sargsyan, Alex 17 September 2018 (has links)
No description available.
695

Statistics of cycles in large networks

Klemm, Konstantin, Stadler, Peter F. 06 February 2019 (has links)
The occurrence of self-avoiding closed paths (cycles) in networks is studied under varying rules of wiring. As a main result, we find that the dependence between network size N and typical cycle length ⟨h⟩ is algebraic, ⟨h⟩ ∝ N^α, with distinct values of α for different wiring rules. The Barabási-Albert model has α = 1. Different preferential and nonpreferential attachment rules and the growing Internet graph yield α < 1. Computation of the statistics of cycles at arbitrary length is made possible by the introduction of an efficient sampling algorithm.
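If the typical cycle length ⟨h⟩ has been measured at several network sizes, the scaling exponent α in ⟨h⟩ ∝ N^α can be read off from a straight-line fit in log-log space. The sketch below uses made-up (N, ⟨h⟩) pairs purely to illustrate that fit; it is not the sampling algorithm of the paper.

```python
# A minimal sketch of estimating the scaling exponent alpha in <h> ~ N^alpha
# from measurements at several network sizes via a log-log linear fit.
# The data points are hypothetical.
import numpy as np

N = np.array([1_000, 2_000, 5_000, 10_000, 20_000])
h = np.array([210.0, 400.0, 980.0, 1900.0, 3700.0])      # hypothetical <h> values

alpha, log_c = np.polyfit(np.log(N), np.log(h), 1)       # slope = scaling exponent
print(f"estimated alpha = {alpha:.2f}")
```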
696

Rank statistics of forecast ensembles

Siegert, Stefan 21 December 2012 (has links)
Ensembles are today routinely applied to estimate uncertainty in numerical predictions of complex systems such as the weather. Instead of initializing a single numerical forecast, using only the best guess of the present state as initial conditions, a collection (an ensemble) of forecasts whose members start from slightly different initial conditions is calculated. By varying the initial conditions within their error bars, the sensitivity of the resulting forecasts to these measurement errors can be accounted for. The ensemble approach can also be applied to estimate forecast errors that are due to insufficiently known model parameters by varying these parameters between ensemble members. An important (and difficult) question in ensemble weather forecasting is how well an ensemble of forecasts reproduces the actual forecast uncertainty. A widely used criterion to assess the quality of forecast ensembles is statistical consistency, which demands that the ensemble members and the corresponding measurement (the "verification") behave like random independent draws from the same underlying probability distribution. Since this forecast distribution is generally unknown, such an analysis is nontrivial. An established criterion to assess statistical consistency of a historical archive of scalar ensembles and verifications is uniformity of the verification rank: if the verification falls between the (k-1)-st and k-th largest ensemble member, it is said to have rank k. Statistical consistency implies that the average frequency of occurrence should be the same for each rank. A central result of the present thesis is that, in a statistically consistent K-member ensemble, the (K+1)-dimensional vector of rank probabilities is a random vector that is uniformly distributed on the K-dimensional probability simplex. This behavior is universal for all possible forecast distributions. It thus provides a way to describe forecast ensembles in a nonparametric way, without making any assumptions about the statistical behavior of the ensemble data. The physical details of the forecast model are eliminated, and the notion of statistical consistency is captured in an elementary way. Two applications of this result to ensemble analysis are presented. Ensemble stratification, the partitioning of an archive of ensemble forecasts into subsets using a discriminating criterion, is considered in the light of the above result. It is shown that certain stratification criteria can make the individual subsets of ensembles appear statistically inconsistent, even though the unstratified ensemble is statistically consistent. This effect is explained by considering statistical fluctuations of rank probabilities. A new hypothesis test is developed to assess statistical consistency of stratified ensembles while taking these potentially misleading stratification effects into account. The distribution of rank probabilities is further used to study the predictability of outliers, which are defined as events where the verification falls outside the range of the ensemble, being either smaller than the smallest or larger than the largest ensemble member. It is shown that these events are better predictable than by a naive benchmark prediction, which unconditionally issues the average outlier frequency of 2/(K+1) as a forecast. Predictability of outlier events, quantified in terms of probabilistic skill scores and receiver operating characteristics (ROC), is shown to be universal in a hypothetical forecast ensemble.
An empirical study shows that in an operational temperature forecast ensemble, outliers are likewise predictable, and that the corresponding predictability measures agree with the analytically calculated ones.
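The rank bookkeeping described in the abstract is easy to make concrete: for each forecast case, the verification's rank among the K ensemble members is recorded, the rank histogram is checked for flatness, and outlier events are compared against the consistent-ensemble baseline 2/(K+1). The sketch below uses a synthetic, statistically consistent ensemble and the usual ascending-rank convention as assumptions; it is not the thesis's stratification test.

```python
# A minimal sketch of verification ranks, the rank histogram, and outlier
# frequency for a synthetic, statistically consistent K-member ensemble.
import numpy as np

rng = np.random.default_rng(1)
n_forecasts, K = 10_000, 10
ensemble = rng.normal(size=(n_forecasts, K))
verification = rng.normal(size=n_forecasts)              # same distribution as members

ranks = 1 + np.sum(ensemble < verification[:, None], axis=1)   # ranks in 1..K+1
hist = np.bincount(ranks, minlength=K + 2)[1:]                  # rank histogram (flat if consistent)

outlier_freq = np.mean((ranks == 1) | (ranks == K + 1))         # verification outside ensemble range
print("rank histogram:", hist)
print(f"outlier frequency = {outlier_freq:.3f}, baseline 2/(K+1) = {2 / (K + 1):.3f}")
```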
697

Tests of statistical normality

Freeman, Daniel H. January 1970 (has links)
Thesis (M.A.)--Boston University / PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you. / The purpose of this paper is to discuss several tests of statistical normality. By normality it is meant that a simple random sample is drawn from a population with a normal or Gaussian distribution. There are a number of such tests in existence. For example, the chi-squared (CS), √b1, and b2 are reasonably well known. Others, such as Geary's a, are not as popular. The discussion of tests of normality has been quite thorough in the various journals. However, they have not been brought together in a complete discussion including examples and comparisons. The discussion will proceed in four parts. The effect of non-normality will be briefly reviewed. Several examples are indicated here, such as the t-statistic and prediction intervals, where non-normality alters the significance level. Some of the tests where non-normality is not too harmful will also be indicated. [TRUNCATED] / 2031-01-01
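The moment statistics mentioned above, the sample skewness √b1 and kurtosis b2, underpin normality tests that are readily available today. The sketch below computes them on a simulated, deliberately non-normal sample using scipy; the choice of sample and of these particular test implementations is illustrative and goes beyond what the 1970 thesis itself could have used.

```python
# A minimal sketch of two statistics discussed above: the sample skewness
# sqrt(b1) and kurtosis b2, with the corresponding scipy normality tests.
# A skewed (exponential) sample is used so the tests have something to reject.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(size=500)                  # deliberately non-normal sample

sqrt_b1 = stats.skew(x)                        # sample skewness, sqrt(b1)
b2 = stats.kurtosis(x, fisher=False)           # sample kurtosis, b2 (normal => 3)

print(f"sqrt(b1) = {sqrt_b1:.2f}, b2 = {b2:.2f}")
print("skewness test p =", stats.skewtest(x).pvalue)
print("kurtosis test p =", stats.kurtosistest(x).pvalue)
print("omnibus test  p =", stats.normaltest(x).pvalue)   # D'Agostino-Pearson
```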
698

On the Statistics of Trustworthiness Prediction

Hauke, Sascha 14 January 2015 (has links) (PDF)
Trust and trustworthiness facilitate interactions between human beings worldwide, every day. They enable the formation of friendships, the making of profits and the adoption of new technologies, making life not only more pleasant but also furthering societal development. Trust, for lack of a better word, is good. When human beings trust, they rely on the trusted party to be trustworthy, that is, literally worthy of the trust that is being placed in them. If it turns out that the trusted party is unworthy of the trust placed in it, the truster has misplaced its trust, has relied unwarrantedly, and is liable to experience possibly unpleasant consequences. Human social evolution has equipped us with tools for determining another's trustworthiness through experience, cues and observations with which we aim to minimise the risk of misplacing our trust. Social adaptation, however, is a slow process, and the cues that are helpful in real, physical environments where we can observe and hear our interlocutors are less helpful in interactions that are conducted over data networks with other humans or computers, or even between two computers. This presents a challenge in a world where the virtual and the physical intermesh increasingly, a challenge that computational trust models seek to address by applying computational evidence-based methods to estimate trustworthiness. In this thesis, the state of the art in evidence-based trust models is extended and improved upon, in particular with regard to their statistical modelling. The statistics behind (Bayesian) trustworthiness estimation will receive special attention, their extension bringing about improvements in trustworthiness estimation that encompass the following aspects: (i.) statistically well-founded estimators for binomial and multinomial models of trust that can accurately estimate the trustworthiness of another party, and those that can express the inherent uncertainty of the trustworthiness estimate in a statistically meaningful way, (ii.) better integration of recommendations by third parties using advanced methods for determining the reliability of the received recommendations, (iii.) improved responsiveness to changes in the behaviour of trusted parties, and (iv.) increased generalisability of trust-relevant information over a set of trusted parties. Novel estimators, methods for combining recommendations and other trust-relevant information, change detectors, as well as a mapping for integrating stereotype-based trustworthiness estimates, are bundled in an improved Bayesian trust model, Multinomial CertainTrust. Specific scientific contributions are structured into three distinct categories: 1. A Model for Trustworthiness Estimation: The statistics of trustworthiness estimation are investigated to design a fully multinomial trustworthiness estimation model. Leveraging the assumptions behind the Bayesian estimation of binomial and multinomial proportions, accurate trustworthiness and certainty estimators are presented, and the integration of subjectivity via informed and non-informed Bayesian priors is discussed. 2. Methods for Trustworthiness Information Processing: Methods for facilitating trust propagation and accounting for concept drift in the behaviour of the trusted parties are introduced. All methods are applicable, by design, to both the binomial case and the multinomial case of trustworthiness estimation.
3. Further Extensions for Trustworthiness Estimation: Two methods for addressing the potential lack of direct experiences with a new trustee in feedback-based trust models are presented. For one, the dedicated modelling of particular roles and the trust delegation between them is shown to be principally possible as an extension to existing feedback-based trust models. For another, a more general approach for feature-based generalisation using model-free, supervised machine learners is introduced. The general properties of the trustworthiness and certainty estimators are derived formally from the basic assumptions underlying binomial and multinomial estimation problems, harnessing fundamentals of Bayesian statistics. Desired properties for the introduced certainty estimators, first postulated by Wang & Singh, are shown to hold through formal argument. The general soundness and applicability of the proposed certainty estimators is founded on the statistical properties of interval estimation techniques discussed in the related statistics literature and formally and rigorously shown there. The core estimation system and additional methods, in their entirety constituting the Multinomial CertainTrust model, are implemented in R, along with competing methods from the related work, specifically for determining recommender trustworthiness and coping with changing behaviour through ageing. The performance of the novel methods introduced in this thesis was tested against established methods from the related work in simulations. Methods for hardcoding indicators of trustworthiness were implemented within a multi-agent framework and shown to be functional in an agent-based simulation. Furthermore, supervised machine learners were tested for their applicability by collecting a real-world data set of reputation data from a hotel booking site and evaluating their capabilities against this data set. The hotel data set exhibits properties, such as a high imbalance in the ratings, that appear typical of data generated by reputation systems, as these properties are also present in other data sets.
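The binomial case of the Bayesian estimation described above can be illustrated with generic Beta-binomial machinery: interaction outcomes update a Beta prior, the posterior mean serves as the trustworthiness estimate, and a credible interval expresses the remaining uncertainty. The sketch below shows only that generic idea, with an assumed uniform prior and an ad-hoc interval-width certainty measure; it is not the specific certainty estimator of Multinomial CertainTrust.

```python
# A minimal sketch of binomial trustworthiness estimation with a Beta prior:
# posterior mean as the trust estimate, credible-interval width as a simple
# stand-in for certainty. Prior and certainty measure are assumptions.
from scipy.stats import beta

def estimate_trust(positives, negatives, prior_a=1.0, prior_b=1.0, level=0.95):
    a, b = prior_a + positives, prior_b + negatives          # Beta posterior parameters
    trust = a / (a + b)                                      # posterior mean
    lo, hi = beta.interval(level, a, b)                      # credible interval
    certainty = 1.0 - (hi - lo)                              # narrow interval => high certainty
    return trust, certainty

print(estimate_trust(positives=2, negatives=0))    # few observations: low certainty
print(estimate_trust(positives=40, negatives=2))   # many observations: high certainty
```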
699

Statistical problems in pasture research

Robinson, P 22 November 2016 (has links)
No description available.
700

INTROSTAT (Statistics textbook)

Underhill, Les, Bradfield, Dave January 2013 (has links)
IntroSTAT was designed to meet the needs of students, primarily those in business, commerce and management, for a course in applied statistics. IntroSTAT is designed as a lecture-book; one of its aims is to maximize the time spent explaining concepts and doing examples. The book is commonly used as part of first-year courses in statistics.
