  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Hyper Markov Non-Parametric Processes for Mixture Modeling and Model Selection

Heinz, Daniel 01 June 2010 (has links)
Markov distributions describe multivariate data with conditional independence structures. Dawid and Lauritzen (1993) extended this idea to hyper Markov laws for prior distributions. A hyper Markov law is a distribution over Markov distributions whose marginals satisfy the same conditional independence constraints. These laws have been used for Gaussian mixtures (Escobar, 1994; Escobar and West, 1995) and contingency tables (Liu and Massam, 2006; Dobra and Massam, 2009). In this paper, we develop a family of non-parametric hyper Markov laws that we call hyper Dirichlet processes, combining the ideas of hyper Markov laws and non-parametric processes. Hyper Dirichlet processes are joint laws with Dirichlet process laws for particular marginals. We also describe a more general class of Dirichlet processes that are not hyper Markov but still have useful properties for describing graphical data. These graphical Dirichlet processes are simply Dirichlet processes with a hyper Markov base measure. This class allows an extremely straightforward application of existing Dirichlet knowledge and technology to graphical settings. Given the widespread use of Dirichlet processes, there are many applications of this framework waiting to be explored. One broad class of applications, known as Dirichlet process mixtures, has been used to construct mixture densities in which the underlying number of components may be determined by the data (Lo, 1984; Escobar, 1994; Escobar and West, 1995). We consider the use of the new graphical Dirichlet process in this setting, which imparts a conditional independence structure inside each component: given the component or cluster membership, the data exhibit the desired independence structure. We discuss two applications. First, expanding on the work of Escobar and West (1995), we estimate a non-parametric mixture of Markov Gaussians using a Gibbs sampler. Second, we employ the Mode-Oriented Stochastic Search of Dobra and Massam (2009) to determine a suitable conditional independence model, focusing on contingency tables. In general, the mixing induced by a Dirichlet process does not drastically increase the complexity beyond that of a simpler Bayesian hierarchical model without mixture components. We provide a specific representation for decomposable graphs with useful algorithms for local updates.
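To make the mixture mechanism concrete, here is a minimal sketch of one collapsed Gibbs sweep for an ordinary conjugate Dirichlet process mixture of one-dimensional Gaussians. The hyper Markov and graphical structure developed in the thesis is not represented; the hyperparameters and toy data are illustrative only.

```python
import numpy as np

# Toy data from two well-separated Gaussians; hyperparameters are
# illustrative, not values from the thesis.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
n = len(x)
alpha, sigma2, tau2 = 1.0, 1.0, 10.0  # DP concentration, likelihood var, prior var

def predictive(xi, members):
    """Posterior-predictive density of xi given the points in a cluster."""
    m = len(members)
    if m == 0:                         # empty cluster: prior predictive
        mean, var = 0.0, tau2 + sigma2
    else:                              # conjugate update of the cluster mean
        var_post = 1.0 / (1.0 / tau2 + m / sigma2)
        mean = var_post * members.sum() / sigma2
        var = var_post + sigma2
    return np.exp(-0.5 * (xi - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

z = np.zeros(n, dtype=int)             # all points start in one cluster
for i in range(n):                     # one Chinese-restaurant-process sweep
    z[i] = -1                          # remove point i from its cluster
    labels = [k for k in np.unique(z) if k >= 0]
    w = [(z == k).sum() * predictive(x[i], x[z == k]) for k in labels]
    w.append(alpha * predictive(x[i], np.empty(0)))  # option: open a new cluster
    probs = np.array(w) / np.sum(w)
    choice = rng.choice(len(probs), p=probs)
    z[i] = labels[choice] if choice < len(labels) else max(labels, default=-1) + 1

print("clusters after one sweep:", np.unique(z).size)
```

In the graphical setting of the thesis, the base measure drawn on for new clusters would itself be hyper Markov, so each component inherits the conditional independence structure.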
2

Generalization Error Bounds for Time Series

McDonald, Daniel J. 06 April 2012 (has links)
In this thesis, I derive generalization error bounds (bounds on the expected inaccuracy of the predictions) for time series forecasting models. These bounds allow forecasters to select among competing models and to declare that, with high probability, their chosen model will perform well, without making strong assumptions about the data-generating process or appealing to asymptotic theory. Expanding upon results from statistical learning theory, I demonstrate how these techniques can help time series forecasters to choose models that behave well under uncertainty. I also show how to estimate the β-mixing coefficients for dependent data so that my results can be used empirically. I use the bound explicitly to evaluate different predictive models for the volatility of IBM stock and for a standard set of macroeconomic variables. Taken together, my results show how to control the generalization error of time series models with fixed or growing memory.
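As an illustration of the empirical side, the following is a rough plug-in sketch of a β-mixing estimate at a single lag: the total variation distance between the binned joint law of (x_t, x_{t+a}) and the product of its marginals. The histogram construction and the bin count are assumptions for illustration, not the estimator developed in the thesis.

```python
import numpy as np

# Plug-in estimate at lag a: total variation distance between the binned
# joint distribution of (x_t, x_{t+a}) and the product of its marginals.
def beta_hat(x, a, bins=10):
    joint, _, _ = np.histogram2d(x[:-a], x[a:], bins=bins)
    joint = joint / joint.sum()                 # empirical joint pmf
    px = joint.sum(axis=1, keepdims=True)       # marginal of x_t
    py = joint.sum(axis=0, keepdims=True)       # marginal of x_{t+a}
    return 0.5 * np.abs(joint - px * py).sum()

# On an AR(1) series the estimate should shrink as the lag grows.
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.normal()
print([round(beta_hat(x, a), 3) for a in (1, 5, 20)])
```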
3

On the identification and fitting of models to multivariate time series using state space methods

Swift, A. L. January 1987 (has links)
No description available.
4

Beyond Geometric Models: Multivariate Statistical Ecology with Likelihood Functions

Walker, Steven C. 23 February 2011 (has links)
Ecological problems often require multivariate analyses. Ever since Bray and Curtis (1957) drew an analogy between Euclidean distance and community dissimilarity, most multivariate ecological inference has been based on geometric ideas. For example, ecologists routinely use distance-based ordination methods (e.g. multidimensional scaling) to enhance the interpretability of multivariate data. More recently, distance-based diversity indices that account for functional differences between species are now routinely used. But in most other areas of science, inference is based on Fisher's (1922) likelihood concept; statisticians view likelihood as an advance over purely geometric approaches. Nevertheless, likelihood-based reasoning is rare in multivariate statistical ecology. Using ordination and functional diversity as case studies, my thesis addresses the questions: Why is likelihood rare in multivariate statistical ecology? Can likelihood be of practical use in multivariate analyses of real ecological data? Should the likelihood concept replace multidimensional geometry as the foundation for multivariate statistical ecology? I trace the history of quantitative plant ecology to argue that the geometric focus of contemporary multivariate statistical ecology is a legacy of an early 20th century debate on the nature of plant communities. Using the Rao-Blackwell and Lehmann-Scheffé theorems, which both depend on the likelihood concept, I show how to reduce bias and sampling variability in estimators of functional diversity. I also show how to use likelihood-based information criteria to select among ordination methods. Using computationally intensive Markov-chain Monte Carlo methods, I demonstrate how to expand the range of likelihood-based ordination procedures that are computationally feasible. Finally, using philosophical ideas from formal measurement theory, I argue that a likelihood-based multivariate statistical ecology outperforms the geometry-based alternative by providing a stronger connection between analysis and the real world. Likelihood should be used more often in multivariate ecology.
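As a toy illustration of likelihood-based selection among ordination models, the sketch below compares candidate latent dimensionalities by held-out log-likelihood, using probabilistic PCA as a stand-in for a generic ordination model. The simulated community matrix and the choice of probabilistic PCA are assumptions for illustration, not the methods of the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

# Toy community matrix generated from two latent gradients plus noise;
# these simulated data stand in for real ecological observations.
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 12))
Y = latent @ loadings + rng.normal(scale=0.5, size=(100, 12))

# PCA.score returns the average log-likelihood under the probabilistic PCA
# model, so candidate dimensionalities can be compared on held-out data.
for d in range(1, 6):
    ll = cross_val_score(PCA(n_components=d), Y, cv=5).mean()
    print(f"dimensions={d}  held-out log-likelihood per sample: {ll:.2f}")
```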
5

Modeling Subset Behavior: Prescriptive Analytics for Professional Basketball Data

Bynum, Lucius 01 January 2018 (has links)
Sports analytics problems have become increasingly prominent in the past decade. Modern image processing capabilities allow coaching staff to easily capture detailed game-time statistics on their players, opponents, team configurations, and plays. The challenge is to turn that data into meaningful insights for team managers and coaches. This project uses descriptive and predictive techniques on publicly available NBA basketball data to identify powerful combinations of players and predict how they will perform against other teams.
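A minimal sketch of the descriptive step might rank five-player lineups by point margin per minute played. The data frame, its columns, and the per-36-minute normalization below are hypothetical stand-ins; real play-by-play data would first need to be preprocessed into stints.

```python
import pandas as pd

# Hypothetical stint data: which five players were on the floor, for how
# long, and the team's point margin over that stretch.
stints = pd.DataFrame({
    "lineup": [("A", "B", "C", "D", "E"), ("A", "B", "C", "D", "F"),
               ("A", "B", "C", "D", "E"), ("A", "B", "C", "D", "F")],
    "minutes": [12.0, 8.5, 10.0, 6.0],
    "margin": [7, -3, 4, -1],
})

# Aggregate by lineup and normalize to a per-36-minute rate so short and
# long stints are comparable.
per_lineup = (stints.groupby("lineup")
                    .agg(minutes=("minutes", "sum"), margin=("margin", "sum")))
per_lineup["margin_per_36"] = 36 * per_lineup["margin"] / per_lineup["minutes"]
print(per_lineup.sort_values("margin_per_36", ascending=False))
```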
6

Determining Chlorophyll-a Concentrations in Aquatic Systems with New Statistical Methods and Models

Dimberg, Peter January 2011 (has links)
Chlorophyll-a (chl-a) concentration is an indicator of trophic status and is widely used as a measure of the algal biomass that drives eutrophication in aquatic systems. A high chl-a concentration may indicate a high phytoplankton biomass, which can degrade water quality or eliminate important functional groups in the ecosystem. Predicting chl-a concentrations is therefore desirable for understanding how large an impact chl-a may have in aquatic systems under different scenarios, over long time periods as well as across seasonal variation. Several models for predicting annual or summer chl-a concentration have been designed using total phosphorus, total nitrogen, or both in combination as input parameters. These models have high predictive power but are not constructed for evaluating long-term change or predicting seasonal variation in a system, since their inputs are typically annual values or values from other specific periods. The models are, in other words, limited to the range for which they were constructed. The aim of this thesis was to complement these models with methods and models that give a more accurate picture of how the chl-a concentration in an aquatic system behaves, in both a short-term and a long-term perspective. The results showed that, with a new method called the statistically meaningful trend, the Baltic Proper had no change in chl-a concentration over the period 1975 to 2007, which contradicts the earlier result based on the p-value of a trend line fitted to the raw data. It is possible to predict the seasonal variation of median chl-a concentration in lakes across a wide geographic range using summer total phosphorus and latitude as input parameters. It is also possible to predict the probability of reaching different monthly median chl-a concentrations using Markov chains or a direct relationship between two months. These results give an accurate picture of how chl-a concentrations in aquatic systems vary, and can be used to assess how different actions may or may not reduce the problem of potentially harmful algal blooms.
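A minimal sketch of the Markov-chain idea: discretize monthly chl-a into trophic states, estimate a transition matrix from consecutive months, and propagate state probabilities forward. The bin edges and the monthly series below are illustrative assumptions, not the thesis data.

```python
import numpy as np

# Illustrative monthly median chl-a series (ug/L) and trophic-state bins.
edges = np.array([0.0, 2.0, 7.0, np.inf])    # oligo-, meso-, eutrophic
series = np.array([1.5, 2.1, 3.0, 6.5, 8.2, 9.1, 7.5, 4.0, 2.5, 1.8, 1.2, 1.6])
states = np.digitize(series, edges) - 1      # state index 0, 1, 2

# Estimate the transition matrix by counting month-to-month moves.
k = len(edges) - 1
P = np.zeros((k, k))
for s, t in zip(states[:-1], states[1:]):
    P[s, t] += 1
P = P / P.sum(axis=1, keepdims=True)         # row-normalize to probabilities

# Propagate state probabilities three months beyond the last observation.
p = np.eye(k)[states[-1]]
for _ in range(3):
    p = p @ P
print("P(oligo-, meso-, eutrophic) in 3 months:", p.round(2))
```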
7

Empirical and Kinetic Models for the Determination of Pharmaceutical Product Stability

Khalifa, Nagwa 24 January 2011 (has links)
Drug stability is a vital subject in the pharmaceutical industry. All drug products should be kept stable and protected against chemical, physical, and microbiological degradation to ensure their efficacy and safety until released for public use. Hence, it is very important to be able to estimate or predict stability. This work studied the stability of three different drug agents using three different mathematical models: two empirical models (linear regression and an artificial neural network) and a mechanistic (kinetic) model. The stability of each drug in the three cases studied was expressed in terms of concentration, hardness, temperature, and humidity. The predicted values obtained from the models were compared to the observed drug concentrations obtained experimentally and then evaluated by calculating the mean of squared errors. Among the models used in this work, the mechanistic model was found to be the most accurate and reliable method of stability testing, since it had the smallest calculated errors. Overall, the accuracy of these mathematical models, as indicated by the proximity of their stability predictions to the observed values, supports the conclusion that such models can be reliable and time-saving alternatives to the analytical techniques used in practice.
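As a sketch of how an empirical and a kinetic fit might be compared by mean squared error, the following fits a first-order degradation model and a linear regression to fabricated potency data. The data, the first-order kinetic form, and the initial parameter guesses are assumptions for illustration, not the models or measurements of the thesis.

```python
import numpy as np
from scipy.optimize import curve_fit

# Fabricated potency data (% of label claim over months of storage).
t = np.array([0, 3, 6, 9, 12, 18, 24], dtype=float)
conc = np.array([100.0, 96.1, 92.5, 88.7, 85.4, 79.0, 73.2])

def first_order(t, c0, k):
    return c0 * np.exp(-k * t)               # kinetic model: dC/dt = -k C

(c0, k), _ = curve_fit(first_order, t, conc, p0=(100.0, 0.01))
slope, intercept = np.polyfit(t, conc, 1)    # empirical linear model

# Compare the two fits by mean squared error, as the thesis does.
mse_kin = np.mean((conc - first_order(t, c0, k)) ** 2)
mse_lin = np.mean((conc - (slope * t + intercept)) ** 2)
print(f"kinetic MSE: {mse_kin:.3f}   linear MSE: {mse_lin:.3f}")
```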
8

Examining the application of Conway-Maxwell-Poisson models for analyzing traffic crash data

Geedipally, Srinivas Reddy 15 May 2009 (has links)
Statistical models have been very popular for estimating the performance of highway safety improvement programs, which are intended to reduce motor vehicle crashes. The traditional Poisson and Poisson-gamma (negative binomial) models are the most popular probabilistic models used by transportation safety analysts for analyzing traffic crash data. The Poisson-gamma model is usually preferred over the traditional Poisson model since crash data usually exhibit over-dispersion. Although the Poisson-gamma model is popular in traffic safety analysis, it has limitations, particularly when crash data are characterized by small sample sizes and low sample mean values. Researchers have also found that the Poisson-gamma model has difficulties in handling under-dispersed crash data. The primary objective of this research is to evaluate the performance of the Conway-Maxwell-Poisson (COM-Poisson) model in various situations and to examine its application for analyzing traffic crash datasets exhibiting over- and under-dispersion. This study makes use of various simulated and observed crash datasets to accomplish these objectives. A simulation study found that the COM-Poisson model can handle under-, equi-, and over-dispersed datasets with different mean values, although the credible intervals are wider for low sample mean values. The computational burden of its implementation is also not prohibitive. Using intersection crash data collected in Toronto and segment crash data collected in Texas, the results show that COM-Poisson models perform as well as Poisson-gamma models in terms of goodness-of-fit statistics and predictive performance. Using crash data collected at railway-highway crossings in South Korea, several COM-Poisson models were estimated, and it was found that the COM-Poisson model can handle crash data when the modeling output shows signs of under-dispersion. The results also show that the COM-Poisson model provides better statistical performance than the gamma probability and traditional Poisson models. Furthermore, the COM-Poisson model has limitations similar to those of the Poisson-gamma model when handling data with low sample means and small sample sizes. Despite these limitations for over-dispersed datasets with low sample means, the COM-Poisson model is still a flexible method for analyzing crash data.
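To show why the extra parameter helps, here is a sketch of the COM-Poisson probability mass function with a truncated-sum normalizing constant. The truncation point and the parameter values are arbitrary choices for illustration.

```python
import numpy as np
from scipy.special import gammaln

# P(Y = y) = lam^y / (y!)^nu / Z(lam, nu), with Z approximated by a
# truncated sum (y_max is an arbitrary illustrative cutoff).
def com_poisson_pmf(y, lam, nu, y_max=200):
    ys = np.arange(y_max + 1)
    log_terms = ys * np.log(lam) - nu * gammaln(ys + 1)
    log_z = np.logaddexp.reduce(log_terms)
    return np.exp(y * np.log(lam) - nu * gammaln(y + 1) - log_z)

# nu < 1 yields over-dispersion, nu = 1 recovers the Poisson, and nu > 1
# yields under-dispersion: one family covers all three cases.
for nu in (0.5, 1.0, 2.0):
    ys = np.arange(60)
    pmf = com_poisson_pmf(ys, lam=3.0, nu=nu)
    mean = np.sum(ys * pmf)
    var = np.sum((ys - mean) ** 2 * pmf)
    print(f"nu={nu}: variance-to-mean ratio = {var / mean:.2f}")
```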
9

Bayesian inference for models with infinite-dimensionally generated intractable components

Villalobos, Isadora Antoniano January 2012 (has links)
No description available.
