121

BAYESIAN SEMIPARAMETRIC GENERALIZATIONS OF LINEAR MODELS USING POLYA TREES

Schoergendorfer, Angela 01 January 2011 (has links)
In a Bayesian framework, prior distributions on a space of nonparametric continuous distributions may be defined using Polya trees. This dissertation addresses statistical problems for which the Polya tree idea can be utilized to provide efficient and practical methodological solutions. One problem considered is the estimation of risks, odds ratios, or other similar measures that are derived by specifying a threshold for an observed continuous variable. It has been previously shown that fitting a linear model to the continuous outcome under the assumption of a logistic error distribution leads to more efficient odds ratio estimates. We will show that deviations from the assumption of logistic error can result in substantial bias in odds ratio estimates. A one-step approximation to the Savage-Dickey ratio will be presented as a Bayesian test for distributional assumptions in the traditional logistic regression model. The approximation utilizes least-squares estimates in place of a full Bayesian Markov chain Monte Carlo simulation, and the equivalence of inferences based on the two implementations will be shown. A framework for flexible, semiparametric estimation of risks in the case that the assumption of logistic error is rejected will be proposed. A second application deals with regression scenarios in which residuals are correlated and their distribution evolves over an ordinal covariate such as time. In the context of prediction, such complex error distributions need to be modeled carefully and flexibly. The proposed model introduces dependent, but separate Polya tree priors for each time point, thus pooling information across time points to model gradual changes in distributional shapes. Theoretical properties of the proposed model will be outlined, and its potential predictive advantages in simulated scenarios and real data will be demonstrated.
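As background for the abstract above, the sketch below draws one random distribution from a Polya tree prior under common textbook choices: a dyadic quantile partition of a centering distribution and Beta(c·m², c·m²) branch probabilities at depth m. The logistic centering, the value of c, and all names are illustrative assumptions, not the dissertation's construction.

```python
# A minimal sketch of sampling from a Polya tree prior (assumptions as noted above).
import numpy as np
from scipy.stats import logistic

rng = np.random.default_rng(0)

def sample_polya_tree(levels=6, c=1.0):
    """Draw one random distribution from a Polya tree prior centered at a
    standard logistic distribution, using the common choice alpha_m = c*m^2.
    Returns bin edges and bin probabilities at the finest level."""
    probs = np.ones(1)                               # probability of the root set
    for m in range(1, levels + 1):
        alpha = c * m**2                             # branch concentration grows with depth
        y = rng.beta(alpha, alpha, size=probs.size)  # left-branch probabilities
        probs = np.column_stack([probs * y, probs * (1 - y)]).ravel()
    # Partition sets are dyadic quantile intervals of the centering distribution
    # (outermost edges are -inf/+inf from ppf(0) and ppf(1)).
    edges = logistic.ppf(np.linspace(0, 1, probs.size + 1))
    return edges, probs

edges, probs = sample_polya_tree()
print(probs.sum())  # ~1.0: the draw is a valid probability distribution
```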
122

Measurement of body posture using multivariate statistical techniques

Petkov, John January 2005 (has links)
The aim of this thesis is to develop a quantitative measure of the postural defects known as lordosis and kyphosis. Measuring these defects is an important part of their identification and treatment.
123

Statistical models for earthquakes incorporating ancillary data : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand

Wang, Ting January 2010 (has links)
This thesis consists of two parts. The first part proposes a new model, the Markov-modulated Hawkes process with stepwise decay (MMHPSD), to investigate the seismicity rate. The MMHPSD is a self-exciting process which switches among different states, in each of which the process has distinguishable background seismicity and decay rates. Parameter estimation is developed via the expectation maximization algorithm. The model is applied to data from the Landers earthquake sequence, demonstrating that it is useful for modelling changes in the temporal patterns of seismicity. The states in the model can capture the behavior of main shocks, large aftershocks, secondary aftershocks and periods of quiescence with different background rates and decay rates. The state transitions can then explain seismicity rate changes and help indicate whether there is any seismicity shadow or relative quiescence. The second part of this thesis develops statistical methods to examine earthquake sequences possessing ancillary data, in this case groundwater level data or GPS measurements of deformation. For the former, signals from groundwater level data at Tangshan Well, China, are extracted for the period from 2002 to 2005 using a moving window method. A number of different statistical techniques are used to detect and quantify coseismic responses to P, S, Love and Rayleigh wave arrivals. The P phase arrivals appear to trigger identifiable oscillations in groundwater level, whereas the Rayleigh waves amplify the water level movement. Identifiable coseismic responses are found for approximately 40 percent of magnitude 6+ earthquakes worldwide. A threshold in the relationship between earthquake magnitude and well-epicenter distance, above which coseismic changes in groundwater level at Tangshan Well are most likely, is also found; 97% of the identified coseismic responses satisfy it. A non-linear filter measuring short-term changes in deformation rate is introduced to extract signals from GPS data. For two case studies, of a) deep earthquakes in the central North Island, New Zealand, and b) shallow earthquakes in Southern California, a hidden Markov model (HMM) is fitted to the output from the filter. Mutual information analysis indicates that the state having the largest variation of deformation rate contains precursory information, indicating an elevated probability of earthquake occurrence.
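The MMHPSD described above extends the self-exciting Hawkes process with Markov-modulated states and stepwise decay. As background only, here is a minimal sketch of simulating a plain Hawkes process with an exponential kernel via Ogata's thinning algorithm; all parameter values are illustrative assumptions, and the thesis's state-switching and stepwise-decay machinery is not shown.

```python
# Minimal Hawkes process simulation by Ogata's thinning algorithm.
import numpy as np

rng = np.random.default_rng(1)

def intensity(t, events, mu, a, beta):
    """Conditional intensity: background rate mu plus exponentially decaying
    contributions a*beta*exp(-beta*(t - t_i)) from past events (a < 1)."""
    if not events:
        return mu
    ev = np.array(events)
    return mu + a * beta * np.sum(np.exp(-beta * (t - ev)))

def simulate_hawkes(mu=0.5, a=0.8, beta=1.2, horizon=100.0):
    """Ogata thinning: between events the intensity only decays, so the value
    just after the current time is a valid upper bound (thinning envelope)."""
    t, events = 0.0, []
    while True:
        lam_bar = intensity(t, events, mu, a, beta)
        t += rng.exponential(1.0 / lam_bar)          # candidate waiting time
        if t >= horizon:
            return events
        if rng.uniform() <= intensity(t, events, mu, a, beta) / lam_bar:
            events.append(t)                          # accept candidate event

print(len(simulate_hawkes()), "events simulated on [0, 100]")
```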
124

Dealing with sparsity in genotype x environment analyses : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand

Godfrey, A. Jonathan R. January 2004 (has links)
Researchers are frequently faced with the problem of analyzing incomplete and often unbalanced genotype-by-environment (GxE) matrices which arise as a trials programme progresses over seasons. The principal data for this investigation, arising from a ten year programme of onion trials, contain fewer than 2,300 of the 49,200 combinations of the 400 genotypes and 123 environments. This 'sparsity' renders standard GxE methodology inapplicable. Analysis of this data to identify onion varieties that suit the shorter, hotter days of tropical and subtropical locations therefore presented a unique challenge. Removal of some data to form a complete GxE matrix wastes information and is consequently undesirable. An incomplete GxE matrix can be analyzed using the additive main effects and multiplicative interaction (AMMI) model in conjunction with the EM algorithm, but this approach proved unsatisfactory in this instance. Cluster analysis has been commonly used in GxE analyses, but current methods are inadequate when the data matrix is incomplete. If clustering is to be applied to incomplete data sets, one of two routes needs to be taken: either the clustering procedure must be modified to handle the missing data, or the missing entries must be imputed so that standard cluster analysis can be performed. A new clustering method capable of handling incomplete data has been developed. 'Two-stage clustering', as it has been named, relies on a partitioning of squared Euclidean distance into two independent components, the GxE interaction and the genotype main effect. These components are used in the first and second stages of clustering respectively. Two-stage clustering forms the basis for imputing missing values in a GxE matrix, so that a more complete data array is available for other GxE analyses. 'Two-stage imputation' estimates unobserved GxE yields using inter-genotype similarities to adjust observed yield data in the environment in which the yield is missing. This new imputation method is transferable to any two-way data situation where all observations are measured on the same scale and the two factors are expected to have significant interaction. This simple, but effective, imputation method is shown to improve on an existing method that confounds the GxE interaction and the genotype main effect. Future development of two-stage imputation will use a parameterization of two-stage clustering in a multiple imputation process. Varieties recommended for use in a certain environment would normally be chosen using results from similar environments. Differing cluster analysis approaches were applied, but led to inconsistent environment clusterings. A graphical summary tool, created to ease the difficulty of identifying the differences between pairs of clusterings, proved especially useful when the number of clusters and clustered observations were high. 'Cluster influence diagrams' were also used to investigate the effects the new imputation method had on the qualitative structure of the data. A consequence of the principal data's sparsity was that imputed values were found to depend on the existence of observable inter-genotype relationships, rather than on the strength of those relationships. As a result of this investigation, practical recommendations are provided for limiting the detrimental effects of sparsity. Applying these recommendations will enhance the future ability of two-stage imputation to identify those onion varieties that suit tropical and subtropical locations.
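The abstract gives only a high-level description of 'two-stage imputation'. The sketch below is one plausible reading, assuming the donor genotype is chosen by mean squared difference over commonly observed environments and its yield is shifted by the difference in genotype means; the thesis's actual partition of squared Euclidean distance into interaction and main-effect components is more refined than this. All data and names are hypothetical.

```python
# Illustrative sketch (not the thesis's algorithm) of imputing a missing
# genotype-by-environment (GxE) yield from the most similar donor genotype.
import numpy as np

def impute_missing(Y, g, e):
    """Impute Y[g, e] (np.nan) from the genotype most similar to g, where
    similarity is the mean squared difference over commonly observed
    environments, and the donor's yield in e is shifted by the difference in
    genotype means (a rough stand-in for a main-effect/interaction split)."""
    best, best_d = None, np.inf
    mask_g = ~np.isnan(Y[g])
    for h in range(Y.shape[0]):
        if h == g or np.isnan(Y[h, e]):
            continue                      # donor must be observed in environment e
        common = mask_g & ~np.isnan(Y[h])
        if common.sum() < 2:
            continue                      # need overlap to measure similarity
        d = np.mean((Y[g, common] - Y[h, common])**2)
        if d < best_d:
            best, best_d, best_common = h, d, common
    if best is None:
        return np.nan                     # sparsity: no observable relationship
    shift = Y[g, best_common].mean() - Y[best, best_common].mean()
    return Y[best, e] + shift

Y = np.array([[5.0, 6.0, np.nan],
              [4.8, 5.9, 7.1],
              [3.0, 3.5, 4.0]])
print(impute_missing(Y, 0, 2))  # genotype 0's missing yield in environment 2
```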
125

Distribution Fits for Various Parameters in the Hurricane Model

Oxenyuk, Victoria 20 March 2014 (has links)
The Florida Public Hurricane Loss Model (FPHLM) is the only open public hurricane loss evaluation model available for assessing the hazard to insured residential property from hurricanes in Florida. The model consists of three independent components: the atmospheric science component, the vulnerability component and the actuarial component. The atmospheric component simulates thousands of storms, their wind speeds and their decay once on land, on the basis of historical hurricane statistics, defining wind risk for all residential zip codes in Florida. The focus of this thesis is to analyze the atmospheric science component of the FPHLM, replicate the statistical procedures used to model its various parameters, and validate the model. I establish the distribution for modeling annual hurricane occurrence, choose the best fitting distribution for the radius of maximum winds, and compute the expression for the pressure profile parameter Holland B.
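As an illustration of the first task above (choosing a distribution for annual hurricane occurrence), the sketch below fits Poisson and negative binomial distributions by maximum likelihood and compares them by AIC. The counts are hypothetical, and the thesis's actual candidate distributions and selection criteria may differ.

```python
# Sketch: compare Poisson and negative binomial fits to annual hurricane counts.
import numpy as np
from scipy import stats, optimize

counts = np.array([1, 0, 2, 1, 3, 0, 1, 2, 4, 1, 0, 2])  # hypothetical annual counts

# Poisson: the MLE of the rate is simply the sample mean
lam = counts.mean()
ll_pois = stats.poisson.logpmf(counts, lam).sum()

# Negative binomial: maximize the log-likelihood over (n, p) numerically
def nb_negll(params):
    n, p = params
    return -stats.nbinom.logpmf(counts, n, p).sum()

res = optimize.minimize(nb_negll, x0=[1.0, 0.5],
                        bounds=[(1e-6, None), (1e-6, 1 - 1e-6)])
ll_nb = -res.fun

print(f"Poisson logL = {ll_pois:.2f}, NegBin logL = {ll_nb:.2f}")
# AIC comparison: Poisson has 1 parameter, negative binomial has 2
print("AIC:", 2 * 1 - 2 * ll_pois, "vs", 2 * 2 - 2 * ll_nb)
```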
126

Multilevel models applied in the analysis of repeated measures data over time

Genevile Carife Bergamo 28 October 2002 (has links)
In many scientific studies it is common to find data structured hierarchically: the subjects under study are grouped in lower-level units, which in turn belong to higher-level units, and so on. In analyzing such data it is important to take the hierarchical structure into account, since ignoring it can lead to overestimation of the model coefficients. Multilevel models were developed to facilitate the analysis of hierarchically structured data; they account for the variability among observations at the same level as well as across the different levels of the hierarchy. For repeated measures over time, a two-level hierarchical structure can be adopted, with measurement occasions at the first level nested within subjects at the second level. This work presents multilevel models for several levels of hierarchy, together with estimation methods and tests for the parameters involved in the model. As an application, data from the Elderly Care Program (Programa de Atenção ao Idoso, PAI), run at the municipal outpatient clinic Dr. Plinio do Prado Coutinho in Alfenas, Minas Gerais, were analyzed; the body mass index (BMI) and blood pressure of elderly patients were observed over 22 months. Data on the milk protein content of 79 Australian cows, collected for 19 weeks after calving under three diets (Diggle et al., 1994), were also analyzed. For the PAI data, the blood pressure measurements were found to be positively related to BMI over time, independently of sex, age and marital status. For the milk protein data, a reduction in protein content over time was observed, regardless of the diets applied. The MLwiN and SAS software packages were used for the analyses.
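The thesis used MLwiN and SAS; as a language-neutral illustration of the two-level structure (measurement occasions nested within subjects), here is a minimal sketch using Python's statsmodels on simulated data loosely mimicking the milk-protein example. All column names, effect sizes and the random-slope choice are assumptions.

```python
# Sketch of a two-level repeated-measures model: weekly measurements (level 1)
# nested within cows (level 2), with a random intercept and slope per cow.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

n_cows, n_weeks = 20, 10
df = pd.DataFrame({
    "cow": np.repeat(np.arange(n_cows), n_weeks),
    "week": np.tile(np.arange(n_weeks), n_cows),
    "diet": np.repeat(rng.integers(0, 3, n_cows), n_weeks),
})
cow_effect = rng.normal(0, 0.3, n_cows)[df["cow"]]   # level-2 variability
df["protein"] = (3.5 - 0.03 * df["week"] + 0.1 * df["diet"]
                 + cow_effect + rng.normal(0, 0.2, len(df)))

model = smf.mixedlm("protein ~ week + C(diet)", df, groups=df["cow"],
                    re_formula="~week")
fit = model.fit()
print(fit.summary())   # fixed effects plus variance components at both levels
```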
127

Step-Selection Functions for Modeling Animal Movement -- Case Study: African Buffalo

Adar, Maia 01 January 2018 (has links)
Understanding what factors influence wildlife movement allows landscape planners to make informed decisions that benefit both animals and humans. New quantitative methods, such as step-selection functions, provide valuable objective analyses of wildlife connectivity. This paper provides a framework for creating a step-selection function and demonstrates its use in a case study. The first section provides a general introduction to wildlife connectivity research. The second section explains the math behind the step-selection function using a simple example. The last section gives the results of a step-selection model for African buffalo in the Kavango Zambezi Transfrontier Conservation Area. Buffalo were found to avoid fences, rivers, and anthropogenic land use; however, there was considerable variation in individual buffaloes' preferences.
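The core of a step-selection function is a conditional logistic regression comparing each observed step against a stratum of randomly generated available steps. The sketch below simulates such paired data and fits the model, assuming statsmodels' ConditionalLogit API; the covariates (distances to fence and river), coefficients and stratum sizes are hypothetical, not the paper's.

```python
# Sketch of fitting a step-selection function by conditional logistic regression.
import numpy as np
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(3)

n_steps, n_avail = 200, 10        # observed steps, available steps per stratum
strata = np.repeat(np.arange(n_steps), n_avail + 1)

# Hypothetical covariates at each candidate step end point
dist_fence = rng.exponential(1.0, strata.size)
dist_river = rng.exponential(1.0, strata.size)

# Simulate choices: the animal prefers larger distances from fences (avoidance)
util = 0.8 * dist_fence + 0.3 * dist_river
chosen = np.zeros(strata.size, dtype=int)
for s in range(n_steps):
    idx = np.where(strata == s)[0]
    p = np.exp(util[idx])
    p /= p.sum()
    chosen[rng.choice(idx, p=p)] = 1      # exactly one chosen step per stratum

X = np.column_stack([dist_fence, dist_river])
fit = ConditionalLogit(chosen, X, groups=strata).fit()
print(fit.params)   # should recover roughly (0.8, 0.3)
```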
128

Mixed linear models: variance-covariance matrix structures and model selection

Jomar Antonio Camarinha Filho 27 September 2002 (has links)
Experiments whose observations are correlated are common in agronomy and biology. In principle, such correlations may be associated with whole plots or with subplots, depending on the experimental design adopted. Moreover, mixed linear model methodology has been used with increasing frequency, especially after the work of Searle (1988), Searle et al. (1992) and Wolfinger (1993b), among others. The success of the modeling procedure is strongly tied to examining which random effects should remain in the model and to the possibility of introducing variance-covariance structures for the random variables, which in the mixed linear model may enter through the residual error and also through the random part associated with a known random factor. In this context, the likelihood ratio test and Akaike's information criterion can assist in choosing the most appropriate model for the data, and they make it possible to verify that inadequate model choices lead to divergent conclusions regarding the fixed effects of the model. With the development of the SAS Mixed procedure (Littell et al., 1996), used in this work, the analysis of such experiments under mixed linear model methodology has become more routine and reliable. To meet the objective of this work, two examples (A and B) on the yield response of three wheat cultivars to line-source sprinkler irrigation levels were used. Twenty-nine models were created and analyzed for Example A and 16 for Example B. For each example, the conclusions regarding the fixed effects changed according to the model adopted. It was also observed that Akaike's information criterion must be viewed with caution. Comparing similar models between the two examples confirmed the importance of correct programming in the Mixed procedure. It is therefore essential to conduct the analysis of experiments broadly, entertaining several models and checking which ones make sense for the experimental design, thus avoiding errors at the end of the analysis.
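The two model-selection tools discussed above are simple to state given the fitted log-likelihoods. The sketch below shows a generic likelihood ratio test and AIC comparison for nested covariance structures; the log-likelihood values and parameter counts are hypothetical.

```python
# Sketch of the LRT and AIC for comparing covariance structures in mixed models.
from scipy.stats import chi2

def lrt(ll_reduced, ll_full, df_diff):
    """Likelihood ratio test for nested models: 2*(llF - llR) is referred to a
    chi-squared with df equal to the parameter difference. (When variance
    parameters sit on a boundary, the true null is a chi-bar-squared mixture,
    so the plain chi-squared p-value shown here is conservative.)"""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2.sf(stat, df_diff)

def aic(ll, n_params):
    return 2 * n_params - 2 * ll

# Hypothetical REML log-likelihoods: compound symmetry (2 covariance
# parameters) versus unstructured (4 parameters), same fixed effects.
stat, p = lrt(ll_reduced=-410.2, ll_full=-405.6, df_diff=2)
print(f"LRT stat = {stat:.2f}, p = {p:.4f}")
print("AIC:", aic(-410.2, 2), "vs", aic(-405.6, 4))  # smaller AIC is preferred
```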
129

A Comparison of Some Confidence Intervals for Estimating the Kurtosis Parameter

Jerome, Guensley 15 June 2017 (has links)
Several methods have been proposed to estimate the kurtosis of a distribution. The three common estimators are g2, G2 and b2. This thesis addressed the performance of these estimators by comparing them under the same simulation environments and conditions. Their performance is compared through confidence intervals, by determining the average width and the probability of capturing the kurtosis parameter of a distribution. We considered and compared classical and non-parametric methods for constructing these intervals. The classical method assumes normality to construct the confidence intervals, while the non-parametric methods rely on bootstrap techniques. The bootstrap techniques used are: Bias-Corrected Standard Bootstrap, Efron's Percentile Bootstrap, Hall's Percentile Bootstrap and Bias-Corrected Percentile Bootstrap. We found significant differences in the performance of classical and bootstrap estimators. We observed that the parametric method works well in terms of coverage probability when data come from a normal distribution, while the bootstrap intervals struggled to consistently reach the nominal 95% confidence level. When sample data are from a distribution with negative kurtosis, both parametric and bootstrap confidence intervals performed well, although we noticed that bootstrap methods tend to give narrower intervals. For positive kurtosis, bootstrap methods perform slightly better than classical methods in coverage probability. Among the three kurtosis estimators, G2 performed best. Among the bootstrap techniques, Efron's Percentile intervals had the best coverage.
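For concreteness, the sketch below computes the three estimators, using the conventions of Joanes and Gill (1998), which I assume match the thesis's definitions, and builds an Efron percentile bootstrap interval for one of them. Sample size and bootstrap settings are illustrative.

```python
# Sketch of the kurtosis estimators g2, G2, b2 and an Efron percentile bootstrap CI.
import numpy as np

rng = np.random.default_rng(4)

def kurtosis_estimators(x):
    """Excess-kurtosis estimators in the Joanes & Gill (1998) conventions;
    the thesis's exact definitions may differ slightly."""
    n = len(x)
    m2 = np.mean((x - x.mean())**2)
    m4 = np.mean((x - x.mean())**4)
    g2 = m4 / m2**2 - 3
    G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    b2 = ((n - 1) / n)**2 * (m4 / m2**2) - 3
    return g2, G2, b2

def efron_percentile_ci(x, stat, B=2000, level=0.95):
    """Efron's percentile interval: resample with replacement, then take the
    empirical alpha/2 and 1-alpha/2 quantiles of the bootstrap statistics."""
    boots = np.array([stat(rng.choice(x, size=len(x), replace=True))
                      for _ in range(B)])
    alpha = 1 - level
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

x = rng.normal(size=100)              # true excess kurtosis is 0
g2, G2, b2 = kurtosis_estimators(x)
ci = efron_percentile_ci(x, lambda s: kurtosis_estimators(s)[1])
print(f"G2 = {G2:.3f}, 95% percentile CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```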
130

Simulation and Application of Binary Logic Regression Models

Heredia Rico, Jobany J 01 April 2016 (has links)
Logic regression (LR) is a methodology for identifying logic combinations of binary predictors, in the form of intersections (and), unions (or) and negations (not), that are linearly associated with an outcome variable. Logic regression uses the predictors as inputs and enables us to identify important logic combinations of independent variables using a computationally efficient tree-based stochastic search algorithm, unlike classical regression models, which only consider pre-determined conventional interactions (the "and" rules). In this thesis, we focused on LR with a binary outcome in a logistic regression framework. Simulation studies were conducted to examine the performance of LR under the assumptions of independent and correlated observations, respectively, for various characteristics of the data sets and LR search parameters. We found that the proportion of times LR selected the correct logic rule was usually low when the signal and/or the prevalence of the true logic rule was relatively low. The method performed satisfactorily under easy learning conditions such as high signal, simple logic rules and/or small numbers of predictors. Given the simulation characteristics and correlation structures tested, we found some, but not significant, differences in performance when LR was applied to dependent observations compared to the independent case. In addition to the simulation studies, an application method was proposed to integrate LR with resampling methods in order to enhance LR performance. The proposed method was illustrated using two simulated data sets as well as a data set from a real-life situation, and showed some evidence of being effective in discerning the correct logic rule, even under unfavorable learning conditions.
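The core move in logic regression's tree-based search is scoring a candidate logic rule as a single derived binary predictor in a logistic model. The sketch below illustrates that scoring step only, not the stochastic search itself; the rule, data and coefficients are hypothetical.

```python
# Sketch: score candidate logic rules of binary predictors by logistic log-likelihood.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

n, p = 500, 6
X = rng.integers(0, 2, size=(n, p))                       # binary predictors
true_rule = ((X[:, 0] & X[:, 1]) | (1 - X[:, 2])).astype(int)  # (X1 and X2) or not X3
eta = -1.0 + 2.0 * true_rule
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))           # binary outcome

def score_rule(rule_values, y):
    """Fit a logistic model with the candidate rule as the single derived
    predictor and return its maximized log-likelihood (higher is better)."""
    Xd = sm.add_constant(rule_values.astype(float))
    return sm.Logit(y, Xd).fit(disp=0).llf

# A stochastic tree search would propose many candidates and move toward
# higher-scoring ones; here we just compare the true rule to a wrong one.
wrong_rule = (X[:, 3] | X[:, 4]).astype(int)
print("true rule logL: ", score_rule(true_rule, y))
print("wrong rule logL:", score_rule(wrong_rule, y))
```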
