121
General conditional linear models with time-dependent coefficients under censoring and truncation
Teodorescu, Bianca 19 December 2008
In survival analysis, interest often lies in the relationship between the survival function and a number of covariates. For some individuals the event of interest cannot be observed because of right censoring and/or left truncation. A typical example is a retrospective medical study in which one is interested in the time between birth and death due to a certain disease. Patients who die of the disease at an early age will rarely have entered the study before death and are therefore left truncated. Conversely, for patients who are alive at the end of the study, only a lower bound on the true survival time is known, and these patients are hence right censored.
For censored and/or truncated responses, many models in the literature describe the relationship between the survival function and the covariates (the proportional hazards or Cox model, the log-logistic model, the accelerated failure time model, the additive risks model, etc.). In these models, the regression coefficients are usually assumed to be constant over time. In practice, however, the structure of the data may be more complex, and it may be better to consider coefficients that vary over time. In the example above, certain covariates (e.g. age at diagnosis, type of surgery, extent of tumor) can have a relatively strong impact on survival at an early age but a weaker influence at a higher age. This has motivated a number of authors to extend the Cox model to allow for time-dependent coefficients, or to consider other types of time-dependent coefficient models such as the additive hazards model.
In practice it is of great use to have at hand a method to check the validity of the above mentioned models.
First we consider a very general model, which includes as special cases the above mentioned models (Cox model, additive model, log-logistic model, linear transformation models, etc.) with time-dependent coefficients and study the parameter estimation by means of a least squares approach. The response is allowed to be subject to right censoring and/or left truncation.
Secondly, we propose an omnibus goodness-of-fit test of whether the general time-dependent model considered above fits the data. A bootstrap version, used to approximate the critical values of the test, is also proposed.
In this dissertation, the finite-sample performance of each proposed method is evaluated in a simulation study, and the method is then applied to a real data set.
122
Resultatpåverkan av olika fördelningar på parametern operationstid vid simuleringsstudier / Effect of different distributions for the operation-time parameter on the results of simulation studies
Bengtsson, Angelica, Kuc, Arlena January 2011
Discrete event simulation is used to imitate and analyze how systems change over time, with the variation in the system represented by discrete and continuous probability distributions. In the software Simul8, simulation models are built from information collected from production, such as shifts, operation times and efficiency. This bachelor's thesis studies how the flow in a multi-stage machining process is affected by stochastic fluctuations and disturbances in the individual process steps. More specifically, it analyzes how the stochastic variation in operation times can and should be modelled in simulation studies, and how the choice of distribution for the operation-time parameter affects the throughput time obtained from a discrete event simulation.
The thesis is based on a case study performed at Volvo Aero Corporation, Sweden, using input data for one product. The product passes through a manufacturing sequence of 18 machining operations covering three process types (automatic, semi-automatic and manual), which depend to different degrees on operator input. The process times of the 18 operations were analyzed numerically and graphically. The commercial statistical software Stat:fit was used to find a suitable probability distribution for each manufacturing operation and to examine whether the process type is connected to the suitable theoretical distribution. The recommended distributions were checked with goodness-of-fit tests (Chi2, Kolmogorov-Smirnov and Anderson-Darling) in Stat:fit and formed the basis of the experimental design for the simulation study. The simulation model was validated and verified by a simulation advisor at Volvo Aero.
Five simulation models, each with a different type of distribution, were evaluated in Simul8 according to the experimental design. All simulation runs are statistically verified with 95% confidence intervals. The case study indicates that the distribution type chosen for the operation times has a relatively small effect on throughput time in complex simulation models with low efficiency and high load factors, where factors such as production volume and availability have a greater influence. For simpler models without restricted availability, the choice of distribution has a visible effect on the simulation results. The distribution of these simulation results is consistent with the central limit theorem: if the number of observed values is sufficiently large, the result behaves as if normally distributed.
123
Ένας έλεγχος καλής προσαρμογής για συνεχείς δισδιάστατες κατανομές / A goodness-of-fit test for continuous bivariate distributions
Αλεξόπουλος, Ανδρέας 06 November 2007
This thesis draws its subject from the theory of goodness-of-fit tests. We give the main elements of the theory of test functions and then present the extension of the Kolmogorov-Smirnov test to the bivariate case, together with a modification of it. The extension relies on Rosenblatt's theorem, which proposes a transformation of an absolutely continuous k-variate distribution into the uniform distribution on the k-dimensional hypercube. We also present the statistic A, proposed by Damico. The distinctive feature of this statistic is that it has a discrete distribution.
We propose a statistic for testing the goodness of fit of continuous data, first in two dimensions and then in k dimensions. The new statistic uses Rosenblatt's transformation and the statistic A. For various sample sizes, we give the table of probabilities for the values of the statistic and the table of critical values for various p-values. These tables were obtained mainly by simulation. Finally, the power of the test was computed and compared with the power of the bivariate Kolmogorov-Smirnov test.
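As an illustration of the Rosenblatt transformation mentioned above, the following Python sketch handles one simple special case: a bivariate normal distribution with standard margins and known correlation `rho`. This choice of null distribution, and the function names, are assumptions made for the example and are not taken from the thesis. The transformation applies the marginal CDF to the first coordinate and the conditional CDF to the second, so that under the null hypothesis the output pair is uniform on the unit square.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rosenblatt_bivariate_normal(x1, x2, rho):
    """Rosenblatt transform for a standard bivariate normal with correlation rho.

    u1 uses the marginal CDF of X1; u2 uses the conditional CDF of X2 given X1,
    which is normal with mean rho * x1 and variance 1 - rho**2.
    """
    u1 = std_normal_cdf(x1)
    u2 = std_normal_cdf((x2 - rho * x1) / math.sqrt(1.0 - rho ** 2))
    return u1, u2


def sample_bivariate_normal(n, rho, rng):
    """Draw n pairs from the standard bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


rng = random.Random(0)
rho = 0.6
transformed = [rosenblatt_bivariate_normal(x1, x2, rho)
               for x1, x2 in sample_bivariate_normal(2000, rho, rng)]
mean_u1 = sum(u1 for u1, _ in transformed) / len(transformed)
mean_u2 = sum(u2 for _, u2 in transformed) / len(transformed)
print(mean_u1, mean_u2)  # both should be close to 0.5 when the null model is correct
```

Any goodness-of-fit statistic for uniformity on the unit square could then be applied to the transformed pairs; the statistic A itself is not reproduced here.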
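124
Le progiciel PoweR : un outil de recherche reproductible pour faciliter les calculs de puissance de certains tests d'hypothèses au moyen de simulations de Monte Carlo / The PoweR package: a reproducible research tool to facilitate power calculations for certain hypothesis tests by means of Monte Carlo simulations
Tran, Viet Anh 06 1900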
The PoweR package aims to facilitate obtaining or verifying empirical power studies for goodness-of-fit tests. As such, it can be seen as a computational tool for reproducible research, because it makes it very easy to reproduce (or detect errors in) simulation results already published in the literature. The package also makes it easy to design new simulation studies. Critical values, empirical levels and powers of many test statistics under a wide variety of alternative distributions are obtained quickly and accurately using a C/C++ and R environment. One can even rely on the R package snow to parallelize the computations on a multicore processor. The results can be displayed as LaTeX tables or specialized graphs, which can be incorporated directly into publications. This document gives an overview of the main design aims and principles, as well as strategies for adaptation and extension.
Hands-on illustrations are presented to help new users get started easily.
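The kind of Monte Carlo power study described above can be sketched in a few lines of Python. This is not the PoweR API (PoweR is an R package with C/C++ code); every function name below is hypothetical, and the test statistic (absolute sample skewness as a test of normality) is chosen only because it is short to write. The sketch simulates the statistic under the null to get a critical value, then estimates the rejection rate under an alternative.

```python
import random


def abs_skewness(sample):
    """Absolute sample skewness; large values suggest departure from normality."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    return abs(m3 / m2 ** 1.5)


def simulate_stats(draw, n, reps, rng):
    """Compute the statistic on `reps` simulated samples of size n."""
    return [abs_skewness([draw(rng) for _ in range(n)]) for _ in range(reps)]


def critical_value(null_stats, alpha):
    """Empirical (1 - alpha) quantile of the statistic under the null."""
    ordered = sorted(null_stats)
    index = min(len(ordered) - 1, int((1 - alpha) * len(ordered)))
    return ordered[index]


def rejection_rate(stats, crit):
    """Share of simulated statistics that exceed the critical value."""
    return sum(s > crit for s in stats) / len(stats)


rng = random.Random(1)
n, reps, alpha = 30, 1000, 0.05

null_stats = simulate_stats(lambda r: r.gauss(0.0, 1.0), n, reps, rng)
crit = critical_value(null_stats, alpha)

# Empirical level: fresh normal samples, so the rate should be near alpha.
level = rejection_rate(simulate_stats(lambda r: r.gauss(0.0, 1.0), n, reps, rng), crit)
# Empirical power: samples from an exponential alternative.
power = rejection_rate(simulate_stats(lambda r: r.expovariate(1.0), n, reps, rng), crit)
print(crit, level, power)
```

Fixing the seed of the random number generator is what makes such a study reproducible: rerunning the script gives the same critical value, level and power.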
125
Graph Structured Normal Means Inference
Sharpnack, James 01 May 2013
This thesis addresses statistical estimation and testing of signals over a graph when measurements are noisy and high-dimensional. Graph-structured patterns appear in applications as diverse as sensor networks, virology in human networks, congestion in internet routers, and advertising in social networks. We develop asymptotic guarantees on the performance of statistical estimators and tests by stating conditions for consistency in terms of properties of the graph (e.g. graph spectra). The goal of this thesis is to demonstrate theoretically that, by exploiting the graph structure, one can achieve statistical consistency in extremely noisy conditions.
We begin with the study of a projection estimator called Laplacian eigenmaps, and find that eigenvalue concentration plays a central role in the ability to estimate graph structured patterns. We continue with the study of the edge lasso, a least squares procedure with total variation penalty, and determine combinatorial conditions under which changepoints (edges across which the underlying signal changes) on the graph are recovered. We will shift focus to testing for anomalous activations in the graph, using the scan statistic relaxations, the spectral scan statistic and the graph ellipsoid scan statistic. We will also show how one can form a decomposition of the graph from a spanning tree which will lead to a test for activity in the graph. This will lead to the construction of a spanning tree wavelet basis, which can be used to localize activations on the graph.
126
Nonparametric estimation of the mixing distribution in mixed models with random intercepts and slopes
Saab, Rabih 24 April 2013
Generalized linear mixed models (GLMMs) are widely used in statistical applications to model count and binary data. We consider the problem of nonparametric likelihood estimation of mixing distributions in GLMMs with multiple random effects. The log-likelihood to be maximized has the general form
$$l(G) = \sum_{i} \log \int_{\Omega} f(y_i, \gamma) \, dG(\gamma)$$
where $f(\cdot, \gamma)$ is a parametric family of component densities, $y_i$ is the $i$th observed response, and $G$ is the mixing distribution function of the random-effects vector $\gamma$, defined on $\Omega$.
The literature presents many algorithms for maximum likelihood estimation (MLE) of G in the univariate random-effect case, such as the EM algorithm (Laird, 1978), the intra-simplex direction method, ISDM (Lesperance and Kalbfleisch, 1992), and the vertex exchange method, VEM (Böhning, 1985). In this dissertation, the constrained Newton method (CNM) of Wang (2007), which fits GLMMs with random intercepts only, is extended to fit clustered datasets with multiple random effects. Owing to the general equivalence theorem from the geometry of mixture likelihoods (see Lindsay, 1995), many NPMLE algorithms, including CNM and ISDM, maximize the directional derivative of the log-likelihood to add potential support points to the mixing distribution G. Our method, Direct Search Directional Derivative (DSDD), uses a directional search method to find local maxima of the multi-dimensional directional derivative function. The performance of DSDD is investigated in GLMMs where f is a Bernoulli or Poisson distribution function. The algorithm is also extended to cover GLMMs with zero-inflated data.
Goodness-of-fit (GOF) and selection methods for mixed models have been developed in the literature; however, their application to models with nonparametric random-effects distributions is vague and ad hoc. Some popular measures such as the Deviance Information Criterion (DIC), the conditional Akaike Information Criterion (cAIC) and R² statistics are potentially useful in this context. Additionally, some cross-validation goodness-of-fit methods popular in Bayesian applications, such as the conditional predictive ordinate (CPO) and numerical posterior predictive checks, can be applied with minor modifications to suit the non-Bayesian approach.
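To make the log-likelihood l(G) above concrete, the following Python sketch evaluates it for a discrete mixing distribution G that puts weights w_j on support points γ_j, in which case the integral reduces to the sum Σ_j w_j f(y_i, γ_j). The setting is an assumption chosen for brevity: a Poisson model with a single random intercept γ and mean exp(γ), one count per unit. It is not the DSDD algorithm from the dissertation, and the function names are hypothetical.

```python
import math


def poisson_pmf(y, gamma):
    """Poisson probability of count y when the mean is exp(gamma)."""
    mu = math.exp(gamma)
    return math.exp(-mu + y * gamma - math.lgamma(y + 1))


def mixture_loglik(ys, support, weights):
    """l(G) = sum_i log sum_j w_j f(y_i, gamma_j) for a discrete G."""
    total = 0.0
    for y in ys:
        total += math.log(sum(w * poisson_pmf(y, g)
                              for g, w in zip(support, weights)))
    return total


counts = [0, 1, 0, 7, 9, 1, 8, 0]

# One support point: an ordinary Poisson model with log-mean equal to the log of the sample mean.
single = mixture_loglik(counts, [math.log(sum(counts) / len(counts))], [1.0])
# Two support points: a low-rate and a high-rate group with equal weights.
two_point = mixture_loglik(counts, [math.log(0.5), math.log(8.0)], [0.5, 0.5])
print(single, two_point)
```

Nonparametric maximum likelihood algorithms such as those cited above search over the support points and weights to maximize this quantity; the sketch only evaluates it for a given G.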
127
Testing Benford’s Law with the first two significant digits
Wong, Stanley Chun Yu 07 September 2010
Benford’s Law states that the first significant digit for most data is not uniformly distributed. Instead, it follows the distribution $P(d = d_1) = \log_{10}(1 + 1/d_1)$ for $d_1 \in \{1, 2, \ldots, 9\}$. In 2006, my supervisor, Dr. Mary Lesperance, et al. tested the goodness-of-fit of data to Benford’s Law using the first significant digit. Here we extend that research to the first two significant digits by performing several statistical tests (LR-multinomial, LR-decreasing, LR-generalized Benford, LR-Rodriguez, Cramér-von Mises $W_d^2$, $U_d^2$ and $A_d^2$, and Pearson’s $\chi^2$) and by constructing six simultaneous confidence intervals (Quesenberry, Goodman, Bailey Angular, Bailey Square, Fitzpatrick and Sison).
When testing compliance with Benford’s Law, we found that the test statistics LR-generalized Benford, $W_d^2$ and $A_d^2$ performed well for the Generalized Benford distribution, the Uniform/Benford mixture distribution and the Hill/Benford mixture distribution, while Pearson’s $\chi^2$ and the LR-multinomial statistic are more appropriate for the contaminated additive/multiplicative distribution. With respect to simultaneous confidence intervals, we recommend Goodman and Sison for detecting deviation from Benford’s Law.
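The first-digit formula above has a standard two-digit analogue: the leading two-digit block d in {10, ..., 99} has probability log10(1 + 1/d). The Python sketch below is illustrative only and is not code from the thesis; the helper names are hypothetical. It computes the Benford probabilities for the first k significant digits and Pearson's χ² statistic, one of the tests named above, for a list of nonzero numbers.

```python
import math


def benford_prob(d):
    """Benford probability of the leading digit block d (1..9, 10..99, ...)."""
    return math.log10(1 + 1 / d)


def leading_digits(x, k):
    """First k significant digits of a nonzero number x, as an integer."""
    # Scientific notation places the significant digits first, e.g. '3.14...e+02'.
    mantissa = f"{abs(x):.15e}".split("e")[0].replace(".", "")
    return int(mantissa[:k])


def pearson_chi2(values, k=2):
    """Pearson chi-square statistic of the first-k-digit counts against Benford's Law."""
    low, high = 10 ** (k - 1), 10 ** k
    counts = {d: 0 for d in range(low, high)}
    for v in values:
        if v != 0:  # zero has no significant digits
            counts[leading_digits(v, k)] += 1
    n = sum(counts.values())
    stat = 0.0
    for d in range(low, high):
        expected = n * benford_prob(d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


# Powers of 2 are a classic sequence whose leading digits follow Benford's Law closely.
powers_of_two = [2 ** i for i in range(1, 400)]
print(pearson_chi2(powers_of_two, k=1), pearson_chi2(powers_of_two, k=2))
```

With k = 2 the statistic has 90 cells, so the usual chi-square reference distribution would have 89 degrees of freedom; small expected counts in the high cells are a reason to prefer larger samples.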
129
Plánování a řízení projektu - možnost uplatnění simulační techniky / Planning and project management - the possibility of applying simulation techniques
ŠEFRÁNEK, Jaroslav January 2013
Drawing on scientific publications and other resources on applying a stochastic approach to the planning, management and control of projects, the thesis carries out a time analysis of a construction project. The project is analyzed using simulation techniques. The subject of the research is the project Modernization of the České Budějovice - Nemanice, for which appropriate recommendations will be proposed.
130
Statistical modelling of data from insect studies / Modelagem estatística de dados provenientes de estudos em entomologia
Rafael de Andrade Moral 19 December 2017
Data from insect studies may present different features. Univariate responses may be analysed using generalized linear models (continuous and discrete data), survival models (time-to-event data) and mixed effects models (longitudinal data), among other methods. These models may be used to analyse data from experiments that assess complex ecological processes, such as competition and predation. In that sense, computational tools are useful for researchers in several fields, e.g. insect biology and physiology, applied ecology and biological control. Using different datasets from entomology as motivation, as well as other types of datasets for illustration, this work aimed to develop new modelling frameworks and goodness-of-fit assessment tools. We propose accelerated failure time mixed models, with simultaneous modelling of the location and scale parameters with regressors, to analyse time-until-attack data from a choice test experiment. We use the exponential, Weibull and exponentiated-Weibull models, and assess goodness-of-fit using half-normal plots with simulation envelopes. These plots are the subject of an entire chapter on an R package, called hnp, developed to implement them. We use datasets from different types of experiments to illustrate the use of these plots and the package. A bivariate extension of the N-mixture modelling framework is proposed to analyse longitudinal count data for two species from the same food web that may interact directly or indirectly, and example datasets from ecological studies are used to illustrate the approach. An advantage of this modelling framework is the computation of an asymmetric correlation coefficient, which ecologists may use to study the degree of association between species. The jointNmix R package was also developed to implement the estimation process for these models.
Finally, we propose a goodness-of-fit assessment tool for bivariate models, analogous to the half-normal plot with a simulation envelope, and illustrate the approach with simulated data and insect competition data. This tool is also implemented in an R package, called bivrp. All software developed in this thesis is made freely available on the Comprehensive R Archive Network (CRAN).