11 |
Optimal Driver Risk Modeling
Mao, Huiying 21 August 2019
The importance of traffic safety has prompted considerable research on predicting driver risk and evaluating the impact of risk factors. Driver risk modeling is challenging because motor vehicle crashes are rare and individual driver risk is heterogeneous. Statistical modeling and analysis of such driver data typically involve Big Data, considerable noise, and a lack of informative predictors. This dissertation aims to develop several systematic techniques for traffic safety modeling, including finite sample bias correction, decision-adjusted modeling, and effective risk factor construction.
Poisson and negative binomial regression models are the primary statistical analysis tools for traffic safety evaluation. Regression parameter estimates can suffer from finite sample bias when the event frequency (e.g., the total number of crashes) is low, as is common in safety research. Comprehensive simulation and two case studies show that bias adjustment provides more accurate estimates when evaluating the impacts of crash risk factors.
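To illustrate why low event counts bias the estimates, the following minimal sketch computes, exactly rather than by simulation, the bias of two estimators of a log crash rate in an intercept-only Poisson model. It assumes a Firth-type penalized likelihood as the bias adjustment; the abstract does not say which adjustment the dissertation uses, and the function names and the expected count of 3 crashes are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def log_rate_bias(mu, y_max=200):
    """Exact bias of two estimators of log(mu) when the total count is Y ~ Poisson(mu).

    - Maximum likelihood: log(Y), undefined at Y = 0, so we condition on Y > 0.
    - Firth-type adjustment (assumed here): log(Y + 0.5), defined for all Y.
    The sums are truncated at y_max, which is ample for small mu.
    """
    p_zero = poisson_pmf(0, mu)
    mean_mle = sum(poisson_pmf(y, mu) * math.log(y) for y in range(1, y_max)) / (1 - p_zero)
    mean_adj = sum(poisson_pmf(y, mu) * math.log(y + 0.5) for y in range(0, y_max))
    return mean_mle - math.log(mu), mean_adj - math.log(mu)


# Hypothetical setting: only 3 crashes expected in total.
mle_bias, adjusted_bias = log_rate_bias(3.0)
print(f"MLE bias: {mle_bias:+.4f}, adjusted bias: {adjusted_bias:+.4f}")
```

In this toy setting the maximum likelihood estimate is biased downward, and the adjusted estimate has a smaller bias in magnitude.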
I also propose a decision-adjusted approach to construct an optimal kinematic-based driver risk prediction model. Decision-adjusted modeling fills the gap between conventional modeling methods and the decision-making perspective, i.e., how the estimated model will be used. The key to the proposed method is a decision-oriented objective function that adjusts model estimation by selecting the optimal threshold for kinematic signatures and other model parameters. The decision-adjusted driver-risk prediction framework can outperform a general model selection rule such as the area under the curve (AUC), especially when predicting a small percentage of high-risk drivers.
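The idea of letting the intended decision choose the kinematic threshold can be sketched as follows. This is only an illustration under stated assumptions: the decision is taken to be "flag the k highest-scoring drivers", the objective is the share of crash-involved drivers among those flagged, and the deceleration data, function names, and candidate thresholds are all hypothetical rather than taken from the dissertation.

```python
def top_k_precision(scores, crashed, k):
    """Share of crash-involved drivers among the k highest-scoring drivers."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(crashed[i] for i in ranked[:k]) / k


def choose_threshold(decels, crashed, thresholds, k):
    """Return (threshold, precision) maximizing the decision-oriented objective.

    A driver's score is the number of braking events at or above the threshold.
    A real analysis would normalize by exposure (miles or hours driven).
    """
    best = None
    for t in thresholds:
        scores = [sum(1 for d in events if d >= t) for events in decels]
        precision = top_k_precision(scores, crashed, k)
        if best is None or precision > best[1]:
            best = (t, precision)
    return best


# Hypothetical decelerations (in g) per driver, and crash indicators.
decels = [
    [0.2, 0.3, 0.55, 0.6],         # driver 0: a few hard brakes, crashed
    [0.2, 0.25, 0.3, 0.35, 0.3],   # driver 1: many mild brakes, no crash
    [0.1, 0.5],                    # driver 2: one hard brake, crashed
    [0.2, 0.2],                    # driver 3: mild brakes, no crash
]
crashed = [1, 0, 1, 0]

print(choose_threshold(decels, crashed, thresholds=[0.2, 0.4], k=2))  # (0.4, 1.0)
```

With the low threshold, the driver with many mild brakes is ranked first and displaces a crash-involved driver; the higher threshold ranks both crash-involved drivers on top, which is what the flagging decision cares about.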
For the third part, I develop a Multi-stratum Iterative Central Composite Design (miCCD) approach to effectively search for the optimal solution of any "black box" function in high-dimensional space. Here "black box" means that the specific formulation of the objective function is unknown or complicated. The miCCD approach has two major parts: a multi-start scheme and local optimization. The multi-start scheme uses space-filling designs (e.g., Latin hypercube sampling) to find multiple adequate starting points. From each adequate starting point, iterative CCD converges to a local optimum. The miCCD is able to determine the optimal threshold of the kinematic signature as a function of the driving speed.
/ Doctor of Philosophy /
When riding in a vehicle, it is common to form a personal judgement about whether the driver is safe or risky. The driver's behavior may affect your opinion: for example, you may think a driver who frequently brakes hard during one trip is risky, or a driver who almost took a turn too tightly may be deemed unsafe, but you do not know how much riskier these drivers are compared to an experienced driver. The goal of this dissertation is to show that it is possible to quantify driver risk using data and statistical methods. Risk quantification is not an easy task, as crashes are rare and random events. Even the wildest driver may have no crashes in their driving history. The rareness and randomness of crash occurrence pose great challenges for driver risk modeling. The second chapter of this dissertation deals with the rare-event issue and provides more accurate estimation. Hard braking, rapid starts, and sharp turns are signs of risky driving behavior. How often these signals occur in a driver's day-to-day driving reflects their driving habits, which is helpful in modeling driver risk. What magnitude of deceleration should count as a hard brake? How sharp a turn is useful in predicting high-risk drivers? The third and fourth chapters of this dissertation attempt to find the optimal threshold and to quantify how much these signals contribute to the assessment of driver risk. In Chapter 3, I propose to choose the threshold based on the specific application scenario. In Chapter 4, I consider the threshold under different speed limit conditions. The modeling and results of this dissertation will be beneficial for driver fleet safety management, insurance services, and driver education programs.
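The multi-start step of the miCCD approach described in this abstract can be illustrated with a short sketch of Latin hypercube sampling. It covers only the choice of starting points; the iterative CCD local optimization is not reproduced here, and the function names, the unit-cube search space, and the toy objective are assumptions made for illustration.

```python
import random


def latin_hypercube(n_points, n_dims, rng):
    """Return n_points in [0, 1)^n_dims with exactly one point per stratum
    of width 1/n_points along every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)  # independent random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def best_starts(objective, n_points, n_dims, n_keep, seed=0):
    """Evaluate a black-box objective on a Latin hypercube design and keep
    the n_keep points with the smallest values as local-search starting points."""
    rng = random.Random(seed)
    design = latin_hypercube(n_points, n_dims, rng)
    return sorted(design, key=objective)[:n_keep]


# Hypothetical black-box objective with its minimum at (0.3, 0.7).
def toy_objective(x):
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


for start in best_starts(toy_objective, n_points=20, n_dims=2, n_keep=3):
    print(start)
```

Each kept point would then seed a separate local optimization run, so that several local optima can be found and compared.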
|
12 |
Simulation d'évènements rares par Monte Carlo dans les réseaux hautement fiables / Rare event simulation using Monte Carlo in highly reliable networks
Saggadi, Samira 08 July 2013
Determining network reliability is, in general, an NP-hard problem. For instance, in telecommunications, one wants to evaluate the probability that a selected group of nodes can communicate. In this case, a set of disconnected nodes can have critical consequences, whether financial or in terms of security. A precise estimate of the reliability is therefore needed. In this work, we are interested in the study and calculation of the reliability of highly reliable networks. In this case the unreliability is very small, which makes the standard Monte Carlo approach useless because it requires a very large number of iterations. To obtain a good estimate of network reliability at minimum cost, we have developed new simulation techniques based on variance reduction using importance sampling.
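The effect of importance sampling on a highly reliable network can be seen in a small sketch. It assumes a five-edge "bridge" network between nodes s and t with independent edge failures of probability 0.001, and it uses the naive change of measure of inflating every failure probability to 0.5; the network, the probabilities, and the function names are hypothetical, and the thesis's own sampling schemes are not reproduced here.

```python
import itertools
import random

# Hypothetical bridge network: two parallel paths s-a-t and s-b-t plus the bridge a-b.
EDGES = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]


def connected(up_edges, source="s", target="t"):
    """Depth-first search using only the working edges."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for u, v in up_edges:
            for x, y in ((u, v), (v, u)):
                if x == node and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return target in seen


def exact_unreliability(q):
    """Enumerate all 2^5 edge states (feasible only for tiny networks)."""
    total = 0.0
    for state in itertools.product([True, False], repeat=len(EDGES)):  # True = failed
        prob = 1.0
        for failed in state:
            prob *= q if failed else 1 - q
        if not connected([e for e, failed in zip(EDGES, state) if not failed]):
            total += prob
    return total


def importance_sampling(q, q_is, n, rng):
    """Sample failures with inflated probability q_is and reweight each sample
    by the likelihood ratio so that the estimator stays unbiased."""
    total = 0.0
    for _ in range(n):
        weight, up = 1.0, []
        for edge in EDGES:
            if rng.random() < q_is:          # edge fails under the sampling law
                weight *= q / q_is
            else:                            # edge works under the sampling law
                weight *= (1 - q) / (1 - q_is)
                up.append(edge)
        if not connected(up):
            total += weight
    return total / n


exact = exact_unreliability(1e-3)
estimate = importance_sampling(1e-3, 0.5, 20_000, random.Random(0))
print(f"exact: {exact:.3e}, importance sampling: {estimate:.3e}")
```

The exact unreliability is about 2e-6, so a standard Monte Carlo run of the same size would almost always observe no disconnection at all, whereas the reweighted estimator sees disconnections in a large share of its samples.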
|
13 |
Estimation de la disponibilité par simulation, pour des systèmes incluant des contraintes logistiques / Availability estimation by simulations for systems including logistics
Rai, Ajit 09 July 2018
RAM (Reliability, Availability and Maintainability) analysis forms an integral part of estimating the Life Cycle Costs (LCC) of passenger rail systems. These systems are highly reliable and include complex logistics. Standard Monte Carlo simulations are of little use for efficient estimation of RAM metrics because system failures are rare events, so more efficient simulation techniques are needed. Importance Sampling (IS) is an advanced class of variance reduction techniques that can overcome the limitations of standard simulation. IS techniques accelerate simulations, in the sense that they give less variance in the estimated RAM metrics for the same computational budget as a standard simulation. However, IS involves changing the probability laws (change of measure) that drive the mathematical models of the systems during simulation, and the optimal IS change of measure is usually unknown, even though theoretically a perfect one exists (the zero-variance IS change of measure). In this thesis, we focus on IS techniques and their application to estimating two RAM metrics: reliability (for static networks) and steady-state availability (for dynamic systems). The thesis focuses on finding and/or approximating the optimal IS change of measure to efficiently estimate RAM metrics in a rare-event context. The contribution of the thesis is divided into two main axes: first, we propose an adaptation of the approximate zero-variance IS method to estimate the reliability of static networks and show its application to real passenger rail systems; second, we propose a multi-level Cross-Entropy (CE) optimization scheme that can be used during pre-simulation to obtain CE-optimized IS rates for the transitions of Markovian Stochastic Petri Nets (SPNs), which are then used in the main simulations to estimate the steady-state unavailability of highly reliable Markovian systems with complex logistics. Results from the methods show a large variance reduction and gain compared to MC simulations.
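The notions of a zero-variance change of measure and of cross-entropy optimization mentioned in this abstract can be written in their standard textbook form. The notation below (nominal density f, sampling density g, non-negative performance function h, parametric family g_theta) is assumed for illustration and is not taken from the thesis.

```latex
% Quantity of interest under the nominal density f, with h >= 0
% (for example, h is the indicator of a system failure):
\gamma = \mathbb{E}_f[h(X)]

% Importance sampling estimator with samples drawn from g:
\hat{\gamma}_{\mathrm{IS}} = \frac{1}{N} \sum_{i=1}^{N} h(X_i)\,\frac{f(X_i)}{g(X_i)},
\qquad X_i \sim g

% Zero-variance change of measure: every sample then equals \gamma exactly,
% but g^* depends on the unknown \gamma and so can only be approximated:
g^{*}(x) = \frac{h(x)\, f(x)}{\gamma}

% Cross-entropy approach: pick the member of a parametric family g_\theta
% closest to g^* in Kullback-Leibler divergence, which is equivalent to
\theta^{*} = \arg\max_{\theta}\; \mathbb{E}_f\!\left[ h(X)\, \log g_{\theta}(X) \right]
```

In practice the last expectation is itself estimated by simulation, typically over several levels in which the rare event is made progressively less frequent.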
|
14 |
Stochastic discount factor bounds and rare events: a review
Medeiros Júnior, Maurício da Silva 22 March 2016
Previous issue date: 2016-03-22
We aim to provide a review of the stochastic discount factor bounds usually applied to diagnose asset pricing models. In particular, we mainly discuss the bounds used to analyze the disaster model of Barro (2006). We focus on this disaster model because the stochastic discount factor bounds applied to study the performance of disaster models usually follow the approach of Barro (2006). We first present the entropy bounds that provide a diagnosis of the analyzed disaster model, namely the methods of Almeida and Garcia (2012, 2016) and Ghosh et al. (2016). Then, we discuss how their results for the disaster model relate to each other, and we also present the findings of other methodologies that are similar to these bounds but provide different evidence about the performance of the framework developed by Barro (2006).
|
15 |
Simulações atomísticas de eventos raros através do método super-simétrico / Atomistic simulation of rare events via the super-symmetric method
Landinez Borda, Edgar Josué, 1984- 11 March 2010
Advisor: Maurice de Koning / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Física Gleb Wataghin
Previous issue date: 2010
Abstract: This thesis deals with the problem of time scales in atomistic simulations, focusing on the problem of rare events. Solving this problem is only possible with the development of special techniques. Specifically, we study the super-symmetric method for finding reaction pathways. This method does not have the usual limitations of other methods for rare events. We apply the method to three standard problems and find that it makes it possible to study rare transitions without detailed knowledge of the system. In addition, it allows us to observe the transition mechanisms qualitatively.
Master's degree / Statistical Physics and Thermodynamics / Master in Physics
|
16 |
Molecular simulation: methods and applications / Simulações moleculares: métodos e aplicações
Freitas, Rodrigo Moura, 1989- 23 August 2018
Advisor: Maurice de Koning / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Física Gleb Wataghin
Previous issue date: 2013
Abstract: Owing to the conceptual and technical advances being made in computational physics and computational materials science, we are able to tackle problems that were inaccessible a few years ago. In this dissertation we study the evolution of some of these techniques, presenting the theory and simulation methods used to study first-order phase transitions, with emphasis on state-of-the-art free-energy calculation (Reversible Scaling) and rare-event (Forward Flux Sampling) methods using the atomistic simulation technique of Molecular Dynamics. The evolution and efficiency improvement of these techniques are presented together with applications to simple systems that allow an exact solution, as well as to the more complex case of martensitic phase transitions. We also present the application of numerical methods to the study of Pauling's model of ice. We have developed and implemented a new algorithm for the efficient generation of disordered ice structures. This ice generator algorithm allows us to create ice Ih cells of sizes not reported before. Using this algorithm, we address finite-size effects not studied before.
Master's degree / Physics / Master in Physics
|
17 |
Estimation non paramétrique adaptative dans la théorie des valeurs extrêmes : application en environnement / Nonparametric adaptive estimation in the extreme value theory : application in ecology
Pham, Quang Khoai 09 January 2015
The objective of this PhD thesis is to develop statistical methods based on extreme value theory to estimate the probabilities of rare events and conditional extreme quantiles. We consider independent random variables $X_{t_1}, \dots, X_{t_n}$ associated with a sequence of times $0 \le t_1 < \dots < t_n \le T_{\max}$, where $X_{t_i}$ has distribution function $F_{t_i}$ and $F_t$ is the conditional distribution of $X$ given $T = t \in [0, T_{\max}]$. For each $t \in [0, T_{\max}]$, we propose a nonparametric adaptive estimator of the extreme quantiles of $F_t$. The idea of our approach is to fit the tail of the distribution function $F_t$ with a Pareto distribution of parameter $\theta_{t,\tau}$ starting from a threshold $\tau$. The parameter $\theta_{t,\tau}$ is estimated using a nonparametric kernel estimator of bandwidth $h$ based on the observations larger than $\tau$. Under some regularity assumptions, we prove that the proposed adaptive estimator of $\theta_{t,\tau}$ is consistent and we determine its rate of convergence. We propose a sequential testing procedure for the choice of the threshold $\tau$, and we determine the bandwidth $h$ by two methods: cross-validation and an adaptive procedure. We also propose a method to choose the threshold $\tau$ and the bandwidth $h$ simultaneously. Finally, we study the proposed procedures on simulated data and on real data, with the aim of supporting the monitoring of aquatic systems.
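A kernel-weighted Pareto tail fit of the kind outlined in this abstract can be sketched as follows. The sketch assumes the tail parametrization P(X > x | X > tau) = (x / tau)^(-1/theta), under which the estimate is a weighted mean of log-exceedances (a Hill-type estimator), and it assumes an Epanechnikov kernel; the thesis may use a different parametrization and kernel, and the function names are hypothetical.

```python
import math


def kernel_tail_parameter(times, values, t, h, tau):
    """Kernel-weighted Hill-type estimate of theta at time t, using only
    observations above the threshold tau and within bandwidth h of t."""
    num = den = 0.0
    for t_i, x_i in zip(times, values):
        u = (t_i - t) / h
        if abs(u) <= 1 and x_i > tau:
            w = 0.75 * (1 - u * u)           # Epanechnikov kernel weight
            num += w * math.log(x_i / tau)
            den += w
    if den == 0:
        raise ValueError("no exceedances of tau inside the kernel window")
    return num / den


def extreme_quantile(tau, theta, p_tau, p):
    """Value x with P(X > x) = p under the fitted tail, where p_tau = P(X > tau)
    and p < p_tau, obtained by extrapolating the Pareto tail beyond tau."""
    return tau * (p_tau / p) ** theta


# Deterministic check data: exact Pareto quantiles with theta = 0.5 above tau = 2.
n, theta_true, tau = 1000, 0.5, 2.0
values = [tau * ((i - 0.5) / n) ** (-theta_true) for i in range(1, n + 1)]
times = [0.5] * n

theta_hat = kernel_tail_parameter(times, values, t=0.5, h=0.1, tau=tau)
print(theta_hat, extreme_quantile(tau, theta_hat, p_tau=0.1, p=0.001))
```

The threshold tau and the bandwidth h are fixed by hand here; choosing them from the data is precisely the part the thesis addresses through sequential testing, cross-validation, and its adaptive procedure.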
|
18 |
Rare events simulation by shaking transformations : Non-intrusive resampler for dynamic programming / Simulation des événements rares par transformations de shaking : Rééchantillonneur non-intrusif pour la programmation dynamique
Liu, Gang 23 November 2016
This thesis contains two parts: rare event simulation and a non-intrusive stratified resampler for dynamic programming. The first part consists of quantifying statistics related to events that are very unlikely to happen but have severe consequences. We propose Markovian transformations on path spaces and combine them with the theories of interacting particle systems and of Markov chain ergodicity to obtain methods that perform well and apply very generally. The second part consists of numerically solving the dynamic programming problem in a context where only a small number of historical observations are available and the values of the model parameters are unknown. We develop and analyze a new scheme based on stratification and resampling.
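A one-dimensional toy version of a Markovian transformation combined with Markov chain ergodicity can be sketched as below. It is an illustration under explicit assumptions, not the thesis's algorithm: the model is a single standard Gaussian variable, the transformation is taken to be x -> rho * x + sqrt(1 - rho^2) * (fresh Gaussian), which leaves the standard Gaussian law invariant, and the rare event {X > 3} is reached through hypothetical intermediate levels 1, 2 and 3.

```python
import math
import random


def shake(x, rho, rng):
    """Gaussian transformation that leaves N(0, 1) invariant (assumed form)."""
    return rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0)


def tail_probability(levels, n_steps, rho, rng):
    """Estimate P(X > levels[-1]) as a product of conditional probabilities.

    For each level, a Markov chain that rejects moves leaving the current
    region {x > barrier} is run, and the fraction of time it spends above
    the next level estimates the conditional probability by ergodic averaging.
    """
    estimate, barrier, x = 1.0, -math.inf, 0.0
    for level in levels:
        hits, restart = 0, None
        for _ in range(n_steps):
            y = shake(x, rho, rng)
            if y > barrier:        # accept only moves that stay in the region
                x = y
            if x > level:
                hits += 1
                restart = x        # a valid starting state for the next chain
        if restart is None:        # next level never reached: give up
            return 0.0
        estimate *= hits / n_steps
        barrier, x = level, restart
    return estimate


if __name__ == "__main__":
    est = tail_probability([1.0, 2.0, 3.0], 200_000, 0.9, random.Random(0))
    exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
    print(f"estimate: {est:.3e}, exact: {exact:.3e}")
```

Each chain only has to estimate a moderately small conditional probability, which is what makes the product usable when the event itself is too rare for direct sampling.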
|
19 |
Nonparametric criteria for sparse contingency tables / Neparametriniai kriterijai retų įvykių dažnių lentelėms
Samusenko, Pavel 18 February 2013
In the dissertation, the problem of nonparametric testing for sparse contingency tables is addressed.
Statistical inference problems caused by sparsity of contingency tables are widely discussed in the literature. Traditionally, the expected (under the null hypothesis) frequency is required to exceed 5 in almost all cells of the contingency table. If this condition is violated, the χ² approximations of goodness-of-fit statistics may be inaccurate and the table is said to be sparse. Several techniques have been proposed to tackle the problem: exact tests, alternative approximations, parametric and nonparametric bootstrap, the Bayes approach and other methods. However, all of them are either inapplicable or have limitations in nonparametric statistical inference for very sparse contingency tables.
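The traditional rule on expected frequencies can be checked with a few lines of code. The sketch below assumes a two-way table and the independence hypothesis as the null model, which is only one possible setting; the tables and function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts of a two-way table under independence:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def sparse_fraction(table, minimum=5.0):
    """Share of cells whose expected count falls below the traditional minimum."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < minimum for e in cells) / len(cells)


def pearson_chi2(table):
    """Pearson's chi-squared statistic against the independence hypothesis."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0
    )


dense = [[10, 20], [20, 40]]    # rows are proportional: independence holds exactly
sparse = [[1, 0], [0, 1]]       # two observations spread over four cells

print(sparse_fraction(dense), pearson_chi2(dense))
print(sparse_fraction(sparse), pearson_chi2(sparse))
```

For the second table every expected count is 0.5, so the χ² approximation to the distribution of the statistic cannot be relied on, which is the situation the dissertation is concerned with.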
In the dissertation, it is shown that, for sparse categorical data, the likelihood ratio statistic and Pearson's χ² statistic may become noninformative: they no longer measure the goodness of fit of the null hypothesis to the data. Thus, tests based on them can be inconsistent even in cases where a simple consistent test exists.
An improvement of the classical criteria for sparse contingency tables is proposed. The improvement is achieved by grouping and smoothing sparse categorical data, making use of a new sparse asymptotics model that relies on an (extended) empirical Bayes approach. Under general conditions, the consistency of the proposed criteria based on grouping is proved. The finite-sample behavior of the criteria is investigated by Monte Carlo simulation.
The dissertation consists of an introduction, 4 chapters, a list of references, general conclusions and an appendix.
The introduction sets out the importance of the scientific problem under study and describes the aims and objectives of the work, the research methods, the scientific novelty, the practical... [see full text]
|
20 |
Neparametriniai kriterijai retų įvykių dažnių lentelėms / Nonparametric criteria for sparse contingency tables
Samusenko, Pavel 18 February 2013
In the dissertation, the problem of nonparametric testing for sparse contingency tables is addressed.
Statistical inference problems caused by sparsity of contingency tables are widely discussed in the literature. Traditionally, the expected (under the null hypothesis) frequency is required to exceed 5 in almost all cells of the contingency table. If this condition is violated, the χ² approximations of goodness-of-fit statistics may be inaccurate and the table is said to be sparse. Several techniques have been proposed to tackle the problem: exact tests, alternative approximations, parametric and nonparametric bootstrap, the Bayes approach and other methods. However, all of them are either inapplicable or have limitations in nonparametric statistical inference for very sparse contingency tables.
In the dissertation, it is shown that, for sparse categorical data, the likelihood ratio statistic and Pearson's χ² statistic may become noninformative: they no longer measure the goodness of fit of the null hypothesis to the data. Thus, tests based on them can be inconsistent even in cases where a simple consistent test exists.
An improvement of the classical criteria for sparse contingency tables is proposed. The improvement is achieved by grouping and smoothing sparse categorical data, making use of a new sparse asymptotics model that relies on an (extended) empirical Bayes approach. Under general conditions, the consistency of the proposed criteria based on grouping is proved. The finite-sample behavior of the criteria is investigated by Monte Carlo simulation.
The dissertation consists of an introduction, 4 chapters, a list of references, general conclusions and an appendix.
The introduction sets out the importance of the scientific problem under study and describes the aims and objectives of the work, the research methods, the scientific novelty, the practical... [see full text]
|