Global ETD Search

161	Cluster-based lack of fit tests for nonlinear regression models Munasinghe, Wijith Prasantha January 1900 Doctor of Philosophy / Department of Statistics / James W. Neill / Checking the adequacy of a proposed parametric nonlinear regression model is important for obtaining useful predictions and reliable parameter inferences. Lack of fit is said to exist when the regression function does not adequately describe the mean of the response vector. This dissertation considers the asymptotics, implementation, and comparative performance of the likelihood ratio tests suggested by Neill and Miller (2003). These tests use constructed alternative models determined by decomposing the lack of fit space according to clusterings of the observations. Clusterings are selected by a maximin power strategy, and a sequence of statistical experiments is developed in the sense of Le Cam. L2 differentiability of the parametric array of probability measures associated with the sequence of experiments is established, leading to local asymptotic normality. Using contiguity, the limiting noncentral chi-square distribution under local parameter alternatives is then derived. For implementation, standard linear model projection algorithms are used to approximate the likelihood ratio tests, after the convexity of a class of fuzzy clusterings is used to form a smooth alternative model, which is needed to approximate the corresponding maximin optimal statistical experiment. It is demonstrated empirically that good power can result from allowing cluster selection to vary across different points along the expectation surface of the proposed nonlinear regression model. In some cases, however, a single maximin clustering suffices, leading to the development of a Bonferroni-adjusted multiple testing procedure. In addition, the maximin clustering based likelihood ratio tests showed markedly better simulated power than the generalized likelihood ratio test with semiparametric alternative model presented by Ciprian and Ruppert (2004).
Nonlinear regression models; Lack of fit tests; Cluster-based; Maximin clustering; Non-central Chi-Square distribution; Likelihood ratio tests; Mathematics (0405); Statistics (0463)
162	Prédiction de l'attrition en date de renouvellement en assurance automobile avec processus gaussiens Pannetier Lebeuf, Sylvain 08 1900 Automobile insurance moves in cycles, with profitable phases alternating with unprofitable ones. During unprofitable phases, insurance companies generally respond by raising premiums in an attempt to reduce losses. However, very large increases can drive customers to competitors en masse. Too high an attrition rate could hurt the company's long-term profitability. Sound management of rate increases is therefore crucial for an insurance company. This thesis aims to build a tool that simulates the shape of the insurance portfolio held by an insurer as a function of the rate change proposed to each policyholder. A procedure based on univariate Gaussian process regression is developed. This procedure outperforms logistic regression, the model generally used for this kind of task.
data mining; Gaussian process; churn; automobile insurance
163	Modélisation bayésienne avec des splines du comportement moyen d'un échantillon de courbes Merleau, James 08 1900 This thesis is about Bayesian functional data analysis in hydrology. The main objective is to model water flow data parsimoniously while still reproducing the statistical features of the data. Functional data analysis leads us to treat the water flow time series as functions to be modeled with a nonparametric method. First, the functions are registered in order to make them more homogeneous. With a more homogeneous sample of curves, we then model their statistical features using Bayesian regression splines in a fairly broad probabilistic framework. More specifically, we study a family of continuous distributions, which includes those of the exponential family, from which the data might have arisen. Furthermore, to obtain a flexible nonparametric modeling tool, we treat the interior knots, which define the basis elements of the regression splines, as random quantities. We then use reversible jump MCMC to explore the posterior distribution of the interior knots. To simplify the procedure in our general modeling context, we consider approximations of the marginal distribution of the observations, namely one based on the Schwarz information criterion and another that relies on Laplace's approximation. In addition to modeling the central tendency of a sample of curves, we also propose a methodology to model the central tendency and the dispersion of the curves simultaneously within our general probabilistic framework. Finally, since we study several statistical distributions for the observations, we put forward an approach for determining the most adequate distributions for a given sample of curves.
Bayesian free-knot regression splines; Registration; Dispersion model; Hydrograph modeling
164	Les tests de causalité en variance entre deux séries chronologiques multivariées Nkwimi-Tchahou, Herbert 12 1900 Time series models with conditionally heteroskedastic variances have become almost indispensable for modeling financial time series. In many applications, checking whether a relationship exists between two time series is an important issue. In this Master's thesis, we generalize in several directions, and in a multivariate framework, the procedure developed by Cheung and Ng (1996) to examine causality in variance between two univariate series. Building on the work of El Himdi and Roy (1997) and Duchesne (2004), we propose a test based on the cross-correlation matrices of the squared standardized residuals and of the cross-products of these residuals. Under the null hypothesis of no causality in variance, we establish that the test statistics converge in distribution to chi-square random variables. In a second approach, we define, as in Ling and Li (1997), a transformation of the residuals for each vector residual series. The test statistics are built from the cross-correlations of these transformed residuals. In both approaches, test statistics at individual lags are proposed, along with portmanteau-type tests. The methodology is also used to determine the direction of causality in variance. Simulation results show that the proposed tests have satisfactory empirical properties. An application to real data is also presented to illustrate the methods.
Causality; Heteroscedasticity; Portmanteau; Correlation; Test; Multivariate Time Series; Conditional Variance
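As a rough illustration of the univariate starting point that this thesis generalizes, the sketch below computes a portmanteau statistic in the style of Cheung and Ng (1996), S = T * sum over k = 1..M of r_uv(k)^2, from the sample cross-correlations of two series of squared standardized residuals. The function names, the choice M = 5, and the simulated series (which stand in for residuals from fitted models) are assumptions made for illustration; the multivariate tests of the thesis are not reproduced here.

```python
import random


def cross_correlation(u, v, lag):
    """Sample cross-correlation between u[t] and v[t - lag], for lag >= 0."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    var_u = sum((x - mean_u) ** 2 for x in u) / n
    var_v = sum((x - mean_v) ** 2 for x in v) / n
    cov = sum((u[t] - mean_u) * (v[t - lag] - mean_v) for t in range(lag, n)) / n
    return cov / (var_u * var_v) ** 0.5


def causality_in_variance_stat(resid_x, resid_y, max_lag):
    """S = T * sum_{k=1..max_lag} r_uv(k)^2 on squared standardized residuals.

    Positive lags pair the squared x residual at time t with earlier squared
    y residuals. Under no causality in variance, S is compared with a
    chi-square distribution with max_lag degrees of freedom.
    """
    u = [e * e for e in resid_x]
    v = [e * e for e in resid_y]
    return len(u) * sum(
        cross_correlation(u, v, k) ** 2 for k in range(1, max_lag + 1)
    )


# Simulated stand-ins for standardized residuals (hypothetical data).
rng = random.Random(7)
n = 2000
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The variance of x_dependent at time t depends on y at time t - 1.
x_dependent = [rng.gauss(0.0, 1.0)] + [
    rng.gauss(0.0, 1.0) * (0.2 + 0.8 * y[t - 1] ** 2) ** 0.5 for t in range(1, n)
]
stat_independent = causality_in_variance_stat(x_independent, y, max_lag=5)
stat_dependent = causality_in_variance_stat(x_dependent, y, max_lag=5)
print(f"independent: {stat_independent:.2f}, dependent: {stat_dependent:.2f}")
```

Swapping the two residual arguments examines the opposite direction, which is one simple way to probe the direction of causality in variance.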
165	Validation des modèles statistiques tenant compte des variables dépendantes du temps en prévention primaire des maladies cérébrovasculaires Kis, Loredana 07 1900 The main interest of this research is the validation of a statistical method in pharmacoepidemiology. Specifically, we compare the results of a previous study, performed with a nested case-control design that accounted for the average exposure to treatment, with (i) results obtained in a cohort design using the time-varying exposure variable, with no adjustment for time since exposure; (ii) results obtained using the cumulative exposure weighted by the recent past; and (iii) results obtained using the Bayesian method. Covariates are estimated by the classical approach and by a nonparametric Bayesian approach. In the latter, Bayesian model averaging is used to model the uncertainty in the choice of models. The technique used in the Bayesian approach was proposed in 1997 but, to our knowledge, has not been used with a time-dependent variable. To model the cumulative effect of the time-varying exposure in the classical approach, the function assigning weights according to recency is estimated using regression splines. To allow comparison with a previous study, a cohort of people diagnosed with hypertension is constructed from the RAMQ and Med-Echo databases. A Cox model including two time-varying variables is used. The time-varying variables considered in this thesis are the dependent variable (first cerebrovascular event) and one of the independent variables, namely the exposure.
Cox model; B-spline; Bayesian model averaging; survival analysis
166	Le progiciel PoweR : un outil de recherche reproductible pour faciliter les calculs de puissance de certains tests d'hypothèses au moyen de simulations de Monte Carlo Tran, Viet Anh 06 1900 The PoweR package aims to make it easier to obtain or verify empirical power studies for goodness-of-fit tests. As such, it can be seen as a computational tool for reproducible research, because it makes it very easy to reproduce (or detect errors in) simulation results already published in the literature. The package also makes it easy to design new simulation studies. Empirical levels and powers for many test statistics under a wide variety of alternative distributions are obtained quickly and accurately using a C/C++ and R environment. One can even rely on the R package snow to parallelize the computations on a multicore processor. The results can be displayed as LaTeX tables or specialized graphs, which can be incorporated directly into publications. This document gives an overview of the main design aims and principles, as well as strategies for adaptation and extension. Hands-on illustrations are presented to help new users get started easily.
reproducible research; Monte Carlo; power study; goodness-of-fit test; R
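The kind of computation described in this abstract can be sketched in a few lines. The example below is not the PoweR API (PoweR is an R package with C/C++ code); it is a minimal Python sketch of a Monte Carlo power study, assuming absolute sample skewness as the test statistic for normality and an exponential alternative. All function names, the sample size, the number of replications, and the seed are illustrative assumptions.

```python
import random


def abs_skewness(sample):
    """Absolute sample skewness, used here as a simple normality test statistic."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    return abs(m3 / m2 ** 1.5)


def critical_value(stat, draw_null, n, level, reps, rng):
    """Monte Carlo (1 - level) quantile of the statistic under the null."""
    values = sorted(stat([draw_null(rng) for _ in range(n)]) for _ in range(reps))
    index = min(reps - 1, int((1 - level) * reps))
    return values[index]


def rejection_rate(stat, draw, n, crit, reps, rng):
    """Share of simulated samples whose statistic exceeds the critical value."""
    rejections = sum(stat([draw(rng) for _ in range(n)]) > crit for _ in range(reps))
    return rejections / reps


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_exponential(rng):
    return rng.expovariate(1.0)


rng = random.Random(2024)  # fixed seed so the study can be reproduced
crit = critical_value(abs_skewness, draw_normal, n=50, level=0.05, reps=2000, rng=rng)
level_hat = rejection_rate(abs_skewness, draw_normal, n=50, crit=crit, reps=2000, rng=rng)
power_hat = rejection_rate(abs_skewness, draw_exponential, n=50, crit=crit, reps=2000, rng=rng)
print(f"critical value {crit:.3f}, empirical level {level_hat:.3f}, power {power_hat:.3f}")
```

Fixing the seed and reporting the number of replications is what makes a study of this kind reproducible; the empirical level under the null serves as a sanity check on the simulated critical value.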
167	Inference for the K-sample problem based on precedence probabilities Dey, Rajarshi January 1900 Doctor of Philosophy / Department of Statistics / Paul I. Nelson / Rank-based inference based on precedence probabilities, using independent random samples to compare K > 1 continuous distributions (the K-sample problem), is developed and explored. There are many parametric and nonparametric approaches to this important, classical problem, most of them dealing with hypothesis testing. Most existing tests are designed to detect differences among the location parameters of the distributions. The best known and most widely used of these is the F-test, which assumes normality. A comparable nonparametric test was developed by Kruskal and Wallis (1952). When dealing with location-scale families of distributions, both of these tests can perform poorly if the distributions differ in their scale parameters rather than in their location parameters. Overall, existing tests are not effective in detecting changes in both location and scale. In this dissertation, I propose a new class of rank-based, asymptotically distribution-free tests, based on precedence probabilities, that are effective in detecting changes in both location and scale. Let $X_i$ be a random variable with distribution function $F_i$, and let $\pi$ be the set of all permutations of the numbers $(1, 2, \ldots, K)$. Then $P(X_{i_1} < \cdots < X_{i_K})$ is a precedence probability if $(i_1, \ldots, i_K)$ belongs to $\pi$. Properties of these tests are developed using the theory of U-statistics (Hoeffding, 1948). Some of these new tests are related to volumes under ROC (Receiver Operating Characteristic) surfaces, which are of particular interest in clinical trials whose goal is to use a score to separate subjects into diagnostic groups. Motivated by this goal, I propose three new index measures of the separation or similarity among two or more distributions. These indices may be used as "effect sizes". In a related problem, properties of precedence probabilities are obtained, and a bootstrap algorithm is used to estimate an interval for them.
Precedence probabilities; Nonlinear rank-based statistic; U-statistic; K-sample problem; Hypervolume under ROC manifold; Statistics (0463)
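To make the definition of a precedence probability concrete, the following Python sketch estimates P(X_1 < X_2 < ... < X_K) from K independent samples by averaging the indicator of a strictly increasing tuple over every way of choosing one observation per sample, which is the natural U-statistic form. The function name and the toy data are assumptions for illustration, not the dissertation's own code or test statistics.

```python
from itertools import product


def precedence_probability(samples):
    """Estimate P(X_1 < X_2 < ... < X_K) from K independent samples.

    Averages the indicator of a strictly increasing tuple over all tuples
    formed by taking one observation from each sample, in the given order.
    """
    total = 0
    increasing = 0
    for combo in product(*samples):
        total += 1
        if all(a < b for a, b in zip(combo, combo[1:])):
            increasing += 1
    return increasing / total


# Two small samples: three of the four pairs satisfy x < y.
print(precedence_probability([[1, 3], [2, 4]]))  # → 0.75
```

If all K distributions are identical and continuous, each of the K! orderings is equally likely, so every precedence probability equals 1/K!; estimates far from that value point to differences among the distributions. The enumeration grows as the product of the sample sizes, so this direct form is only practical for small samples.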
168	Using functional boxplots to visualize reflectance data and distinguish between areas of native grasses and invasive old world bluestems in a Kansas tall grass prairie Highland, Garth January 1900 Master of Science / Department of Statistics / Leigh Murray / Remotely sensed reflectance data are an appealing tool for rangeland managers seeking to control invasive grass species. Recent developments in functional data analysis include the functional boxplot (FBP), which is shown here to be a useful tool for visualizing reflectance data. Functional boxplots are a novel method of visually inspecting functional data and determining the presence of outliers in the data. Implementation and interpretation of FBPs are both straightforward and intuitive. The goal of this study is to examine the use of FBPs for visualizing reflectance data, and to determine the efficacy of using the FBP to distinguish between native tall grasses and invasive Old World Bluestem (OWB, Bothriochloa spp.) monocultures in a Kansas prairie. Validation trials were conducted to determine the stability of the FBP when used to analyze spectral data. FBPs were shown to be highly stable for both native and OWB grasses at all times and subsets of wavelengths tested. Identification trials were conducted by introducing a single OWB observation into a test set of native tall grass observations and constructing an FBP. Results indicate that, using observations recorded early in the growing season, the functional boxplot successfully identifies the OWB observation as an outlier in a test set of native tall grass observations with estimated probabilities of 100% and 95.45% when considering the visible and cellular spectrums, respectively. A 95% lower bound for the probability of successfully identifying the OWB observation using the cellular spectrum in May is found to be 89.67%.
Functional boxplot; Reflectance data; Spectral reflectance; Functional data analysis; Data visualization; Agronomy (0285); Range Management (0777); Statistics (0463)
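The outlier rule behind a functional boxplot can be sketched as follows. This is a minimal pure-Python version under stated assumptions: curves are ranked by modified band depth computed from bands of two curves, the central region is the envelope of the deepest half of the sample, and a curve is flagged when it leaves that envelope inflated by 1.5 times its width at any point. The function names and toy curves are hypothetical, and the thesis may well have used an existing implementation with different settings.

```python
from itertools import combinations


def modified_band_depth(curves):
    """Modified band depth of each curve, using bands formed by pairs of curves."""
    n_points = len(curves[0])
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for curve in curves:
        inside = 0
        for i, j in pairs:
            for t in range(n_points):
                low = min(curves[i][t], curves[j][t])
                high = max(curves[i][t], curves[j][t])
                if low <= curve[t] <= high:
                    inside += 1
        depths.append(inside / (len(pairs) * n_points))
    return depths


def functional_boxplot_outliers(curves, inflation=1.5):
    """Indices of curves that leave the inflated 50% central region."""
    n = len(curves)
    n_points = len(curves[0])
    depths = modified_band_depth(curves)
    ranked = sorted(range(n), key=lambda idx: depths[idx], reverse=True)
    central = ranked[: (n + 1) // 2]  # deepest half of the sample
    lower, upper = [], []
    for t in range(n_points):
        lo = min(curves[i][t] for i in central)
        hi = max(curves[i][t] for i in central)
        lower.append(lo - inflation * (hi - lo))
        upper.append(hi + inflation * (hi - lo))
    return [
        idx
        for idx in range(n)
        if any(not (lower[t] <= curves[idx][t] <= upper[t]) for t in range(n_points))
    ]


# Nine similar curves plus one shifted curve introduced at index 9.
native = [[0.1 * i + 0.5 * t for t in range(10)] for i in range(9)]
intruder = [5.0 + 0.5 * t for t in range(10)]
print(functional_boxplot_outliers(native + [intruder]))  # → [9]
```

The toy example mirrors the identification trials described in the abstract: a single atypical curve is added to an otherwise homogeneous set, and the check is whether the boxplot flags it as an outlier.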
169	Comparaison d'estimateurs de la variance du TMLE Boulanger, Laurence 09 1900 No description available.
Causal inference; TMLE; variance estimation; sandwich estimator; Jackknife
