21 |
Tests de permutation d’indépendance en analyse multivariée / Permutation tests of independence in multivariate analysis
Guetsop Nangue, Aurélien 11 1900
This thesis is written by articles. The articles are written in English and the rest of the thesis is written in French. / The work establishes an equivalence in terms of power between tests based on the alpha-distance covariance and tests based on the Hilbert-Schmidt independence criterion (HSIC) with the characteristic function of a stable probability distribution of index alpha and a sufficiently small scale parameter. High-dimensional simulations show the superiority of distance covariance tests and HSIC tests over certain tests that use copulas. Simulations also show that the Pearson type III distribution, very useful yet less well known, approximates the exact permutation distribution of the tests and gives accurate type I error rates. A new method for adaptive selection of the scale parameters of HSIC tests is proposed. Three simulations, two of which are borrowed from machine learning, show that the new selection method improves the power of HSIC tests. The problem of testing independence between two vectors is generalized to the problem of testing mutual independence among several vectors. The work also treats a closely related problem, namely testing serial independence of a stationary multidimensional sequence. The Möbius decomposition of characteristic functions is used to characterize independence. Generalized tests based on the Hilbert-Schmidt independence criterion and on distance covariance are derived from it. An equivalence is also established between the test based on distance covariance and the HSIC test with the characteristic kernel of a stable distribution with sufficiently small scale parameters. Weak convergence of the HSIC test is obtained. Fast and accurate computation of the p-values of the tests developed uses a Pearson type III distribution as an approximation to the exact distribution of the tests.
A fascinating result is the derivation of the exact first three moments of the permutation distribution of the dependence statistics. A similar methodology has been developed for testing serial independence of a sequence. Applications to real environmental and financial data are carried out. / The main result establishes the equivalence in terms of power between the alpha-distance covariance test and the Hilbert-Schmidt independence criterion (HSIC) test with the characteristic kernel of a stable probability distribution of index alpha with sufficiently small scale parameters. Large-scale simulations reveal the superiority of these two tests over other tests based on the empirical independence copula process. They also establish the usefulness of the lesser-known Pearson type III approximation to the exact permutation distribution. This approximation yields tests with more accurate type I error rates than the gamma approximation usually used for HSIC, especially when the dimensions of the two vectors are large. A new method for scale parameter selection in HSIC tests is proposed, which improves power in three simulations, two of which are from machine learning. The problem of testing mutual independence between many random vectors is addressed. The closely related problem of testing serial independence of a multivariate stationary sequence is also considered. The Möbius transformation of characteristic functions is used to characterize independence. A generalization to p vectors of the alpha-distance covariance test and the Hilbert-Schmidt independence criterion (HSIC) test with the characteristic kernel of a stable probability distribution of index alpha is obtained. It is shown that an HSIC test with sufficiently small scale parameters is equivalent to an alpha-distance covariance test. Weak convergence of the HSIC test is established.
A very fast and accurate computation of p-values uses the Pearson type III approximation, which closely approaches the exact permutation distribution of the tests. This approximation relies on the exact first three moments of the permutation distribution of any test statistic that can be expressed as the sum of all elements of a componentwise product of p doubly-centered matrices. The alpha-distance covariance test and the HSIC test are both of this form. A new selection method is proposed for the scale parameter of the characteristic kernel of the HSIC test. A simulation shows that this adaptive HSIC test has higher power than the alpha-distance covariance test when data are generated from a Student copula. Applications are given to environmental and financial data.
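As a rough illustration of a statistic of the form described above (the sum of all elements of a componentwise product of doubly-centered matrices), the following Python sketch runs a Monte Carlo permutation test of independence between two samples using the sample distance covariance with alpha = 1. It is a sketch under stated assumptions, not the thesis's method: it draws random permutations instead of using the Pearson type III approximation, it handles only two vectors, and every function name is hypothetical.

```python
import math
import random


def double_centered_distances(points):
    """Pairwise Euclidean distances, centered so each row and column sums to zero."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def dcov_statistic(a, b, perm):
    """Sum of the elementwise product of a and the permuted b, divided by n^2."""
    n = len(a)
    return sum(a[i][j] * b[perm[i]][perm[j]]
               for i in range(n) for j in range(n)) / n ** 2


def dcov_permutation_test(x, y, n_perm=500, seed=0):
    """Return (observed statistic, Monte Carlo permutation p-value).

    x and y are equal-length lists of coordinate tuples (illustrative layout).
    """
    rng = random.Random(seed)
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    identity = list(range(len(x)))
    observed = dcov_statistic(a, b, identity)
    exceed = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # re-pair the y observations with the x observations
        if dcov_statistic(a, b, perm) >= observed:
            exceed += 1
    # Adding one to numerator and denominator keeps the p-value above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

The centered matrices are computed once and only the indices of the second matrix are permuted, so each permutation costs one pass over the n × n entries. A moment-based approximation such as the one in the abstract would avoid the permutation loop altogether.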
|
22 |
New trends in dairy cattle genetic evaluation
NICOLAZZI, EZEQUIEL LUIS 24 February 2011
Genetic evaluation systems around the world are developing rapidly. Currently, “traditional” selection programs based on phenotypes and pedigree relationships between animals are being supplemented, and in the future could be replaced, by molecular information. In this transition period, this thesis concerns research on both types of evaluation: from assessing the accuracy of (traditional) international genetic indices to studying statistical methods used to integrate genomic information into selection (genomic selection). Three chapters evaluate approaches for estimating genetic values from genomic data while reducing the number of independent variables. In particular, they cover Bonferroni correction and the permutation test with single-marker regression (Chapter III), principal component analysis with BLUP (Chapter IV), and the Fst index across breeds with BayesA (Chapter VI). In addition, Chapter V analyzes the accuracy of genomic values with BLUP, BayesA and Bayesian LASSO including all available variables. The results of this thesis indicate that the genetic progress expected from the analysis of simulated data can indeed be achieved, although further research is needed to optimize the use of molecular information so as to optimize results for all traits under selection. / Genetic evaluation systems are in rapid development worldwide. In most countries, “traditional” breeding programs based on phenotypes and relationships between animals are currently being integrated with, and in the future might be replaced by, molecular information. This thesis falls within this transition period and therefore covers research on both types of genetic evaluation: from the assessment of the accuracy of (traditional) international genetic evaluations to the study of statistical methods used to integrate genomic information into breeding (genomic selection).
Three chapters investigate and evaluate approaches for estimating genetic values from genomic data while reducing the number of independent variables. In particular, they cover Bonferroni correction and the permutation test combined with single-marker regression (Chapter III), principal component analysis combined with BLUP (Chapter IV), and Fst across breeds combined with BayesA (Chapter VI). In addition, Chapter V analyzes the accuracy of direct genomic values with BLUP, BayesA and Bayesian LASSO including all available variables.
The results of this thesis indicate that the genetic gains expected from the analysis of simulated data can be obtained on real data. Still, further research is needed to optimize the use of genome-wide information and obtain the best possible estimates for all traits under selection.
|
23 |
複迴歸係數排列檢定方法探討 / Methods for testing significance of partial regression coefficients in regression model
闕靖元, Chueh, Ching Yuan Unknown Date
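Under the traditional regression model framework, statistical inference requires the assumption that the error terms are mutually independent and come from a normal distribution. When the assumptions of the theoretical model cannot be met, permutation tests, a nonparametric statistical method, are usually a feasible alternative.
In the earlier literature, permutation tests for coefficients in multiple regression models have mainly used a pivotal quantity as the test statistic and then examined the differences among permutation schemes. Besides permutation tests that use the t statistic, a pivotal quantity, as the test statistic, this study also includes permutation tests built on the non-pivotal regression coefficient estimate b2. Using Monte Carlo simulation, we compare the type I error probability and the power of these two classes of tests and examine their feasibility and the situations in which they apply. The simulation results show that, when the explanatory variables are uncorrelated and the error distribution is not strongly skewed, the permutation methods of Freedman and Lane (1983), Levin and Robbins (1983), and Kennedy (1995) are suitable for the b2 statistic when the sample size is large, and their power is higher than with the t2 statistic, although the difference is small. If the explanatory variables are highly correlated, then regardless of the skewness of the errors, the permutation methods of Freedman and Lane (1983) and Kennedy (1995) are suitable for the b2 statistic when the sample size is large, their power is again higher than with the t2 statistic, and the difference between the two is more pronounced than when the explanatory variables are uncorrelated. Overall, the t2 statistic is applicable in a wider range of situations; by contrast, the simulation results for b2 often depend on the sample size and on the correlation among the explanatory variables. / In traditional linear models, error terms are usually assumed to be independent and identically normally distributed with mean zero and constant variance. When these assumptions cannot be met, permutation tests can be an alternative method.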
Several permutation tests have been proposed to test the significance of a partial regression coefficient in a multiple regression model. The statistic t = b / se(b), an asymptotically pivotal quantity, is usually preferred and suggested as the test statistic. In this study, we take not only t statistics but also the estimates of the partial regression coefficient as test statistics. Their performance is compared in terms of type I error probability and power using Monte Carlo simulation. Situations where estimates of the partial regression coefficients may outperform t statistics are discussed.
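To make the comparison of the two test statistics concrete, here is a minimal Python sketch of a permutation test in the style usually attributed to Freedman and Lane (1983): residuals from the reduced model are permuted, added back to the reduced-model fit, and the full model is refitted. It assumes the simplest setting, y regressed on an intercept, one nuisance covariate x1 and the covariate of interest x2, and it assumes the fit is not perfect (so the standard error is positive). The function names and the returned dictionary are hypothetical, and this is not the code used in the thesis.

```python
import math
import random


def simple_residuals(v, z):
    """Residuals from the least-squares fit of v on an intercept and z."""
    n = len(v)
    v_bar, z_bar = sum(v) / n, sum(z) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, v)) / szz
    return [vi - v_bar - slope * (zi - z_bar) for vi, zi in zip(v, z)]


def partial_stats(y_res, x2_res, n):
    """Return (b2, t2) for x2 in y ~ 1 + x1 + x2, given residuals on (1, x1)."""
    s22 = sum(r * r for r in x2_res)
    b2 = sum(r * e for r, e in zip(x2_res, y_res)) / s22
    # Residual sum of squares of the full model (Frisch-Waugh-Lovell identity).
    rss = max(sum(e * e for e in y_res) - b2 * b2 * s22, 0.0)
    se = math.sqrt(rss / (n - 3) / s22)
    return b2, b2 / se


def freedman_lane(y, x1, x2, n_perm=999, seed=0):
    """Two-sided permutation p-values for b2 and for t2 = b2 / se(b2)."""
    rng = random.Random(seed)
    n = len(y)
    x2_res = simple_residuals(x2, x1)
    y_res = simple_residuals(y, x1)  # residuals of the reduced model y ~ 1 + x1
    b_obs, t_obs = partial_stats(y_res, x2_res, n)
    count_b = count_t = 0
    for _ in range(n_perm):
        shuffled = y_res[:]
        rng.shuffle(shuffled)
        # y* = reduced-model fit + permuted residuals. The fit lies in the span
        # of (1, x1), so residualizing y* on (1, x1) is the same as
        # residualizing the permuted residuals.
        y_star_res = simple_residuals(shuffled, x1)
        b, t = partial_stats(y_star_res, x2_res, n)
        count_b += int(abs(b) >= abs(b_obs))
        count_t += int(abs(t) >= abs(t_obs))
    return {"b2": b_obs, "t2": t_obs,
            "p_b2": (count_b + 1) / (n_perm + 1),
            "p_t2": (count_t + 1) / (n_perm + 1)}
```

Both p-values come from the same set of permutations, so any difference between `p_b2` and `p_t2` reflects the choice of statistic rather than Monte Carlo noise across runs.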
|
24 |
排列檢定法應用於空間資料之比較 / Permutation test on spatial comparison
王信忠, Wang, Hsin-Chung Unknown Date
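This thesis mainly investigates whether two population distributions on a two-dimensional space are the same. We use permutation tests to make the comparison and, drawing on the idea of Fisher's exact test, propose the relabel permutation test, also called Fisher's permutation test.
Using the property of exchangeability, we prove that it is exact and that it has higher power than the permutation test suggested by Syrjala (1996).
This thesis also proposes two spatial models, the spatial multinomial-relative-log-normal model and the spatial Poisson-relative-log-normal model, to fit the spatial data common in fisheries, which have right-skewed, long-tailed frequency distributions and contain many zeros. In addition, species may aggregate because of their nature or because of environmental factors such as food and temperature; the two models can also describe this aggregation in spatial data so that appropriate inferences can be made. / This thesis proposes the relabel (Fisher's) permutation test, inspired by Fisher's exact test, to compare the distributions of two (fishery) data sets located on a two-dimensional lattice. We show that the permutation test given by Syrjala (1996) is not exact, but that our relabel permutation test is exact and, additionally, more powerful.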
This thesis also studies two spatial models: the spatial multinomial-relative-log-normal model and the spatial Poisson-relative-log-normal model. Both models not only exhibit skewness with a long right-hand tail and a high proportion of zero catches, as usually appear in fishery data, but can also describe various types of aggregative behavior.
|
25 |
再發事件資料之無母數分析 / Nonparametric analysis of recurrent event data
黃惠芬 Unknown Date
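Recurrent event data are common in medicine, industry, finance, the social sciences, and other fields. When analyzing recurrent data, we often cannot know the distribution of the event times or of the number of events. This thesis therefore examines nonparametric methods for analyzing recurrent events, including the mean cumulative function estimator proposed by Nelson and the kernel function estimator of the occurrence rate introduced by Wang, Chiang and Huang.
For the mean cumulative function estimator, interval estimates of the mean cumulative function can be obtained from the variance derived by Nelson and from the naive variance, respectively. This thesis uses the bootstrap to compute the variance of the mean cumulative function at different time points and compares it with Nelson's variance and the naive variance; the results show that Nelson's variance is closer to the bootstrap variance. Asymptotic confidence intervals for the cumulative number of events should therefore be constructed from Nelson's variance.
This thesis also introduces methods for comparing the mean cumulative functions of two or more populations, including comparison at fixed time points and comparison of entire curves. At a fixed time point, the comparison methods are asymptotic and bootstrap confidence intervals for pairwise differences of mean cumulative functions, analysis-of-variance comparison, and the permutation test; the methods for comparing entire curves include a statistic similar to Hotelling's T² and the Lawless-Nadeau test. Applied to the empirical data used in this thesis, these methods lead to consistent test conclusions. / Recurrent event data arise in many fields, such as medicine, industry, economics, and the social sciences. When studying recurrent event data, we usually do not know the exact joint or marginal distributions of the occurrence times or of the number of events over time. This thesis therefore discusses nonparametric methods, such as the mean cumulative function (MCF) discussed by Nelson and kernel estimation of the rate function introduced by Wang, Chiang and Huang.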
For the MCF estimator, confidence intervals can be computed from either Nelson’s variance or the naive variance. We use the bootstrap to compare the performance of the Nelson variance and the naive variance of the estimated MCF. The results show that the Nelson variance performs better than the naive variance, so confidence limits for the MCF should be constructed from Nelson’s variance except when only grouped data are available.
We also introduce methods for comparing MCFs, including pointwise comparison and comparison of entire MCFs. Methods for pointwise comparison include approximate confidence limits for the difference between two MCFs, analysis-of-variance comparison, the permutation test, and bootstrap confidence limits for the difference between two MCFs. Methods for comparing entire MCFs include a statistic similar to Hotelling’s T² and the Lawless-Nadeau test. Finally, all approaches are applied to a real data set, and their conclusions agree with one another.
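The nonparametric MCF estimate and a bootstrap variance of the kind mentioned above can be sketched in a few lines of Python. The sketch assumes the usual form of the estimator, in which the MCF increases at each event time by the number of events at that time divided by the number of units still under observation, and it resamples whole units for the bootstrap. The data layout (a list of `(event_times, end_of_observation)` pairs) and all function names are hypothetical, and Nelson's variance formula is not reproduced here.

```python
import random


def mean_cumulative_function(units):
    """Estimate the MCF at each distinct event time.

    units: list of (event_times, end_of_observation) pairs, one per unit.
    Returns a list of (time, estimated MCF) pairs in increasing time order.
    """
    events = sorted(t for times, end in units for t in times if t <= end)
    ends = [end for _, end in units]
    mcf, total = [], 0.0
    for t in sorted(set(events)):
        at_risk = sum(1 for end in ends if end >= t)  # units still observed at t
        total += events.count(t) / at_risk
        mcf.append((t, total))
    return mcf


def mcf_at(units, t):
    """Value of the estimated MCF at time t (zero before the first event)."""
    value = 0.0
    for time, level in mean_cumulative_function(units):
        if time > t:
            break
        value = level
    return value


def bootstrap_variance(units, t, n_boot=500, seed=0):
    """Bootstrap variance of the MCF estimate at time t, resampling units."""
    rng = random.Random(seed)
    n = len(units)
    draws = [mcf_at([rng.choice(units) for _ in range(n)], t)
             for _ in range(n_boot)]
    mean = sum(draws) / n_boot
    return sum((d - mean) ** 2 for d in draws) / (n_boot - 1)
```

For three units given as `[([1, 3], 5), ([2], 4), ([], 2)]`, the estimate steps by 1/3 at times 1 and 2, when all three units are under observation, and by 1/2 at time 3, when only two remain.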
|