1 |
Change Point Analysis for Lognormal Distribution Based on Schwarz Information Criterion
Cooper, Richard. 12 August 2020
No description available.
|
2 |
KLIC作為傾向分數配對平衡診斷之可行性探討 / Using Kullback-Leibler Information Criterion on balancing diagnostics for baseline covariates between treatment groups in propensity-score matched samples
Li, Pei Chia (李珮嘉). Unknown date
In observational studies, propensity scores are frequently used to balance, to some extent, the distribution of baseline covariates between treated and untreated groups, so that the data can be treated as if they came from a randomized controlled trial (RCT) and valid causal inferences can be made. In the literature, balance has usually been diagnosed by comparing means and/or variances. In this study, we propose using either the Kullback-Leibler Information Criterion (KLIC) or the Kolmogorov-Smirnov (KS) statistic, both of which compare distribution functions, as an alternative balance diagnostic, and we evaluate the feasibility of doing so. In addition, a low propensity-score matching rate limits the subsequent statistical inference, and the data showed that the matching rate is negatively correlated with KLIC and KS. We therefore also examine whether KLIC and KS can serve as preliminary indicators of whether to carry out propensity-score matching at all. Our simulation study suggests that the answer to both questions is positive.
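As a rough illustration of the kind of distribution-level diagnostic the abstract describes, the two-sample KS statistic can be computed as the largest gap between two empirical distribution functions. This is a minimal sketch, not the thesis's procedure; the function name `ks_statistic` and the covariate values are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max over t of |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The empirical CDFs only jump at observed values, so checking those suffices.
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Hypothetical covariate values (e.g. age) for the treated and control groups.
treated = [34, 41, 45, 52, 58, 63]
control_before = [22, 25, 29, 33, 36, 40]  # before matching: clearly younger
control_after = [35, 40, 47, 50, 59, 62]   # after matching: similar ages

ks_before = ks_statistic(treated, control_before)
ks_after = ks_statistic(treated, control_after)
print(ks_before, ks_after)
```

Under this reading, a smaller value after matching would suggest that the covariate's distribution is better balanced between the groups.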
|
3 |
Best-subset model selection based on multitudinal assessments of likelihood improvements
Carter, Knute Derek. 01 December 2013
Given a set of potential explanatory variables, one model selection approach is to select the best model, according to some criterion, from among the collection of models defined by all possible subsets of the explanatory variables. A popular procedure that has been used in this setting is to select the model that results in the smallest value of the Akaike information criterion (AIC). One drawback in using the AIC is that it can lead to the frequent selection of overspecified models. This can be problematic if the researcher wishes to assert, with some level of certainty, the necessity of any given variable that has been selected.
This thesis develops a model selection procedure that allows the researcher to nominate, a priori, the probability at which overspecified models will be selected from among all possible subsets. The procedure seeks to determine if the inclusion of each candidate variable results in a sufficiently improved fitting term, and hence is referred to as the SIFT procedure. In order to determine whether there is sufficient evidence to retain a candidate variable or not, a set of threshold values is computed. Two procedures for computing the thresholds are proposed: a naive method based on a set of restrictive assumptions; and an empirical permutation-based method.
Graphical tools have also been developed to be used in conjunction with the SIFT procedure. The graphical representation of the SIFT procedure clarifies the process being undertaken. Using these tools can also assist researchers in developing a deeper understanding of the data they are analyzing.
The naive and empirical SIFT methods are investigated by way of simulation under a range of conditions within the standard linear model framework. The performance of the SIFT methodology is compared with model selection by minimum AIC; minimum Bayesian Information Criterion (BIC); and backward elimination based on p-values. The SIFT procedure is found to behave as designed—asymptotically selecting those variables that characterize the underlying data generating mechanism, while limiting the selection of false or spurious variables to the desired level.
The SIFT methodology offers researchers a promising new approach to model selection, whereby they are now able to control the probability of selecting an overspecified model to a level that best suits their needs.
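The minimum-AIC baseline that this abstract compares against can be sketched as follows. This is not the SIFT procedure itself; it is an illustration of best-subset selection under a Gaussian linear model, where AIC is taken as n·log(RSS/n) + 2k up to an additive constant. All names (`solve`, `rss`, `aic`, `best_subset`) and the simulated data are assumptions made for the example.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))


def aic(columns, y):
    """Gaussian AIC up to an additive constant."""
    n = len(y)
    k = len(columns) + 2  # slopes + intercept + error variance
    return n * math.log(rss(columns, y) / n) + 2 * k


def best_subset(x_cols, y):
    """Return (AIC, subset of column indices) minimizing AIC over all subsets."""
    best = None
    for size in range(len(x_cols) + 1):
        for subset in itertools.combinations(range(len(x_cols)), size):
            score = aic([x_cols[j] for j in subset], y)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Simulated data: only the first of three candidate variables affects y.
random.seed(0)
n = 100
x_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
y = [2.0 + 3.0 * x_cols[0][i] + random.gauss(0, 1) for i in range(n)]
score, subset = best_subset(x_cols, y)
print(subset)
```

Repeating the simulation with different seeds would show how often a spurious variable is retained alongside the true one, which is the overspecification behaviour the abstract is concerned with.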
|
4 |
Statistical Methods for High Dimensional Data in Environmental Genomics
Sofer, Tamar. January 2012
In this dissertation, we propose methodology for analyzing high dimensional genomics data, in which the observations have a large number of outcome variables in addition to exposure variables. In Chapter 1, we investigate methods for genetic pathway analysis, where we have a small number of exposure variables. We propose two methods based on Canonical Correlation Analysis that select outcomes either sequentially or by screening, and show that the performance of the proposed methods depends on the correlation between the genes in the pathway. We also propose and investigate a criterion for fixing the number of outcomes, and a powerful test for the exposure effect on the pathway. The methodology is applied to show that air pollution exposure affects the methylation of a few genes from the asthma pathway. In Chapter 2, we study penalized multivariate regression as an efficient and flexible method for studying the relationship between a large number of covariates and multiple outcomes. We use penalized likelihood to shrink model parameters to zero and to select only the important effects. We use the Bayesian Information Criterion (BIC) to select tuning parameters for the employed penalty and show that it chooses the right tuning parameter with high probability. These are combined in the "two-stage procedure", and asymptotic results show that it yields a consistent, sparse and asymptotically normal estimator of the regression parameters. The method is illustrated on gene expression data in normal and diabetic patients. In Chapter 3, we propose a method for covariate-dependent estimation of principal components analysis (PCA) and covariance matrices. Covariates, such as smoking habits, can affect the variation in a set of gene methylation values. We develop a penalized regression method that incorporates covariates in the estimation of principal components. We show that the parameter estimates are consistent and sparse, and that using the BIC to select the tuning parameter for the penalty functions yields good models. We also propose the scree plot residual variance criterion for selecting the number of principal components. The proposed procedure is applied to show that the first three principal components of gene methylation in the asthma pathway differ between people who smoked and people who did not.
|
5 |
Combinação de Características Para Segmentação em Transcrição de Locutores / Feature Combination for Segmentation in Speaker Diarization
Neri, Leonardo Valeriano. 21 February 2014
This work presents a feature-combination approach for the speaker segmentation stage of a speaker diarization system. The approach uses different acoustic features extracted from the audio source, aiming to combine their ability to discriminate between different types of sounds and thereby increase segmentation accuracy. The Bayesian Information Criterion (BIC) is used as a distance measure to assess whether two audio segments should be merged. An Artificial Neural Network (ANN) combines the responses obtained for each feature after applying an algorithm that detects whether there is a change within a stretch of audio. The resulting time indices are used as input to the neural network, which estimates the speaker change point in that stretch. A speaker diarization system that includes the proposed approach is developed to evaluate and compare its results with those of a diarization system that uses the classical Window-Growing speaker segmentation approach of Chen and Gopalakrishnan, applied to the different acoustic features adopted in this work. The experiments with the diarization system use an artificial database containing samples with several speakers. Evaluation of the segmentation stage shows an improvement in both the miss detection rate (MDR) and the false alarm rate (FAR) compared with the Window-Growing approach. Evaluation of the speaker clustering stage shows a significant improvement in the purity of the speaker clusters formed, computed as the percentage of samples from the same speaker in a cluster, demonstrating that the clusters are more homogeneous.
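The use of BIC as a distance between two adjacent audio segments can be illustrated with the ΔBIC test commonly attributed to Chen and Gopalakrishnan. The sketch below assumes a single scalar feature modeled as Gaussian, so the penalty term ½·(d + d(d+1)/2)·log N reduces to log N for d = 1; real systems typically use multivariate features such as MFCCs. The function names, the penalty weight `lam`, the `margin` parameter and the synthetic signal are all assumptions for illustration, not details taken from the dissertation.

```python
import math
import random


def log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(xs, i, lam=1.0):
    """BIC gain from modeling xs[:i] and xs[i:] as two Gaussians instead of one."""
    n = len(xs)
    penalty = lam * math.log(n)  # 0.5 * (d + d(d+1)/2) * log(n) with d = 1
    gain = 0.5 * (n * log_var(xs) - i * log_var(xs[:i]) - (n - i) * log_var(xs[i:]))
    return gain - penalty


def detect_change(xs, margin=5, lam=1.0):
    """Return the most likely change index, or None if no split has positive delta-BIC."""
    candidates = range(margin, len(xs) - margin)
    best_i = max(candidates, key=lambda i: delta_bic(xs, i, lam))
    return best_i if delta_bic(xs, best_i, lam) > 0 else None


# Synthetic one-dimensional "feature" stream: the source changes at index 100.
random.seed(1)
stream = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 1) for _ in range(100)]
change_point = detect_change(stream)
print(change_point)
```

A positive ΔBIC favors keeping the two segments apart (a speaker change), while a negative value favors merging them.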
|
6 |
NORMAL MIXTURE AND CONTAMINATED MODEL WITH NUISANCE PARAMETER AND APPLICATIONS
Fan, Qian. 01 January 2014
This paper seeks a proper hypothesis and test statistic for testing the existence of bilateral contamination in the presence of a nuisance parameter. The test statistic is based on method-of-moments estimators. A union-intersection test is used to test whether the population distribution can be represented by a bilaterally contaminated normal model with unknown variance. The paper also develops a hierarchical normal mixture (HNM) model and applies it to birth weight data. The EM algorithm is employed for parameter estimation, and a singular Bayesian information criterion (sBIC) is applied to choose the number of components. We also propose a singular flexible information criterion that additionally involves a data-driven penalty.
|
7 |
Risk factor modeling of Hedge Funds' strategies
Radosavčević, Aleksa. January 2017
This thesis aims to identify the main market risk factors driving the different strategies implemented by hedge funds. It does so by examining correlation coefficients, applying Principal Component Analysis, and analyzing the "loadings" of the first three principal components, which explain the largest portion of the variation in hedge funds' returns. In the next step, an iterative stepwise regression includes and excludes market risk factors for each strategy, searching for the combination of risk factors that yields the model with the best "fit" according to the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Lastly, to avoid spurious results and overcome model uncertainty, a Bayesian Model Averaging (BMA) approach is taken.
Key words: Hedge Funds, hedge funds' strategies, market risk, principal component analysis, stepwise regression, Akaike Information Criterion, Bayesian Information Criterion, Bayesian Model Averaging
Author's e-mail: aleksaradosavcevic@gmail.com Supervisor's e-mail: mp.princ@seznam.cz
|
8 |
Développement d'une nouvelle technique de pointé automatique pour les données de sismique réfraction / Development of a new adaptive algorithm for automatic picking of seismic refraction data
Khalaf, Amin. 15 February 2016
Accurate picking of first arrival times plays an important role in many seismic studies, particularly in seismic tomography and in reservoir or aquifer monitoring. A new adaptive algorithm has been developed that combines three picking methods: Multi-Nested Windows, Higher Order Statistics, and the Akaike Information Criterion. It benefits from integrating three properties (energy, gaussianity, and stationarity) that reveal the presence of first arrivals. Since picking uncertainties are, in some cases, as important as the picks themselves, notably for seismic tomography, the algorithm also automatically provides the errors associated with the picked arrival times. Comparing the resulting arrival times with those picked manually, and with those from other automatic picking algorithms, demonstrates the reliable performance of the algorithm. It is nearly parameter-free, straightforward to implement, and demands few computational resources. However, high noise levels in the seismic records degrade its performance. To improve the signal-to-noise ratio of first arrivals, and thereby increase their detectability, double stacking in the time domain is proposed. This approach is based on the key principle of local similarity between stacked traces. The results demonstrate the benefit of applying the double stacking before the automatic picking.
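Of the three ingredients named in this abstract, the Akaike Information Criterion component can be sketched in the form commonly attributed to Maeda, where the trace is split at each candidate sample k and AIC(k) = k·log var(x[1..k]) + (N − k − 1)·log var(x[k+1..N]). The onset is taken at the minimum. This is an assumption about the general technique, not the thesis's combined algorithm, and the function names and the synthetic trace are hypothetical.

```python
import math
import random


def log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def aic_curve(trace):
    """AIC value for every candidate split point k of the trace."""
    n = len(trace)
    return {
        k: k * log_var(trace[:k]) + (n - k - 1) * log_var(trace[k:])
        for k in range(2, n - 2)
    }


def pick_onset(trace):
    """Index where the two-segment (noise, signal) model fits best."""
    curve = aic_curve(trace)
    return min(curve, key=curve.get)


# Synthetic trace: low-amplitude noise, then a stronger arrival from sample 200.
random.seed(2)
trace = [random.gauss(0, 0.1) for _ in range(200)] + [random.gauss(0, 2.0) for _ in range(200)]
onset = pick_onset(trace)
print(onset)
```

In practice the curve is usually evaluated on a short window around a preliminary pick, since a global minimum over a long, noisy record can be misleading.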
|
9 |
Seleção de modelos lineares mistos utilizando critérios de informação / Mixed linear model selection using information criterion
Yamanouchi, Tatiana Kazue. 18 August 2017
The mixed model is commonly used for repeated-measures data because of its flexibility in incorporating the correlation between observations measured on the same individual and the heterogeneity of variances of observations made over time. The model comprises fixed effects, random effects, and a random error, so selecting a mixed model often requires selecting the best components so that the model represents the data well. Information criteria are widely used tools in model selection, but few studies indicate how they perform in selecting the fixed effects, the random effects, and the covariance structure of the random error. This work therefore carried out a simulation study to evaluate the performance of the AIC, BIC, and KIC information criteria in selecting the components of the mixed model, measured by the true positive (TP) rate. In general, the information criteria performed well, that is, they had high TP rates, in situations with larger sample sizes. In the selection of fixed effects and of the covariance structure, BIC outperformed AIC and KIC in almost all situations. In the selection of random effects, no criterion performed well, except when the compound symmetry structure was considered, in which case BIC performed best.
|
10 |
Att jämföra ojämlikhet : En jämförande kvantitativ studie kring operationaliseringar i diskursen för inbördeskrig / Comparing Inequality: A comparative quantitative study of operationalizations in the civil war discourse
Johansson, Hugo. January 2015
In the research discourse on intrastate conflicts, quantitative and qualitative studies disagree on whether the variable economic inequality has a significant effect on the risk of conflict. By shifting the focus from general inequality to exploitation, the thesis argues that the right type of economic inequality has not previously been operationalized correctly. It then compares the Gini coefficient with a measure from economic history called the Inequality Extraction Ratio (IER), which the thesis argues is a more dynamic measure of exploitation. The study's dataset has two variants of the dependent variable for conflict, and the thesis finds that IER has a significant effect for both variants and a clear advantage in one of them.
|