  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A QUASI-LIKELIHOOD METHOD TO DETECT DIFFERENTIALLY EXPRESSED GENES IN RNA-SEQUENCE DATA

Gu, Chu-Shu January 2016 (has links)
In recent years the RNA-sequencing (RNA-seq) method, which measures the transcriptome by counting short sequencing reads obtained by high-throughput sequencing, has been replacing microarray technology as the major platform in gene expression studies. The large amount of discrete data in RNA-seq experiments calls for effective analysis methods. In this dissertation, a new method to detect differentially expressed genes, based on quasi-likelihood theory, is developed for experiments with a completely randomized design and two experimental conditions. The proposed method estimates the variance function empirically, and consequently it has similar sensitivities and false discovery rates (FDRs) across distributions with different variance functions. In a simulation study, the method is shown to have similar sensitivities and FDRs across data with three different types of variance functions, compared with some other popular methods. The method is applied to a real dataset with two experimental conditions, along with some competing methods. The new method is then extended to more complex designs: experiments with multiple experimental conditions, block designs, and factorial designs. The same advantages of the new method are found in simulation studies, and the method and some competing methods are applied to three real datasets with complex designs. The new method is also applied to reads per kilobase per million mapped reads (RPKM) data. In simulation, it is compared with Linear Models for Microarray Data (LIMMA), originally developed for microarray analysis (Smyth, 2004), and the question of normalization is also examined. The new method and LIMMA are shown to have similar performance. Further normalization is required for proper analysis of RPKM data, and the best such normalization is the scaling method; properly analyzing raw count data performs better than analyzing RPKM data.
Different normalization and statistical methods are applied to a real dataset with varied gene length across samples. / Thesis / Doctor of Philosophy (PhD)
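The quasi-likelihood idea underlying this dissertation can be illustrated with a toy two-group test for a single gene. This is a minimal sketch under a quasi-Poisson working variance Var(Y) = φμ, not the author's method (which estimates the variance function empirically from the data); the function name and the Wald-test setup are my own illustrative choices.

```python
import math

def quasi_poisson_test(counts_a, counts_b):
    """Two-group Wald test on the log fold change under Var(Y) = phi * mu.

    Returns (log fold change, z statistic). The dispersion phi is the
    Pearson X^2 divided by residual degrees of freedom (n - 2 group means).
    """
    na, nb = len(counts_a), len(counts_b)
    mu_a = sum(counts_a) / na
    mu_b = sum(counts_b) / nb
    x2 = sum((y - mu_a) ** 2 / mu_a for y in counts_a)
    x2 += sum((y - mu_b) ** 2 / mu_b for y in counts_b)
    phi = x2 / (na + nb - 2)
    # Delta method: Var(log mu_hat) ~= Var(mu_hat) / mu^2 = phi / (n * mu)
    lfc = math.log(mu_b / mu_a)
    se = math.sqrt(phi * (1.0 / (na * mu_a) + 1.0 / (nb * mu_b)))
    return lfc, lfc / se
```

Because φ multiplies the variance rather than changing the mean fit, overdispersed genes get wider standard errors instead of inflated z statistics, which is how quasi-likelihood methods keep the FDR stable across variance functions.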
2

Quasi-Likelihood Methods for the Analysis of Independent and Dependent Observations

Hatzinger, Reinhold January 1991 (has links) (PDF)
Starting from the classical linear model, regression methods are presented for data structures in which the standard assumptions (independence, normally distributed errors, and constant variance) do not hold. Allowing the response variable to come from an exponential family yields the class of generalized linear models (GLMs). This makes it possible to model the expectation of a wide variety of continuous and discrete response variables (e.g., proportions, frequencies) through a fixed covariate structure. Dropping, in addition, the requirement to specify a distribution from an exponential family yields quasi-likelihood models, for which only a relationship between mean and variance needs to be specified. Accounting for a correlation structure leads to generalized estimating equations, so that longitudinal data can also be analyzed without particular distributional assumptions. The aim of this work is to present these methods and their statistical properties and to illustrate their relevance to biometric practice with an example (overdispersion in repeatedly measured binomial proportions). (Author's abstract) / Series: Forschungsberichte / Institut für Statistik
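The progression this abstract describes, from GLM to quasi-likelihood, rests on the fact that the fitting algorithm only ever uses the mean-variance relationship, never the full distribution. The sketch below shows iteratively reweighted least squares (IRLS) for a log-link model with one covariate, with the two-parameter weighted least squares step solved in closed form; the function name and data layout are my own, and a real implementation would add a convergence check.

```python
import math

def irls_poisson(x, y, iters=25):
    """IRLS for a log-link model, mean mu = exp(b0 + b1*x).

    The estimating equations depend only on the working variance
    V(mu) = mu (up to a dispersion factor), so the identical loop fits
    both the Poisson and the quasi-Poisson model.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Log link: working response z = eta + (y - mu)/mu, weight w = mu
        z = [(b0 + b1 * xi) + (yi - mi) / mi
             for xi, yi, mi in zip(x, y, mu)]
        w = mu
        # Closed-form weighted least squares for the two coefficients
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1
```

Dropping the exponential-family assumption changes only the standard errors (scaled by the estimated dispersion), not this point-estimation loop, which is the practical content of the quasi-likelihood extension the abstract describes.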
3

Incorporating survey weights into logistic regression models

Wang, Jie 24 April 2013 (has links)
Incorporating survey weights into likelihood-based analysis is a controversial issue because sampling weights are not simply the reciprocals of the selection probabilities: they are adjusted for characteristics such as age and race, and some adjustments account for nonresponse as well. The adjustment is accomplished through a combination of probability calculations. When building a logistic regression model to predict categorical outcomes from survey data, the sampling weights should be taken into account whenever the sampling design does not give each individual an equal chance of being selected into the sample. Because the variance is too small under the original weights, we rescale them to sum to an equivalent sample size; these new weights are called the adjusted weights. The established method applies quasi-likelihood maximization for estimation with the adjusted weights. We develop a new method, based on the correct likelihood for logistic regression, that incorporates the adjusted weights; in the new method, the adjusted weights are further used to adjust both the covariates and the intercepts. We explore the differences and similarities between the quasi-likelihood and correct-likelihood methods, use both binary and multinomial logistic regression models to estimate parameters, and apply the methods to body mass index data from the Third National Health and Nutrition Examination Survey. The results show some similarities and differences between the old and new methods in parameter estimates, standard errors, and p-values.
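The weight rescaling and the quasi-likelihood (pseudo-MLE) baseline the abstract refers to can be sketched as follows. This is an illustration of the established approach only, not the thesis's corrected likelihood; the one-covariate Newton-Raphson fit and all function names are my own simplifications.

```python
import math

def rescale(weights):
    """Rescale survey weights to sum to the sample size, the adjustment
    applied before plugging weights into the (pseudo) likelihood."""
    n = len(weights)
    s = sum(weights)
    return [w * n / s for w in weights]

def weighted_logit(x, y, w, iters=30):
    """Weighted logistic regression with one covariate, fit by
    Newton-Raphson on the weighted score equations (pseudo-MLE)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Weighted score vector for (b0, b1)
        g0 = sum(wi * (yi - pi) for wi, yi, pi in zip(w, y, p))
        g1 = sum(wi * xi * (yi - pi)
                 for wi, xi, yi, pi in zip(w, x, y, p))
        # Weighted information matrix, solved as a 2x2 system
        v = [wi * pi * (1.0 - pi) for wi, pi in zip(w, p)]
        h00 = sum(v)
        h01 = sum(vi * xi for vi, xi in zip(v, x))
        h11 = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

With the raw weights the information matrix, and hence the nominal precision, scales with the population size rather than the sample size; rescaling to sum to n is what keeps the variance estimates on the right footing.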
4

Modelos para dados de contagem com superdispersão: uma aplicação em um experimento agronômico / Models for count data with overdispersion: application in an agronomic experiment

Batista, Douglas Toledo 26 June 2015 (has links)
The reference model for count data is the Poisson model, whose defining assumption is that the mean and the variance are equal. This mean-variance relationship does not always hold in observational data: often the observed variance exceeds the expected variance, a phenomenon known as overdispersion. The aim of this work is to apply generalized linear models in order to select a model that satisfactorily accommodates the overdispersion present in count data. The data come from an experiment that aimed to evaluate and characterize the parameters involved in the flowering of adult orange trees of the variety "x11" grafted on "Cravo" and "Swingle" rootstocks. First, a Poisson model with canonical link function was fitted. The deviance, the generalized Pearson chi-squared statistic, and half-normal plots showed strong evidence of overdispersion. Negative binomial and quasi-Poisson models were therefore used as alternatives to the Poisson model. The quasi-Poisson model fitted the data best, allowing more precise inferences and practical interpretations of the model parameters.
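The overdispersion diagnostic and the quasi-Poisson correction used here both reduce to one number, the Pearson dispersion estimate. A minimal sketch (the function name is my own):

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson X^2 / (n - p) dispersion estimate for a Poisson fit.

    Values well above 1 signal overdispersion; the quasi-Poisson model
    keeps the Poisson point estimates and multiplies every standard
    error by sqrt of this factor.
    """
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / (len(y) - n_params)
```

For example, counts [0, 2, 4, 6] around a fitted mean of 3 (one fitted parameter) give a dispersion of 20/9, roughly 2.2, so quasi-Poisson standard errors would be about 1.5 times the naive Poisson ones.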
5

Information Matrices in Estimating Function Approach: Tests for Model Misspecification and Model Selection

Zhou, Qian January 2009 (has links)
Estimating functions are widely used for parameter estimation in many statistical problems. Regular estimating functions produce parameter estimators with desirable properties such as consistency and asymptotic normality. In quasi-likelihood inference, an important example of estimating functions, correct specification of the first two moments of the underlying distribution leads to information unbiasedness: the negative sensitivity matrix (the negative expectation of the first derivative of the estimating function) and the variability matrix (the variance of the estimating function), two forms of the information matrix, are equal. In other words, the analogue of the Fisher information is equivalent to the Godambe information. Consequently, information unbiasedness implies that the model-based covariance matrix estimator and the sandwich covariance matrix estimator are equivalent. By comparing the model-based and sandwich variance estimators, we propose information ratio (IR) statistics for testing misspecification of the variance/covariance structure under a correctly specified mean structure, in the context of linear regression models, generalized linear regression models, and generalized estimating equations. Asymptotic properties of the IR statistics are discussed. Intensive simulation studies show that the IR statistics are powerful in a range of applications: testing for heteroscedasticity in linear regression models, testing for overdispersion in count data, and testing for a misspecified variance function and/or misspecified working correlation structure. Moreover, the IR statistics appear more powerful than the classical information matrix test proposed by White (1982). Model selection criteria have been discussed intensively in the literature, but almost all of them target the optimal mean structure.
In this thesis, two model selection procedures are proposed for selecting the optimal variance/covariance structure among a collection of candidate structures. One is based on a sequence of IR tests over all competing variance/covariance structures. The other is based on an "information discrepancy criterion" (IDC), which measures the discrepancy between the negative sensitivity matrix and the variability matrix; the IDC characterizes the relative efficiency loss incurred by using a candidate variance/covariance structure compared with the true but unknown structure. Simulation studies and analyses of two data sets show that both proposed model selection methods detect the true or optimal variance/covariance structure at a high rate. In particular, because the IDC magnifies the differences among competing structures, it is highly sensitive in detecting the most appropriate variance/covariance structure.
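The sensitivity/variability comparison behind the IR statistics can be seen in a scalar toy case. Below, for the mean parameter of a simple working model, the ratio of the variability to the sensitivity of the quasi-score equals the ratio of the empirical to the model variance; it is 1 under a correctly specified variance and drifts away under misspecification. This is my own one-parameter illustration of the idea, not the thesis's matrix-valued test statistic.

```python
def information_ratio(y, mu_hat, var_model):
    """Scalar sketch of the IR idea for a mean parameter.

    Quasi-score for mu: sum_i (y_i - mu) / v, with working variance
    v = var_model. Sensitivity H = n / v; variability J is the sum of
    squared score terms. J/H equals empirical variance / model variance,
    i.e. the sandwich-to-model-based variance ratio.
    """
    n = len(y)
    h = n / var_model
    j = sum((yi - mu_hat) ** 2 for yi in y) / var_model ** 2
    return j / h
```

A value near 1 says the model-based and sandwich variance estimators agree (information unbiasedness holds); a value of 4, say, means the working variance understates the true variance fourfold, exactly the kind of departure the IR statistics are built to detect.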
8

Bootstrap-adjusted Quasi-likelihood Information Criteria for Mixed Model Selection

Ge, Wentao 21 August 2019 (has links)
No description available.
9

Performances of different estimation methods for generalized linear mixed models.

Biswas, Keya 08 May 2015 (has links)
Generalized linear mixed models (GLMMs) have become extremely popular in recent years. The main computational problem in parameter estimation for GLMMs is that, in contrast to linear mixed models, closed-form expressions for the likelihood are not available. Several approaches have been proposed in the literature to overcome this problem. This study uses one quasi-likelihood approach, penalized quasi-likelihood (PQL), and two integral approximations: the Laplace approximation and adaptive Gauss-Hermite quadrature (AGHQ). Our primary objective was to measure the performance of each estimation method. AGHQ is more accurate than the Laplace approximation, but slower; the question is when the Laplace approximation is adequate, versus when AGHQ provides a significantly more accurate result. We ran two simulations using PQL, Laplace, and AGHQ with different numbers of quadrature points, varying the random-effect standard deviation (θ) and the number of replications per cluster. Performance was measured by root mean square error (RMSE) and bias. Based on the simulated data, we found that both for small values of θ with few replications and for large values of θ with many replications, the RMSE of the PQL method is much higher than that of the Laplace and AGHQ approximations. For intermediate values of θ, ranging from 0.63 to 3.98, the Laplace and AGHQ approximations gave similar estimates regardless of the number of replications per cluster. When both the number of replications and θ are small, increasing the number of quadrature points increases the RMSE, indicating that the Laplace approximation performs better than AGHQ in this setting. When the random-effect standard deviation is large (e.g., θ = 10) and the number of replications is small, the Laplace RMSE exceeds that of AGHQ.
Increasing the number of quadrature points then decreases the RMSE, indicating that AGHQ performs better in that situation. The differences in RMSE between PQL and Laplace and between AGHQ and Laplace are approximately 12% and 10%, respectively. In addition, we tested the relative performance and accuracy of two R packages (lme4, glmmML) and SAS (PROC GLIMMIX) on real data. All of them perform well in terms of accuracy, precision, and convergence rates. In most cases glmmML was much faster than the lme4 package and SAS; the only exception was the Contraception data, where the required computational time for the two R packages was exactly the same. The difference in required computational time between the two platforms decreases as the number of quadrature points increases. / Thesis / Master of Science (MSc)
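The Laplace approximation the thesis compares against works by matching a Gaussian to the integrand of the GLMM likelihood at its mode. The sketch below shows the generic device on a one-dimensional integral, with a brute-force quadrature as the reference; AGHQ refines this by placing Gauss-Hermite nodes around the same mode. Function names and the test integrand are my own.

```python
import math

def laplace_integral(h, u0, h2):
    """Laplace approximation to the integral of exp(h(u)) du.

    u0 is the mode of h and h2 = h''(u0) < 0; the integrand is replaced
    by the Gaussian exp(h(u0) + 0.5*h2*(u - u0)^2), whose integral is
    known in closed form.
    """
    return math.exp(h(u0)) * math.sqrt(2.0 * math.pi / -h2)

def trapezoid_integral(h, lo, hi, k=2000):
    """Brute-force reference value of the same integral for comparison."""
    step = (hi - lo) / k
    total = 0.5 * (math.exp(h(lo)) + math.exp(h(hi)))
    total += sum(math.exp(h(lo + i * step)) for i in range(1, k))
    return total * step
```

For a Gaussian log-integrand the Laplace approximation is exact; for the skewed integrands produced by binary data with large random-effect variance it is not, which is where the extra AGHQ nodes (and the RMSE differences reported above) come in.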
10

Analysis of the Total Food Folate Intake Data from the National Health and Nutrition Examination Survey (NHANES) Using Generalized Linear Model

Lee, Kyung Ah 01 December 2009 (has links)
The National Health and Nutrition Examination Survey (NHANES) is a respected nationwide program in charge of assessing the health and nutritional status of adults and children in the United States. Recent research has found that folic acid plays an important role in preventing birth defects. In this paper, we use the generalized estimating equation (GEE) method to study a generalized linear model (GLM) with a compound symmetric correlation matrix for the NHANES data and to investigate significant factors that influence the intake of food folic acid.
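The compound symmetric (exchangeable) working correlation used in this GEE analysis has a simple form: 1 on the diagonal and a common ρ off the diagonal, with ρ estimated by a moment average of within-cluster residual products. A minimal sketch, with function names of my own choosing:

```python
def compound_symmetric(n, rho):
    """n x n exchangeable (compound symmetric) working correlation:
    ones on the diagonal, a common rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def estimate_rho(residuals_by_cluster):
    """Moment estimator of rho: the average product of standardized
    residuals over all distinct within-cluster pairs."""
    num, cnt = 0.0, 0
    for r in residuals_by_cluster:
        m = len(r)
        for i in range(m):
            for j in range(i + 1, m):
                num += r[i] * r[j]
                cnt += 1
    return num / cnt
```

In the GEE iteration this matrix enters the weighted estimating equations for the regression coefficients; even if the single ρ misstates the true within-subject correlation, the GEE point estimates remain consistent and the sandwich variance keeps the standard errors honest.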
