Global ETD Search

21	Semiparametric Methods for the Generalized Linear Model Chen, Jinsong 01 July 2010 (has links) The generalized linear model (GLM) is a popular model in many research areas. In the GLM, each outcome of the dependent variable is assumed to be generated from a particular distribution function in the exponential family. The mean of the distribution depends on the independent variables. The link function provides the relationship between the linear predictor and the mean of the distribution function. In this dissertation, two semiparametric extensions of the GLM will be developed. In the first part of this dissertation, we have proposed a new model, called a semiparametric generalized linear model with a log-concave random component (SGLM-L). In this model, the estimate of the distribution of the random component has a nonparametric form while the estimate of the systematic part has a parametric form. In the second part of this dissertation, we have proposed a model, called a generalized semiparametric single-index mixed model (GSSIMM). A nonparametric component with a single index is incorporated into the mean function in the generalized linear mixed model (GLMM) assuming that the random component is following a parametric distribution. In the first part of this dissertation, since most of the literature on the GLM deals with the parametric random component, we relax the parametric distribution assumption for the random component of the GLM and impose a log-concave constraint on the distribution. An iterative numerical algorithm for computing the estimators in the SGLM-L is developed. We construct a log-likelihood ratio test for inference. 
In the second part of this dissertation, we use a single index model to generalize the GLMM so that a linear combination of covariates enters the model via a nonparametric mean function, because the linear model in the GLMM is not complex enough to capture the underlying relationship between the response and its associated covariates. The marginal likelihood is approximated using the Laplace method. A penalized quasi-likelihood approach is proposed to estimate the nonparametric function and the parameters, including the single-index coefficients, in the GSSIMM. We estimate variance components using marginal quasi-likelihood. Asymptotic properties of the estimators are developed following an idea similar to that of Yu (2008). A simulation example is carried out to compare the performance of the GSSIMM with that of the GLMM. We demonstrate the advantage of our approach using a study of the association between daily air pollutants and daily mortality adjusted for temperature and wind speed in various counties of North Carolina. / Ph. D. Penalized splines Generalized linear mixed model Generalized linear model Single-Index Model
22	Statistical Methods for Non-Linear Proﬁle Monitoring Quevedo Candela, Ana Valeria 02 January 2020 (has links) We have seen an increased interest and extensive research in the monitoring of a process over time whose characteristics are represented mathematically in functional forms such as profiles. Most of the current techniques require all of the data for each profile to determine the state of the process. Thus, quality engineers from industrial processes such as agricultural, aquacultural, and chemical cannot make process corrections to the current profile that are essential for correcting their processes at an early stage. In addition, the focus of most of the current techniques is on the statistical significance of the parameters or features of the model instead of the practical significance, which often relates to the actual quality characteristic. The goal of this research is to provide alternatives to address these two main concerns. First, we study the use of a Shewhart type control chart to monitor within profiles, where the central line is the predictive mean profile and the control limits are formed based on the prediction band. Second, we study a statistic based on a non-linear mixed model recognizing that the model leads to correlations among the estimated parameters. / Doctor of Philosophy / Checking the stability over time of the quality of a process which is best expressed by a relationship between a quality characteristic and other variables involved in the process has received increasing attention. The goal of this research is to provide alternative methods to determine the state of such a process. Both methods presented here are compared to the current methodologies. The first method will allow us to monitor a process while the data is still being collected. The second one is based on the quality characteristic of the process and takes full advantage of the model structure. 
Both methods seem to be more robust than the most well-known current method. non-linear profile monitoring non-linear mixed model practical significance Gaussian process model heteroscedasticity
23	Longitudinal data analysis with covariates measurement error Hoque, Md. Erfanul 05 January 2017 Longitudinal data occur frequently in medical studies, and covariates measured with error are a typical feature of such data. Generalized linear mixed models (GLMMs) are commonly used to analyse longitudinal data. In these models it is typically assumed that the random effects covariance matrix is constant across subjects. In many situations, however, this correlation structure may differ among subjects, and ignoring this heterogeneity can bias the estimates of the model parameters. In this thesis, following Lee et al. (2012), we propose an approach to properly model the random effects covariance matrix based on covariates in the class of GLMMs where we also have covariates measured with error. The resulting parameters from this decomposition have a sensible interpretation and can easily be modelled without concern for the positive definiteness of the resulting estimator. The performance of the proposed approach is evaluated through simulation studies, which show that the proposed method performs very well in terms of bias and mean squared error as well as coverage rates. The proposed method is also applied to data from the Manitoba Follow-up Study. / February 2017 Cholesky decomposition Longitudinal data Measurement error Random effects Generalized Linear Mixed Model
24	Assessing variance components of multilevel models pregnancy data Letsoalo, Marothi Peter January 2019 Thesis (M. Sc. (Statistics)) / Most social and health science data are longitudinal and additionally multilevel in nature, which means that response data are grouped by attributes of some cluster. Ignoring the differences and similarities generated by these clusters results in misleading estimates, which motivates the need to assess variance components (VCs) using multilevel models (MLMs) or generalised linear mixed models (GLMMs). This study explored and fitted teenage pregnancy census data that were gathered from 2011 to 2015 by the Africa Centre at Kwa-Zulu Natal, South Africa. Exploration of these data revealed a two-level pure hierarchical data structure, with teenage pregnancy status for some years nested within female teenagers. To fit these data, the effects of census year (year) and three female characteristics, namely age (age), number of household membership (idhhms) and number of children before the observation year (nch), on teenage pregnancy were examined. Model building first fitted a logit generalised linear model (GLM) under the assumption that teenage pregnancy measurements are independent between females, and then fitted a GLMM or MLM with a female random effect. The better-fitting GLMM indicated a 0.203 decrease in the log odds of teenage pregnancy for each additional unit of year, while the GLM suggested a 0.21 decrease and a 0.557 increase for each additional unit of age and year, respectively. A GLM with only the year effect produced a fixed estimate that is higher, by 0.04, than that of the better-fitting GLMM. The inconsistency in the effect of year was caused by a significant female cluster variance of approximately 0.35, which was used to compute the VCs. Given the effect of year, the VCs suggested that 9.5% of the differences in teenage pregnancy lie between females, while the similarity between observations on the same female is 0.095 (on a scale from 0 to 1). 
It was also revealed that year does not vary within females. Apart from the small differences between the observed estimates of the fitted GLM and GLMM, this work produced evidence that accounting for the cluster effect improves the accuracy of estimates. Multilevel model Generalised Linear Mixed Model Variance Components Hierarchical Data Structure Social Science Data Teenage pregnancy Statistics
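As a rough check on the variance components reported in the abstract above, the share of variation lying between females can be sketched with the usual latent-variable intraclass correlation for a random-intercept logit model, where the level-one residual variance is fixed at π²/3. That the thesis used exactly this formula is an assumption, and the function name `latent_icc` is hypothetical.

```python
import math


def latent_icc(cluster_variance: float) -> float:
    """Intraclass correlation on the latent logistic scale.

    Assumes a random-intercept logit model, whose level-one residual
    variance is pi^2 / 3 (the variance of the standard logistic distribution).
    """
    residual_variance = math.pi ** 2 / 3
    return cluster_variance / (cluster_variance + residual_variance)


# Cluster variance of approximately 0.35, as quoted in the abstract.
icc = latent_icc(0.35)
print(f"{icc:.3f}")
```

With the rounded variance of 0.35 this gives a value of roughly 0.096, which is close to the 9.5% quoted in the abstract; the small gap is consistent with the variance having been rounded.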
25	A Hierarchical Spherical Radial Quadrature Algorithm for Multilevel GLMMS, GSMMS, and Gene Pathway Analysis Gagnon, Jacob A. 01 September 2010 (has links) The first part of my thesis is concerned with estimation for longitudinal data using generalized semi-parametric mixed models and multilevel generalized linear mixed models for a binary response. Likelihood based inferences are hindered by the lack of a closed form representation. Consequently, various integration approaches have been proposed. We propose a spherical radial integration based approach that takes advantage of the hierarchical structure of the data, which we call the 2 SR method. Compared to Pinheiro and Chao's multilevel Adaptive Gaussian quadrature, our proposed method has an improved time complexity with the number of functional evaluations scaling linearly in the number of subjects and in the dimension of random effects per level. Simulation studies show that our approach has similar to better accuracy compared to Gauss Hermite Quadrature (GHQ) and has better accuracy compared to PQL especially in the variance components. The second part of my thesis is concerned with identifying differentially expressed gene pathways/gene sets. We propose a logistic kernel machine to model the gene pathway effect with a binary response. Kernel machines were chosen since they account for gene interactions and clinical covariates. Furthermore, we established a connection between our logistic kernel machine with GLMMs allowing us to use ideas from the GLMM literature. For estimation and testing, we adopted Clarkson's spherical radial approach to perform the high dimensional integrations. For estimation, our performance in simulation studies is comparable to better than Bayesian approaches at a much lower computational cost. As for testing of the genetic pathway effect, our REML likelihood ratio test has increased power compared to a score test for simulated non-linear pathways. 
Additionally, our approach has three main advantages over previous methodologies: 1) our testing approach is self-contained rather than competitive, 2) our kernel machine approach can model complex pathway effects and gene-gene interactions, and 3) we test for the pathway effect adjusting for clinical covariates. Motivation for our work is the analysis of an Acute Lymphocytic Leukemia data set where we test for the genetic pathway effect and provide confidence intervals for the fixed effects. Gauss Hermite Quadrature Generalized Linear Mixed Model Generalized Semiparametric Mixed Model multilevel Spherical Radial splines Mathematics Statistics and Probability
26	Bayesian Hierarchical Latent Model for Gene Set Analysis Chao, Yi 13 May 2009 A pathway is a set of genes that are predefined and serve a particular cellular or physiological function. Ranking pathways relevant to a particular phenotype can help researchers focus on a few sets of genes in pathways. In this thesis, a Bayesian hierarchical latent model was proposed using a generalized linear random effects model. The advantage of the approach was that it can easily incorporate prior knowledge when the sample size is small and the number of genes is large. For the covariance matrix of a set of random variables, two Gaussian random processes were considered to construct the dependencies among genes in a pathway. One was based on the polynomial kernel and the other was based on the Gaussian kernel. These two kernels were then compared with a constant covariance matrix of the random effect by using a ratio based on the joint posterior distribution with respect to each model. For mixture models, log-likelihood values were computed at different values of the mixture proportion and compared among mixtures of selected kernels and a point-mass density (or constant covariance matrix). The approach was applied to a data set (Mootha et al., 2003) containing the expression profiles of type II diabetes, where the motivation was to identify pathways that can discriminate between normal patients and patients with type II diabetes. / Master of Science Pathway based analysis Point-mass density Probit regression model Bayesian hierarchical model Latent variable Generalized linear mixed model
27	Structure and restoration of natural secondary forests in the Central Highlands, Vietnam Bui, Manh Hung 15 December 2016 (has links) (PDF) Introduction and objectives In Vietnam, the forest resources have been declining and degrading severely in recent years. The degradation has decreased the natural forest area, changed the forest structure seriously and reduced timber volume and biodiversity. From 1999 to 2005, the rich forest area has decreased 10.2%, whereas the poor secondary forest has increased dramatically by 20.7%. Forest structure plays an important role in forestry research. Understanding forest structure will unlock an understanding of the history, function and future of a forest ecosystem (Spies, 1998). The forest structure is an excellent basis for restoration measures. Therefore, this research is necessary to contribute to improving forest area and quality, reducing difficulties in forest management. The study also enhances the grasp of forest structure, structure changes after harvesting and fills serious gaps in knowledge. In addition, the research results will contribute to improving and rescuing the poor secondary forest and restoring it, approaching the old-growth forest in Vietnam. Material and methods The study was conducted in Kon Ka Kinh national park. The park is located in the Northeastern region of Gia Lai province, 50 km from Pleiku city center to the Northeast. The park is distributed over seven different communes in three districts: K’Bang, Mang Yang and Đăk Đoa. Data were collected from 10 plots of secondary forests (Type IIb) and 10 plots of primeval forests (Type IV). Stratified random sampling was applied to select plot locations. 1 ha plots were used to investigate gaps. 2000 m2 plots were used to measure overstorey trees such as diameter at breast height, total height, crown width and species names. 500 m2 subplots were used to record tree positions. 
For regeneration, 25 systematic 4 m2 subplots were established inside 1 ha plots. After data were collected in the field, data analyses were conducted by using R and Excel. Firstly, some stand information, such as density, volume and so on, was calculated, and then descriptive statistics were computed for diameter and height variables. Linear mixed effect models were applied to analyze the difference of diameter and height and to check the effect of the random factor between the two forest types. Diameter and height frequency distributions were also generated and compared by using permutational analysis of variance (PERMANOVA). Non-linear regression models were analyzed for diameter and height variables. Similar analyses were implemented for gaps. Regarding spatial point patterns of overstorey trees, replicated point pattern analysis techniques were applied in this research. For biodiversity, some calculations were run such as richness and biodiversity indices, comparison of biodiversity indices by using linear mixed models and biodiversity differences between two forest types tested again by permutational analysis of variance. In terms of regeneration, some analyses were implemented such as: height frequency distribution generation, frequency difference testing, biodiversity indices for the regeneration and spatial distribution checking by using a nonrandomness index. Results and discussion After analyzing the data, some essential findings were obtained as follows: Hypothesis H1 “The overstorey structure of secondary forests is more homogeneous and uniform than old-growth forests” is accepted. The secondary forest density is about 1.8 times higher than that of the old-growth forest. However, the volume is only 0.56 times as large. The average diameter and height of the secondary forest are smaller by 5.71 cm and 3.73 m than those of the old-growth forest, respectively. 
Linear mixed effect model results indicate that this difference is statistically significant and the effect of the random factor (Section) is not important. Type IIb has many small trees and the diameter frequency distribution is quite homogeneous. The old-growth forest has more big trees. For both forest stages, the height frequency distribution is positively skewed. PERMANOVA results illustrate that the frequency distribution is statistically different between the two forest types. Regression functions are also more variable and diverse in the old-growth forest, because all standard deviations of the parameters are greater there. Gap analysis results indicate that the number of gaps in the young forest is slightly higher, while the average gap size is much smaller. The gap frequency distribution is statistically different between the two types. In terms of the spatial point pattern of overstorey trees, the G-test and the pair correlation function results show that trees are distributed randomly in the secondary forest. In contrast, the spatial point patterns of trees are more regular and diverse in the old-growth forest. The spatial point pattern difference is not significant, as shown by a permutational t-test for the pair correlation function (pcf). Envelope function results indicate that the variation of pcf in young forests is much lower than in the primary forests. Hypothesis H2 “The overstorey species biodiversity of the secondary forest is less than in the old-growth forest” is rejected. Results show that the number of species of the secondary forest is much greater than in the old-growth forest, especially richness. The richness of the secondary forest is 1.16 times higher. The Simpson and Shannon indices are slightly smaller in the secondary forest. The average Simpson index is 0.898 for the secondary forest and 0.920 for the old-growth forest. However, the difference is not significant. 
Species accumulation curves become relatively flatter on the right, meaning a reasonable number of plots have been observed. The estimated numbers of species from the accumulation curves in the two forest types are 105 and 95 per ha. PERMANOVA results show that the number of species and the proportion of individuals in each species are significantly different between forest types. Hypothesis H3 “The number of regenerating species of the secondary forest is less and they distribute more regularly, compared to the old-growth forest” is rejected. There are both similarities and differences between the two types. The regeneration density of Type IIb is 22,930 seedlings/ha, greater than that of the old-growth forest by 9,030 seedlings. The height frequency distribution shows a decreasing trend. Similar to the overstorey, the richness of the secondary forest is 141 species, higher than the old-growth forest by 9 species. Biodiversity indices do not differ significantly between the two types. PERMANOVA results indicate that the number of species and the proportion of individuals for each species also do not differ significantly between the observed forest types. Nonrandomness index results show that the regeneration is distributed regularly. Up to 95% of the plots reflect this distribution trend. Hypothesis H4 “Restoration measures (with and without human intervention) could be implemented in the regenerating forest” is accepted. The results show that the secondary forest still has mother trees, and it has enough seedlings to restore. Therefore, restoration solutions with and without human intervention can be implemented. Firstly, forest protection should be applied. This measure is relevant to national park regulations in Vietnam. Rangers and other related organizations will be responsible for carrying out protection activities. These activities will protect forest resources from illegal logging, grazing and tourist activities. 
Environmental education and awareness-raising activities for indigenous people are also important. Another measure is additional and enrichment planting. It should focus on exclusive species of the overstorey in Type IIb or exclusive species of the primary forest. Selection of these species will lead to an increase in species biodiversity in the future. This also meets the purpose of the maximum biodiversity solution. Conclusion Forest resources play a very important role in human life as well as in maintaining the sustainability of ecosystems. However, at present, they are under serious threat, particularly in Vietnam. The Central Highlands of Vietnam, where forest resources are still relatively good, are also threatened by illegal logging, lack of public knowledge and so on. Protecting them therefore requires the involvement of the people, especially foresters and researchers. Through research, scientists can provide the knowledge and understanding of the forest, including the structure and forest restoration. This study has obtained important findings. The secondary forest is more homogeneous and uniform, while the old-growth forest is very diverse. Biodiversity of the overstorey is higher in the secondary forest than in the primary forest. The number of regenerating species in the secondary forest is higher, but other indices are not statistically different between the two types. The regeneration is distributed regularly on the ground. The secondary forest still has mother trees and sufficient regeneration, so some restoration measures can be applied here. The findings of the study contribute to improving people’s understanding of the structure and the structural changes after harvesting in Kon Ka Kinh national park, Gia Lai. That is key to a better understanding of the history and values of the forests. These findings and the proposed restoration measures address the rescue of degraded forests in the Central Highlands in particular and in Vietnam in general. 
Furthermore, this is a promising basis for the management and sustainable use of forest resources in the future. Structure Restoration Tropical forest Linear mixed model Replicated point pattern analysis Spatial distribution Gaps Kon Ka Kinh Vietnam ddc:630 rvk:ZC 73564
28	Metody výpočtu maximálně věrohodných odhadů v zobecněném lineárním smíšeném modelu / Computational Methods for Maximum Likelihood Estimation in Generalized Linear Mixed Models Otava, Martin January 2011 Diploma thesis. Author: Bc. Martin Otava Department: Department of Probability and Mathematical Statistics Supervisor: RNDr. Arnošt Komárek, Ph.D., Department of Probability and Mathematical Statistics Abstract: When the maximum likelihood method is applied to generalized linear mixed models, the maximization problem can be analytically unsolvable. Iterative and approximate methods are used as a solution. The latter are the core of the thesis. A detailed and general introduction to the widely used methods is given, with emphasis on algorithms useful in practical cases. The case of non-Gaussian random effects is also discussed. The approximate methods are demonstrated on real data sets. Conclusions about bias and consistency are supported by a simulation study. Keywords: generalized linear mixed model, penalized quasi-likelihood, adaptive Gauss-Hermite quadrature
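To illustrate the kind of approximation named in the keywords above, the following is a minimal sketch of plain (non-adaptive) Gauss-Hermite quadrature for the marginal likelihood of a single cluster in a random-intercept logistic model. It is not taken from the thesis: the model, the function name `cluster_marginal_likelihood` and its arguments are assumptions chosen for illustration, and the adaptive variant would additionally recentre and rescale the nodes around the mode of the integrand.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def cluster_marginal_likelihood(y, eta_fixed, sigma, n_nodes=20):
    """Approximate the integral of prod_j p(y_j | b) * N(b; 0, sigma^2) over b.

    y         : 0/1 responses for one cluster
    eta_fixed : fixed-effect part of the linear predictor, one value per response
    sigma     : standard deviation of the random intercept b
    """
    y = np.asarray(y)
    eta_fixed = np.asarray(eta_fixed, dtype=float)

    # Nodes and weights for integrals of the form  int exp(-x^2) f(x) dx.
    nodes, weights = hermgauss(n_nodes)

    # Change of variables b = sqrt(2) * sigma * x maps the N(0, sigma^2)
    # density onto the exp(-x^2) weight, up to a factor 1 / sqrt(pi).
    b = np.sqrt(2.0) * sigma * nodes

    eta = eta_fixed[:, None] + b[None, :]        # shape: (responses, nodes)
    p = 1.0 / (1.0 + np.exp(-eta))               # inverse logit
    lik_given_b = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)

    return float(np.sum(weights * lik_given_b) / np.sqrt(np.pi))


# With no random-effect variance the result reduces to the ordinary
# Bernoulli likelihood: two responses with probability 0.5 each.
no_random_effect = cluster_marginal_likelihood([1, 0], [0.0, 0.0], sigma=0.0)

# A single response with zero fixed effect has marginal probability 0.5
# by symmetry of the random intercept.
symmetric_case = cluster_marginal_likelihood([1], [0.0], sigma=1.0)
```

The full marginal likelihood of a data set would be the product of such cluster contributions, which is then maximised over the fixed effects and `sigma`.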
29	Flexible models for hierarchical and overdispersed data in agriculture / Modelos flexíveis para dados hierárquicos e superdispersos na agricultura Sercundes, Ricardo Klein 29 March 2018 In this work we explored and proposed flexible models to analyze hierarchical and overdispersed data in agriculture. A semi-parametric generalized linear mixed model was applied to count data and compared with the main standard models, and a combined model that takes into account overdispersion and clustering through two separate sets of random effects was proposed to model nominal outcomes. For all models, the computational codes were implemented using the SAS software and are available in the appendix. B-spline Beta distribution Combined model Generalized linear mixed model Likelihood Multinomial distribution
30	Novel Statistical Methods in Quantitative Genetics : Modeling Genetic Variance for Quantitative Trait Loci Mapping and Genomic Evaluation Shen, Xia January 2012 (has links) This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS). statistical genetics quantitative trait loci genome-wide association study genomic selection genetic variance hierarchical generalized linear model linear mixed model random effect heteroscedastic effects model variance-controlling genes
