Global ETD Search

31	Genetic association of high-dimensional traits Meyer, Hannah Verena January 2018 Over the past ten years, more than 4,000 genome-wide association studies (GWAS) have helped to shed light on the genetic architecture of complex traits and diseases. In recent years, phenotyping of the samples has often gone beyond single traits and it has become common to record multi- to high-dimensional phenotypes for individuals. Whilst these rich datasets offer the potential to analyse complex trait structures and pleiotropic effects at a genome-wide level, novel analytic challenges arise. This thesis summarises my research into genetic associations for high-dimensional phenotype data. First, I developed a novel and computationally efficient approach for multivariate analysis of high-dimensional phenotypes based on linear mixed models, combined with bootstrapping (LiMMBo). Both in simulation studies and on real data, I demonstrate the statistical validity of LiMMBo and that it can scale to hundreds of phenotypes. I show the gain in power of multivariate analyses for high-dimensional phenotypes compared to univariate approaches, and illustrate that LiMMBo allows for detecting pleiotropy in a large number of phenotypic traits. Aside from their computational challenges in GWAS, the true dimensionality of very high-dimensional phenotypes is often unknown and lies hidden in high-dimensional space. Retaining maximum power for association studies of such phenotype data relies on using an appropriate phenotype representation. I systematically analysed twelve unsupervised dimensionality reduction methods based on their performance in finding a robust phenotype representation in simulated data of different structure and size. I propose a stability criterion for choosing low-dimensional phenotype representations and demonstrate that stable phenotypes can recover genetic associations.
Finally, I analysed genetic variants for associations to high-dimensional cardiac phenotypes based on MRI data from 1,500 healthy individuals. I used an unsupervised approach to extract a low-dimensional representation of cardiac wall thickness and conducted a GWAS on this representation. In addition, I investigated genetic associations to a trabeculation phenotype generated from a supervised feature extraction approach on the cardiac MRI data. In summary, this thesis highlights and overcomes some of the challenges in performing genetic association studies on high-dimensional phenotypes. It describes new approaches for phenotype processing and genotype-to-phenotype mapping for high-dimensional datasets, and provides new insights into the genetic structure of cardiac morphology in humans.
32	Semiconductor Yield Modeling Using Generalized Linear Models January 2011 Yield is a key process performance characteristic in the capital-intensive semiconductor fabrication process. In an industry where machines cost millions of dollars and cycle times are a number of months, predicting and optimizing yield are critical to process improvement, customer satisfaction, and financial success. Semiconductor yield modeling is essential to identifying processing issues, improving quality, and meeting customer demand in the industry. However, the complicated fabrication process, the massive amount of data collected, and the number of models available make yield modeling a complex and challenging task. This work presents modeling strategies to forecast yield using generalized linear models (GLMs) based on defect metrology data. The research is divided into three main parts. First, the data integration and aggregation necessary for model building are described, and GLMs are constructed for yield forecasting. This technique yields results at both the die and the wafer levels, outperforms existing models found in the literature based on prediction errors, and identifies significant factors that can drive process improvement. This method also allows the nested structure of the process to be considered in the model, improving predictive capabilities and violating fewer assumptions. To account for the random sampling typically used in fabrication, the work is extended by using generalized linear mixed models (GLMMs) and a larger dataset to show the differences between batch-specific and population-averaged models in this application and how they compare to GLMs. These results show some additional improvements in forecasting abilities under certain conditions and show the differences between the significant effects identified in the GLM and GLMM models. The effects of link functions and sample size are also examined at the die and wafer levels.
The third part of this research describes a methodology for integrating classification and regression trees (CART) with GLMs. This technique uses the terminal nodes identified in the classification tree to add predictors to a GLM. This method enables the model to consider important interaction terms in a simpler way than with the GLM alone, and provides valuable insight into the fabrication process through the combination of the tree structure and the statistical analysis of the GLM. / Dissertation/Thesis / Ph.D. Industrial Engineering 2011 Industrial Engineering Statistics Management generalized linear mixed models generalized linear models process improvement quality and reliability engineering semiconductor yield modeling
33	Modelos para dados categorizados ordinais com efeito aleatório: uma aplicação à análise sensorial / Models for ordinal categorical data with random effects: an application to the sensory analysis Maíra Blumer Fatoretto 12 January 2016 Os modelos para dados categorizados ordinais são extensões dos Modelos Lineares Generalizados e suas suposições e inferências são fundamentadas por esta classe de modelos. Os Modelos de Logitos Cumulativos, em que a função de ligação é constituída de probabilidades acumuladas, são muito utilizados para este tipo de variável, sendo uma de suas simplificações, os Modelos de Chances Proporcionais, em que para todas as covaríaveis no modelo há um crescimento linear nas razões de chances, porém, neste caso, é necessária a verificação da suposição de paralelismo. Outros modelos como o Modelo de Chances Proporcionais Parciais, o Modelo de Categorias Adjacentes e o Modelo Logito de Razão Contínua também podem ser utilizados. Em diversos estudos deste tipo, é necessário a utilização de modelos mistos, seja pelo tipo de um fator ou a dependência entre observações da variável resposta. Objetivou-se, neste trabalho, o estudo de modelos para variável resposta ordinal com a inclusão de um ou mais efeitos aleatórios. Esses modelos são ilustrados com a utilização de dados reais de análise sensorial, cuja variável resposta é constituída de uma escala ordinal e deseja-se saber dentre duas variedades de tomates desidratados (Italiano e Sweet Grape), qual teve melhor aceitação pelos consumidores. Nesse experimento os provadores avaliaram uma única vez cada uma das variedades, sendo as repetições constituídas pelas avaliações dadas por diferentes provadores. Nesse caso, é necessária a inclusão de um efeito aleatório por provador, para que o modelo consiga capturar as diferenças entre esses provadores não treinados.
O Modelo de Chances Proporcionais ajustou-se de maneira satisfatória aos dados, podendo-se fazer uso das estimativas de probabilidades e razões de chances para a interpretação dos resultados e concluindo-se que o sabor da variedade Sweet Grape foi o que mais agradou os provadores, independente do sexo. / Models for ordinal categorical data are extensions of Generalized Linear Models, and their assumptions and inferences are based on this class of models. Cumulative Logit Models, in which the link function is built from cumulative probabilities, are widely used for this type of variable. One of their simplifications is the Proportional Odds Model, in which the odds ratios grow linearly for all covariates in the model; in this case, however, the parallelism assumption must be checked. Other models, such as the Partial Proportional Odds Model, the Adjacent-Categories Logit Model and the Continuation-Ratio Logit Model, can also be used. In several studies of this kind, mixed models are required, either because of the type of a factor or because of dependence between observations of the response variable. The aim of this work was to study models for an ordinal response variable with the inclusion of one or more random effects. These models are illustrated using real sensory analysis data in which the response variable is an ordinal scale, and the goal is to determine which of two varieties of dried tomatoes (Italian and Sweet Grape) was better accepted by consumers. In this experiment, each taster evaluated each variety only once, and the replicates consisted of the ratings given by different tasters. In this case, a random effect per taster must be included so that the model can capture the differences between these untrained tasters.
The Proportional Odds Model fitted the data satisfactorily, so the estimated probabilities and odds ratios could be used to interpret the results, leading to the conclusion that the taste of the Sweet Grape variety pleased the tasters most, regardless of sex. Dados categorizados Modelos de logitos cumulativos Modelos lineares generalizados mistos Categorical Data Cumulative Logit Models Generalized Linear Mixed Models
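For readers unfamiliar with the model class named in entry 33, a proportional odds model with a random effect per taster can be written roughly as follows. This is a sketch in assumed notation, not taken from the thesis: Y_ij is the ordinal score given by taster i to variety j, theta_k are the category thresholds, x_ij holds the covariates (for example variety and sex), and u_i is the taster effect. The minus sign in front of the linear predictor is one common convention; some software uses a plus sign instead.

```latex
\[
\operatorname{logit} P(Y_{ij} \le k \mid u_i)
  = \theta_k - \left(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i\right),
  \qquad k = 1, \dots, K-1,
  \qquad u_i \sim N(0, \sigma_u^2)
\]
```

Because the coefficient vector does not depend on k, the K-1 cumulative logits are parallel, which is the parallelism (proportional odds) assumption the abstract says must be checked.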
34	Sex-specific Habitat Use and Responses to Fragmentation in an Endemic Chameleon Fauna Shirk, Philip 25 July 2012 Chameleons are an understudied taxon facing many threats, including collection for the international pet trade and habitat loss and fragmentation. A recent field study reports a highly female-biased sex ratio in the Eastern Arc endemic Usambara three-horned chameleon, Trioceros deremensis, a large, sexually dimorphic species. This species is collected for the pet trade, and local collectors report that males bring a higher price because only this sex has horns. Thus, sex ratios may vary due to differential rates of survival or harvesting. Alternatively, they may simply appear to be skewed if differences in habitat use bias detection of the sexes. Another threat facing chameleons is that of habitat loss and fragmentation. Despite enormous amounts of research, the factors of fragmentation that different species respond to are still under debate. Understanding these responses is important for current mitigation efforts as well as for predicting how species will respond to future habitat alteration and climate change. My study suggests that differences in survival and detection may explain much of the observed seasonal sex skew in adult T. deremensis. Within fragmented habitat, chameleons consistently responded more to edge effects and vegetative characteristics associated with fragmentation than to area or isolation effects. This may bode poorly for chameleon populations in the coming decades as climate change further alters vegetative communities and exacerbates edge effects. habitat fragmentation Chamaeleonidae East Usambara Mountains adult sex ratio radio-tracking generalized linear mixed models Trioceros deremensis pet trade Biology Life Sciences
35	Rezervování škod v rámci panelových dat / Claims reserving within the panel data framework Gerthofer, Michal January 2015 In the presented thesis the issue of dependency between response variables within the subjects in the generalized linear models framework is investigated. Reserving in non-life insurance is a key factor for the financial position of a company. The text introduces the basic actuarial notation, terminology and methods. The main part is focused on the panel data framework, especially Generalized Linear Mixed Models (GLMM) and Generalized Estimating Equations (GEE), and their application to claims reserving. The aim of this thesis is to show the advantages, disadvantages and limitations of these approaches and to compare them on representative datasets, which were chosen according to the results of an analysis of the whole database. Significant focus is on model selection and the diagnostics used for this purpose. Finally, the obtained results are summarized in tables and figures, and a comparison of the methods is provided.
36	Modelos lineares mistos em dados longitudionais com o uso do pacote ASReml-R / Linear Mixed Models with longitudinal data using ASReml-R package Alcarde, Renata 10 April 2012 Grande parte dos experimentos instalados atualmente é planejada para que sejam realizadas observações ao longo do tempo, ou em diferentes profundidades, enfim, tais experimentos geralmente contem um fator longitudinal. Uma maneira de se analisar esse tipo de conjunto de dados é utilizando modelos mistos, por meio da inclusão de fatores de efeito aleatório e, fazendo uso do método da máxima verossimilhança restrita (REML), podem ser estimados os componentes de variância associados a tais fatores com um menor viés. O pacote estatístico ASReml-R, muito eficiente no ajuste de modelos lineares mistos por possuir uma grande variedade de estruturas para as matrizes de variâncias e covariâncias já implementadas, apresenta o inconveniente de não ter como objetos as matrizes de delineamento X e Z, nem as matrizes de variâncias e covariâncias D e , sendo estas de grande importância para a verificação das pressuposições do modelo. Este trabalho reuniu ferramentas que facilitam e fornecem passos para a construção de modelos baseados na aleatorização, tais como o diagrama de Hasse, o diagrama de aleatorização e a construção de modelos mistos incluindo fatores longitudinais. Sendo o vetor de resíduos condicionais e o vetor de parâmetros de efeitos aleatórios confundidos, ou seja, não independentes, foram obtidos resíduos, denominados na literatura, resíduos com confundimento mínimo e, como proposta deste trabalho foi calculado o EBLUP com confundimento mínimo. Para tanto, foram implementadas funções que, utilizando os objetos de um modelo ajustado com o uso do pacote estatístico ASReml-R, tornam disponíveis as matrizes de interesse e calculam os resíduos com confundimento mínimo e o EBLUP com confundimento mínimo.
Para elucidar as técnicas neste apresentadas e salientar a importância da verificação das pressuposições do modelo adotado, foram considerados dois exemplos contendo fatores longitudinais, sendo o primeiro um experimento simples, visando a comparação da eficiência de diferentes coberturas em instalações avícolas, e o segundo um experimento realizado em três fases, contendo fatores inteiramente confundidos, com o objetivos de avaliar características do papel produzido por diferentes espécies de eucaliptos em diferentes idades. / Most experiments conducted today are designed so that observations are taken over time or at different depths; that is, they usually contain a longitudinal factor. One way of analysing this type of data set is to use mixed models, by including random-effect factors; with the restricted maximum likelihood (REML) method, the variance components associated with such factors can be estimated with lower bias. The ASReml-R statistical package, which is very efficient at fitting linear mixed models because it has a wide variety of variance-covariance structures already implemented, has the drawback of not providing as objects the design matrices X and Z or the variance-covariance matrices D and , which are very important for checking the model assumptions. This work gathered tools that facilitate, and provide steps for, building randomization-based models, such as the Hasse diagram, the randomization diagram and the construction of mixed models including longitudinal factors. Because the vector of conditional residuals and the vector of random-effect parameters are confounded, that is, not independent, residuals known in the literature as least confounded residuals were obtained and, as a proposal of this work, the least confounded EBLUP was calculated.
Functions were implemented that, using the objects of a model fitted with the ASReml-R statistical package, make the matrices of interest available and calculate the least confounded residuals and the least confounded EBLUP. To illustrate the techniques presented here and to stress the importance of checking the assumptions of the adopted model, two examples containing longitudinal factors were considered. The first was a simple experiment comparing the efficiency of different roofing materials in poultry housing, and the second was an experiment conducted in three phases, containing completely confounded factors, with the aim of evaluating the characteristics of paper produced from different eucalyptus species at different ages. Agricultural experiments Análise de dados longitudinais Applied statistics Aviaries Aviários Estatística aplicada Eucalipto Eucalyptus Experimentos agrícolas Likelihood Linear mixed models Longitudinal data analysis Modelos lineares mistos Verossimilhança
37	Testes de hipóteses para componentes de variância utilizando estatísticas U / U-tests for variance components in linear mixed models. Nobre, Juvencio Santos 09 August 2007 Nós consideramos decomposições de estatísticas $U$ para obter testes para componentes de variância. As distribuições assintóticas das estatísticas de testes sob a hipótese nula são obtidas supondo apenas a existência do quarto momento do erro condicional e do segundo momento dos efeitos aleatórios. Isso permite sua utilização em uma classe bastante ampla de distribuições. Sob a suposição adicional de existência do quarto momento dos efeitos aleatórios, obtemos também a distribuição assintótica das estatísticas sob uma seqüência de hipóteses alternativas locais. Comparamos a eficiência dos testes propostos com aqueles dos testes clássicos, obtidos sob suposição de normalidade, por meio de estudos de simulação. Os testes propostos se mostram mais adequados nas situações em que a amostra é de tamanho moderado ou grande, independentemente da distribuição das fontes de variação, e nas situações em que existe fortes afastamentos da normalidade. / We consider decompositions of U-statistics to obtain tests for null variance components in linear mixed models. Their asymptotic distributions under the null hypothesis are obtained assuming only the existence of the first four moments of the conditional error distribution and of the first two moments of the random effects distribution. Thus, the proposed U-tests may be employed in a large class of models. Under the additional assumption that the fourth moment of the random effects distribution exists, we also obtain the asymptotic distribution of the U-tests under a sequence of local alternative hypotheses. We compare their efficiency with that of classical tests derived under the assumption of normality, through simulation studies.
The proposed tests are more efficient in situations where the sample size is moderate or large, independently of the distribution of the sources of variation; they also perform better in situations where the underlying distributions are far from normal. Componentes de variância estatística U hipóteses não regulares linear mixed models martingais. martingales. modelos lineares mistos nonstandard hypothesis U-statistics variance components
38	Estimativa do custo da colheita mecanizada de cana-de-açúcar utilizando modelos de regressão / Estimated cost of mechanized harvesting of sugarcane using regression models Maekawa, Eduardo Shigueiti 22 August 2016 A colheita mecanizada é uma das mais significativas e onerosas operações do processo de produção de cana-de-açúcar, tornando-se importante o entendimento das relações que envolvem o seu custo. Atualmente, as metodologias para estimar o custo da colheita partem do conceito de custo fixo e variável. No entanto, considerando a complexidade desse processo, faz-se necessário avaliar métodos capazes de relacionar os parâmetros operacionais com o custo final. Neste contexto, a modelagem estatística por meio da regressão permite tratar tais relações e prever tendências. O objetivo deste trabalho foi desenvolver um modelo empírico para o cálculo do custo da colheita mecanizada de cana-de-açúcar. Desenvolveu-se um modelo linear generalizado (MLG) e um modelo linear generalizado misto (MLGM) ambos com distribuição gama, utilizando indicadores operacionais e dados de custo de 20 usinas do setor sucroalcooleiro. Por meio do MLGM, obteve-se uma aderência satisfatória quando comparado aos modelos MLG, nulo (média) e linear (supondo normalidade). Os indicadores que explicaram o custo foram: produtividade (t maq⁻¹), consumo (l t⁻¹), horímetro (h) e número de operadores por colhedora (nop). / The mechanized harvesting of sugarcane is one of the most significant and costly operations of the production process, so it is important to understand the relationships that determine its cost. Currently, methods for estimating this cost are based on the concept of fixed and variable costs. However, considering the complexity of the harvesting process, it is necessary to evaluate methods capable of relating the operating parameters to the final cost. In this context, statistical modeling by regression makes it possible to handle such relationships and to predict trends.
The objective of this study was to develop an empirical model to calculate the cost of mechanized harvesting of sugarcane. A generalized linear model (GLM) and a generalized linear mixed model (GLMM), both with a gamma distribution, were developed using operational indicators and cost data from 20 mills in the sugar and ethanol sector. The GLMM gave a satisfactory fit when compared with the GLM, the null model (mean) and the linear model (assuming normality). The indicators that explained the cost were: productivity (t mach⁻¹), consumption (l t⁻¹), hour meter (h) and number of operators per harvester (nop). Colhedora de cana Custo operacional Generalized linear mixed models Generalized linear models Modelos lineares generalizados Modelos lineares generalizados mistos Operational cost Sugarcane harvester
39	Modelos estatísticos para dados politômicos nominais em estudos longitudinais com uma aplicação à área agronômica / Statistical models for nominal polytomous data in longitudinal studies with an application to agronomy Menarin, Vinicius 14 January 2016 Estudos em que a resposta de interesse é uma variável categorizada são bastante comuns nas mais diversas áreas da Ciência. Em muitas situações essa resposta é composta por mais de duas categorias não ordenadas, denominada então de uma variável politômica nominal, e em geral o objetivo do estudo é associar a probabilidade de ocorrência de cada categoria aos efeitos de variáveis explicativas. Ademais, existem tipos especiais de estudos em que os dados são coletados diversas vezes para uma mesma unidade amostral ao longo do tempo, os estudos longitudinais. Estudos assim requerem o uso de modelos estatísticos que considerem em sua formulação algum tipo de estrutura que suporte a dependência que tende a surgir entre observações feitas em uma mesma unidade amostral. Neste trabalho são abordadas duas extensões do modelo de logitos generalizados, usualmente empregado quando a resposta é politômica nominal com observações independentes entre si. A primeira consiste de uma modificação das equações de estimação generalizadas para dados nominais que se utiliza de razões de chances locais para descrever a dependência entre as observações da variável resposta politômica ao longo dos diversos tempos observados. Este tipo de modelo é denominado de modelo marginal. A segunda proposta abordada consiste no modelo de logitos generalizados com a inclusão de efeitos aleatórios no preditor linear, que também leva em conta uma dependência entre as observações. Esta abordagem caracteriza o modelo de logitos generalizados misto. Há diferenças importantes inerentes às interpretações dos modelos marginais e mistos, que são discutidas e que devem ser levadas em consideração na escolha da abordagem adequada.
Ambas as propostas são aplicadas em um conjunto de dados proveniente de um experimento da área agronômica realizado em campo, conduzido sob um delineamento casualizado em blocos com esquema fatorial para os tratamentos. O experimento foi acompanhado ao longo de seis estações do ano, caracterizando assim uma estrutura longitudinal, sendo a variável resposta o tipo de vegetação observado no campo (touceiras, plantas invasoras ou espaços vazios). Os resultados encontrados são satisfatórios, embora a dependência presente nos dados não seja tão caracterizada; por meio de testes como da razão de verossimilhanças e de Wald diversas diferenças significativas entre os tratamentos foram encontradas. Ainda, devido às diferenças metodológicas das duas abordagens, o modelo marginal baseado nas equações de estimação generalizadas mostra-se mais adequado para esses dados. / Studies in which the response is a categorical variable are quite common in many fields of science. In many situations this response is composed of more than two unordered categories, characterizing a nominal polytomous outcome, and in general the aim of the study is to associate the probability of occurrence of each category with the effects of explanatory variables. Furthermore, there are special types of study in which measurements are taken repeatedly over time on the same sampling unit, called longitudinal studies. Such studies require statistical models whose formulation includes some kind of structure that accommodates the dependence that tends to arise among repeated measurements on the same sampling unit. This work focuses on two extensions of the baseline-category logit model, which is usually employed when there is a nominal polytomous response with independent observations. The first consists of a modification of the well-known generalized estimating equations for nominal data that uses local odds ratios to describe the dependence between the levels of the response over the repeated measurements.
This type of model is known as a marginal model. The second approach adds random effects to the linear predictor of the baseline-category logit model, which also accounts for dependence between the observations. This characterizes a baseline-category mixed logit model. There are substantial differences in interpretation between marginal and mixed models, which are discussed and which should be considered when choosing the most appropriate approach for each situation. Both methodologies are applied to data from an agronomic field experiment conducted under a randomized block design with a factorial arrangement of treatments. The experiment was followed over six seasons, characterizing the longitudinal structure, and the response is the type of vegetation observed in the field (tussocks, weeds or regions with bare ground). The results are satisfactory, even though the dependence found in the data is not very strong, and likelihood-ratio and Wald tests point to several significant differences between treatments. Moreover, owing to methodological differences between the two approaches, the marginal model based on generalized estimating equations seems to be more appropriate for these data. Dados categorizados nominais Equações de estimação generalizadas generalized estimating equations generalized linear mixed models Medidas repetidas no tempo Modelos lineares generalizados mistos nominal categorical data repeated measurements over time
40	Superdispersão em dados binomiais hierárquicos / Overdispersion in hierarchical binomial data Nati, Lilian 05 March 2008 Para analisar dados binários oriundos de uma estrutura hierárquica com dois níveis (por exemplo, aluno e escola), uma alternativa bastante utilizada é a suposição da distribuição binomial para as unidades experimentais do primeiro nível (aluno) condicionalmente a um efeito aleatório proveniente de uma distribuição normal para as unidades do segundo nível (escola). Neste trabalho, propõe-se a adição de um efeito aleatório normal no primeiro nível de um modelo linear generalizado hierárquico binomial para contemplar uma possível variabilidade extra-binomial decorrente da dependência entre os ensaios de Bernoulli de um mesmo indivíduo. Obtém-se o processo de estimação por máxima verossimilhança para este modelo a partir da verossimilhança marginal dos dados, após uma dupla aplicação do método de quadratura de Gauss-Hermite adaptativa como aproximação para as integrais dos efeitos aleatórios. Realiza-se um estudo de simulação para contrastar propriedades inferenciais do modelo aspirante com o modelo linear generalizado binomial, um modelo de quase-verossimilhança e o tradicional modelo linear generalizado hierárquico em dois níveis. / A common alternative when analyzing binary data originating from a two-level hierarchical structure (for instance, student and school) is to assume a binomial distribution for the experimental units of the first level (student), conditional on a normal random effect for the second-level units (school). In this work, we propose the inclusion of a second normal random effect in the first level to account for possible extra-binomial variability due to the dependence among the Bernoulli trials of the same individual.
We obtain the maximum likelihood estimation process for this hierarchical model starting from the marginal likelihood of the data, after a double application of the adaptive Gauss-Hermite quadrature as an approximation of the integrals of the random effects. We conduct a simulation study to compare the inferential properties of the advocated model with the generalized linear (binomial) model, a quasi-likelihood model and the usual two-level hierarchical generalized linear model. binomial data dados binomiais generalized linear mixed models hierarchical models modelos hierárquicos modelos lineares generalizados mistos modelos multiníveis multilevel models overdispersion superdispersão
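The marginal likelihood mentioned in entry 40 integrates the binomial probability over a normal random effect, and Gauss-Hermite quadrature replaces that integral with a weighted sum. The sketch below illustrates the idea for the simpler, usual case of a single random intercept and uses the plain (non-adaptive) rule rather than the adaptive, doubly applied version described in the abstract. The function name and the parameters beta, sigma and n_nodes are illustrative assumptions, not taken from the thesis.

```python
import math

import numpy as np


def marginal_likelihood(y, n, beta, sigma, n_nodes=20):
    """Approximate P(Y = y) for Y | b ~ Binomial(n, expit(beta + b)), b ~ N(0, sigma^2).

    Plain Gauss-Hermite quadrature (an assumption made for brevity; the
    abstract describes an adaptive variant).
    """
    # Nodes and weights for integrals of the form  int exp(-x^2) f(x) dx.
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variable b = sqrt(2) * sigma * x maps the N(0, sigma^2)
    # density onto the exp(-x^2) weight, leaving a factor 1 / sqrt(pi).
    b = math.sqrt(2.0) * sigma * nodes
    p = 1.0 / (1.0 + np.exp(-(beta + b)))  # inverse logit at each node
    pmf = math.comb(n, y) * p**y * (1.0 - p) ** (n - y)
    return float(np.sum(weights * pmf) / math.sqrt(math.pi))


if __name__ == "__main__":
    # Marginal probability of 3 successes in 10 trials for one cluster.
    print(marginal_likelihood(3, 10, beta=0.3, sigma=1.2))
```

With sigma set to zero the random effect vanishes and the function reduces to the ordinary binomial probability, which gives a quick sanity check on the change of variable.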
