21 |
Learning 3-D Models of Object Structure from Images / Schlecht, Joseph, January 2010 (has links)
Recognizing objects in images is an effortless task for most people. Automating this task with computers, however, presents a difficult challenge attributable to large variations in object appearance, shape, and pose. The problem is further compounded by ambiguity from projecting 3-D objects into a 2-D image. In this thesis we present an approach to resolve these issues by modeling object structure with a collection of connected 3-D geometric primitives and a separate model for the camera. From sets of images we simultaneously learn a generative, statistical model for the object representation and parameters of the imaging system. By learning 3-D structure models we go beyond recognition towards quantifying object shape and understanding its variation. We explore our approach in the context of microscopic images of biological structure and single-view images of man-made objects composed of block-like parts, such as furniture. We express detected features from both domains as statistically generated by an image likelihood conditioned on models for the object structure and imaging system. Our representation of biological structure focuses on Alternaria, a genus of fungus comprising ellipsoid- and cylinder-shaped substructures. In the case of man-made furniture objects, we represent structure with spatially contiguous assemblages of blocks arbitrarily constructed according to a small set of design constraints. We learn the models with Bayesian statistical inference over structure and camera parameters per image and, for man-made objects, across categories, such as chairs. We develop a reversible-jump MCMC sampling algorithm to explore topology hypotheses, and a hybrid of Metropolis-Hastings and stochastic dynamics to search within topologies. Our results demonstrate that we can infer both 3-D object and camera parameters simultaneously from images, and that doing so improves understanding of structure in images. We further show how 3-D structure models can be inferred from single-view images, and that learned category parameters capture structure variation that is useful for recognition.
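As a rough illustration of the Metropolis-Hastings half of such a hybrid sampler, here is a minimal random-walk sampler over a fixed topology, assuming a stand-in Gaussian log-posterior over two primitive parameters; the thesis's actual sampler operates on image likelihoods, geometric primitives and camera parameters, none of which are reproduced here.

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_steps=5000, step=0.1, rng=None):
    """Random-walk Metropolis-Hastings over a fixed parameter vector.

    In the thesis this role is played by a hybrid sampler searching camera
    and geometry parameters within one topology; the proposal and target
    here are illustrative stand-ins.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        prop = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Stand-in target: posterior over (radius, length) of a cylinder primitive.
log_post = lambda t: -0.5 * np.sum((t - np.array([1.0, 3.0]))**2 / 0.05)
draws = metropolis_hastings(log_post, x0=np.zeros(2))
print(draws[-1000:].mean(axis=0))  # should be near (1.0, 3.0)
```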
|
22 |
Delay estimation in computer networks / Johnson, Nicholas Alexander, January 2010 (links)
Computer networks are becoming increasingly large and complex, more so with the recent penetration of the internet into all walks of life. It is essential to be able to monitor and analyse networks in a timely and efficient manner, extracting important metrics and measurements in a way which does not unduly disturb or affect the performance of the network under test. Network tomography is one possible method to accomplish these aims. Drawing upon the principles of statistical inference, it is often possible to determine the statistical properties of either the links or the paths of the network, whichever is desired, by measuring at the most convenient points, thus reducing the effort required. In particular, bottleneck-link detection methods, in which estimates of the delay distributions on network links are inferred from measurements made at end-points on network paths, are examined as a means to determine which links of the network are experiencing the highest delay. Initially, two published methods, one based upon a single Gaussian distribution and the other based upon the method-of-moments, are examined by comparing their performance using three metrics: robustness to scaling, bottleneck detection accuracy and computational complexity. Whilst there are many published algorithms, there is little literature in which they are objectively compared. In this thesis, two network topologies are considered, each with three configurations, in order to determine performance in six scenarios. Two new estimation methods are then introduced, both based on Gaussian mixture models, which are believed to offer an advantage over existing methods in certain scenarios. Computationally, a mixture model algorithm is much more complex than a simple parametric algorithm, but the flexibility in modelling an arbitrary distribution is vastly increased. Better model accuracy potentially leads to more accurate estimation and detection of the bottleneck. The concept of increasing flexibility is again considered by using a Pearson type-1 distribution as an alternative to the single Gaussian distribution. This increases the flexibility but with reduced complexity when compared with mixture model approaches, which necessitate the use of iterative approximation methods. A hybrid approach is also considered, where the method-of-moments is combined with the Pearson type-1 method in order to circumvent problems with the output stage of the former. This algorithm has a higher variance than the method-of-moments, but its output stage is more convenient for manipulation. Also considered is a new approach to detection which is not dependent on any a priori parameter selection and makes use of the Kullback-Leibler divergence. The results show that it accomplishes its aim but is not robust enough to replace the current algorithms. Delay estimation is then cast in a different role, as an integral part of an algorithm to correlate input and output streams in an anonymising network such as The Onion Router (TOR). TOR is used in an attempt to conceal network traffic from observation. Breaking the encryption protocols used is not possible without significant effort, but by correlating the unencrypted input and output streams from the TOR network, it is possible to provide a degree of certainty about the ownership of traffic streams. The delay model is essential because the network is treated as adding a pseudo-random delay to each packet; having an accurate model allows the algorithm to better correlate the streams.
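To illustrate the kind of model the two new estimators rely on, here is a generic EM fit of a two-component Gaussian mixture to synthetic path delays; this is the textbook algorithm, not the thesis's published estimators, and all data and parameter values are invented.

```python
import numpy as np

def em_gaussian_mixture(x, k=2, n_iter=200, rng=None):
    """Fit a k-component 1-D Gaussian mixture by EM (generic textbook version)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = x.size
    mu = rng.choice(x, k)                  # initialize means from the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each observation
        dens = w * np.exp(-0.5 * (x[:, None] - mu)**2 / var) / np.sqrt(2*np.pi*var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: reweighted moment updates
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / nk
    return w, mu, var

# Synthetic path delays (ms): a fast mode plus a congested (bottleneck) mode.
rng = np.random.default_rng(1)
delays = np.concatenate([rng.normal(10, 1, 800), rng.normal(40, 5, 200)])
print(em_gaussian_mixture(delays))
```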
|
23 |
A NOVEL COMPUTATIONAL FRAMEWORK FOR TRANSCRIPTOME ANALYSIS WITH RNA-SEQ DATA / Hu, Yin, 01 January 2013 (links)
The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets, with no dependence on transcript annotations. Starting directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, the framework then performs precise and efficient differential analysis at automatically detected alternative splicing variants, which circumvents the need for full transcript reconstruction and quantification. Beyond the scope of classical group-wise analysis, a clustering scheme is further described for mining prominent transcriptional consistency among samples, removing the restriction of a presumed grouping. The performance of the framework has been demonstrated in a series of simulation studies and on real datasets, including The Cancer Genome Atlas (TCGA) breast cancer analysis. These successful applications suggest an unprecedented opportunity to use differential transcription analysis to reveal variations in the mRNA transcriptome in response to cellular differentiation or the effects of disease.
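Differential analysis at a detected alternative splicing variant ultimately compares read counts supporting inclusion versus exclusion across conditions. A deliberately simplified, hypothetical version of that comparison (not the framework's actual statistic) can be phrased as a Fisher exact test on a 2x2 count table:

```python
from scipy.stats import fisher_exact

# Hypothetical read counts at one alternative splicing module:
# rows = (reads supporting inclusion, reads supporting exclusion),
# columns = (condition A, condition B).
table = [[120, 45],
         [ 60, 110]]
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio={odds_ratio:.2f}, p={p_value:.3g}")
```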
|
24 |
Phylodynamic Methods for Infectious Disease Epidemiology / Rasmussen, David Alan, January 2014 (links)
In this dissertation, I present a general statistical framework for phylodynamic inference that can be used to estimate epidemiological parameters and reconstruct disease dynamics from pathogen genealogies. This framework can be used to fit a broad class of epidemiological models, including nonlinear stochastic models, to genealogies by relating the population dynamics of a pathogen to its genealogy using coalescent theory. By combining Markov chain Monte Carlo and particle filtering methods, efficient Bayesian inference of all parameters and unobserved latent variables is possible even when analytical likelihood expressions are not available under the epidemiological model. Through extensive simulations, I show that this method can be used to reliably estimate epidemiological parameters of interest and to reconstruct past disease dynamics from genealogies, or jointly from genealogies and other common sources of epidemiological data such as time series. I then extend this basic framework to include different types of host population structure, including models with spatial structure, multiple hosts or vectors, and different stages of infection. The latter is demonstrated by using a multistage model of HIV infection to estimate stage-specific transmission rates and incidence from HIV sequence data collected in Detroit, Michigan. Finally, to demonstrate how the approach can be used more generally, I consider the case of dengue virus in southern Vietnam. I show how earlier phylodynamic inference methods fail to reliably reconstruct the dynamics of dengue observed in hospitalization data, but that by deriving coalescent models that take into consideration ecological complexities like seasonality, vector dynamics and spatial structure, accurate dynamics can be reconstructed from genealogies. In sum, by extending phylodynamics to include more ecologically realistic and mechanistic models, this framework can provide more accurate estimates and give deeper insight into the processes driving infectious disease dynamics. / Dissertation
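The key computational idea, estimating an intractable likelihood by particle filtering so it can be embedded in MCMC, can be sketched on a simpler stand-in: a stochastic SIR model observed through weekly case counts. The dissertation fits genealogies via coalescent models, so everything below (the Poisson observation model, chain-binomial dynamics, and all parameter values) is an illustrative assumption.

```python
import numpy as np

def particle_filter_loglik(beta, gamma, cases, n_pop=1000, n_part=500, rng=None):
    """Estimate the log-likelihood of weekly case counts under a stochastic
    SIR model with a bootstrap particle filter. A simplified stand-in for
    the particle MCMC machinery: in the dissertation the data are genealogies
    and the likelihood comes from a coalescent model, not a count model.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    S = np.full(n_part, n_pop - 1.0)
    I = np.ones(n_part)
    loglik = 0.0
    for y in cases:
        # Propagate each particle one step (chain-binomial transitions).
        p_inf = 1.0 - np.exp(-beta * I / n_pop)
        new_inf = rng.binomial(S.astype(int), p_inf)
        new_rec = rng.binomial(I.astype(int), 1.0 - np.exp(-gamma))
        S, I = S - new_inf, I + new_inf - new_rec
        # Weight by the Poisson density of the observed count.
        lam = np.maximum(new_inf, 1e-9)
        logw = y * np.log(lam) - lam - np.sum(np.log(np.arange(1, y + 1)))
        w = np.exp(logw - logw.max())
        loglik += logw.max() + np.log(w.mean())
        idx = rng.choice(n_part, n_part, p=w / w.sum())  # resample particles
        S, I = S[idx], I[idx]
    return loglik

cases = [2, 5, 9, 14, 20, 18, 12, 7]  # synthetic weekly counts
print(particle_filter_loglik(beta=1.5, gamma=1.0, cases=cases))
```

Embedding this noisy but unbiased likelihood estimate inside a Metropolis-Hastings loop over (beta, gamma) gives the particle-MCMC scheme the framework builds on.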
|
25 |
Computerized achievement tests: sequential and fixed length tests / Wiberg, Marie H., January 2003 (links)
The aim of this dissertation is to describe how a computerized achievement test can be constructed and used in practice. Throughout the dissertation the focus is on classifying examinees into masters and non-masters depending on their ability; no attempt is made to estimate their ability. In paper I, a criterion-referenced computerized test with a fixed number of items is expressed as a statistical inference problem. The theory of optimal design is used to find the test with the greatest power. A formal proof shows that all items should have the same item characteristics, viz. high discrimination, low guessing and difficulty near the cutoff score, in order to yield the most powerful statistical test. An efficiency study shows how many times more non-optimal items are needed to achieve the same power as a test built from optimal items. In paper II, a computerized mastery sequential test is examined using sequential analysis. The focus is on examining the sequential probability ratio test and on minimizing the number of items in a test, i.e. minimizing the average sample number (ASN) function. Conditions under which the ASN function decreases are examined. Further, it is shown that the optimal values for item discrimination and item guessing are the same as for fixed-length tests, but that the optimal item difficulty differs. Paper III presents three simulation studies of sequential computerized mastery tests. Three cases are considered: the examinees' responses are either identically distributed, not identically distributed, or not identically distributed with estimation errors in the item characteristics. The simulations indicate that the observed results from the operating characteristic function differ significantly from the theoretical results. The mean number of items in a test, the distribution of test length and the variance depend on whether the true values of the item characteristics are known and on whether the responses are identically distributed or not. Paper IV considers computerized tests containing both pretested items with known item parameters and try-out items with unknown item parameters. The aim is to study how the item parameters of the try-out items can be estimated in a computerized test. Although the examinees' unknown abilities act as nuisance parameters, the asymptotic variance of the item parameter estimators can be calculated. Examples show that a more reliable variance estimator yields much larger estimates of the variance than commonly used variance estimators.
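A minimal sketch of the sequential mastery classification that papers II and III analyze, assuming a 3PL item response model and illustrative cutoff abilities and error rates; the optimal item characteristics derived in the papers are only mimicked here by reusing one high-discrimination, low-guessing item.

```python
import numpy as np

def irt_prob(theta, a, b, c):
    """3PL item response probability: discrimination a, difficulty b, guessing c."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def sprt_mastery(responses, items, theta0=-0.5, theta1=0.5, alpha=0.05, beta=0.05):
    """Wald's SPRT classifying an examinee as non-master (ability theta0)
    versus master (theta1). Cutoffs and error rates are illustrative.
    """
    lo = np.log(beta / (1 - alpha))       # lower (non-master) boundary
    hi = np.log((1 - beta) / alpha)       # upper (master) boundary
    llr = 0.0
    for u, (a, b, c) in zip(responses, items):
        p0, p1 = irt_prob(theta0, a, b, c), irt_prob(theta1, a, b, c)
        llr += np.log(p1 / p0) if u == 1 else np.log((1 - p1) / (1 - p0))
        if llr >= hi:
            return "master"
        if llr <= lo:
            return "non-master"
    return "undecided"  # fall back to a fixed-length rule if items run out

# Identical optimal-style items: high discrimination, low guessing,
# difficulty at the cutoff, as paper I suggests.
items = [(2.0, 0.0, 0.2)] * 30
responses = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(sprt_mastery(responses, items))
```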
|
26 |
Modelo de Grubbs em grupos / Grubbs' model with subgroups / Zeller, Camila Borelli, 23 February 2006 (links)
Advisor: Filidor Edilfonso Vilca Labra / Master's thesis - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: In this work we present a study of statistical inference in the Grubbs model with subgroups, an extension of the model proposed by Grubbs (1948, 1973) that is frequently used to compare instruments or measurement methods. We consider the parametrization proposed by Bedrick (2001). The study is based on the maximum likelihood method. Hypothesis tests based on the Wald, score and likelihood ratio statistics are considered. The maximum likelihood estimates of the Grubbs model with subgroups are obtained using the EM algorithm, assuming that the observations follow a normal distribution. We also present a diagnostic analysis of the Grubbs model with subgroups, aimed at evaluating the impact that a given subgroup exerts on the parameter estimates, using the local influence methodology proposed by Cook (1986) under the case-weight perturbation scheme. Finally, we present some simulation studies and illustrate the theoretical results using data found in the literature. / Master's degree in Statistics
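For intuition, the classical two-instrument version of Grubbs's model admits closed-form moment estimators, implemented below on synthetic data. The thesis itself works with subgroups, the Bedrick (2001) parametrization and EM-based maximum likelihood, none of which is shown here.

```python
import numpy as np

def grubbs_two_instruments(y1, y2):
    """Moment estimators for Grubbs' two-instrument model
    y1 = x + e1, y2 = alpha + x + e2 (Grubbs, 1948):
    the covariance identifies var(x); the remainder is measurement error.
    """
    alpha_hat = np.mean(y2) - np.mean(y1)      # relative bias of instrument 2
    s11, s22 = np.var(y1, ddof=1), np.var(y2, ddof=1)
    s12 = np.cov(y1, y2, ddof=1)[0, 1]
    return {"bias": alpha_hat,
            "var_x": s12,                      # variance of the true values
            "var_e1": s11 - s12,               # error variance, instrument 1
            "var_e2": s22 - s12}               # error variance, instrument 2

rng = np.random.default_rng(2)
x = rng.normal(50, 4, 200)                     # unobserved true values
y1 = x + rng.normal(0, 1.0, 200)               # instrument 1
y2 = 0.8 + x + rng.normal(0, 2.0, 200)         # instrument 2, biased and noisier
print(grubbs_two_instruments(y1, y2))
```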
|
27 |
Avaliação de transcritos diferencialmente expressos em neoplasias humanas com ORESTES / Evaluation of differential expression profiles across neoplasic human samples using ORESTES (Open Reading Frame ESTs) / Peres, Tarcisio de Souza, 30 August 2006 (links)
Advisor: Fernando Lopes Alberto / Master's thesis - Universidade Estadual de Campinas, Faculdade de Ciências Médicas
Abstract: Cancer research developed systematically throughout the twentieth century, but the last 25 years were notably characterized by rapid advances that generated a rich and complex body of knowledge, framing the disease within a dynamic set of changes in the genome. A complete understanding of the molecular phenomena involved in the physiopathology of neoplasias therefore depends on knowledge of the varied cellular and biochemical processes that are characteristic of the tumor cell and distinguish it from the normal cell (GOLUB and SLONIM, 1999). In this work we investigated the molecular pathways of the neoplasic process through analysis of the cDNA sequences generated by the Human Cancer Genome Project (CAMARGO, 2001), aiming to identify differentially expressed genes in neoplasias of the following tissues: breast, colon, head and neck, lung, central nervous system, prostate, stomach, testicle and uterus. The transcript generation methodology used by the Human Cancer Genome Project is known as ORESTES (DIAS et al, 2000). Initially, the sequencing data (ORESTES fragments) were clustered and assembled according to similarity scores using the PHRED/PHRAP package of computer programs (EWING and GREEN, 1998). The resulting consensus sequences, each representing a cluster, were compared with known sequences deposited in public databases using the BLAST algorithm (ALTSCHUL et al, 1990). A subgroup of genes was selected based on specific criteria, and their expression levels in different tissues were evaluated by a Bayesian inference approach (CHEN et al, 1998), in contrast to more classical approaches such as null hypothesis tests (AUDIC and CLAVERIE, 1997). The Bayesian inference tool was implemented as a PERL script developed for this work (PERES et al, 2005). With the support of the literature, a list of genes putatively related to the neoplasic phenotype was created. This list was compared with the gene expression information, becoming one of the parameters of a ranking system defined for the selection of genes of interest. In this way, part of the body of knowledge about cancer was used together with the gene expression data inferred from the ORESTES fragments. For biological contextualization of the generated information, the genes were classified according to the Gene Ontology (ASHBURNER et al, 2000) and KEGG (OGATA et al, 1999) nomenclatures. Some of the genes identified as differentially expressed in at least one tumor tissue, relative to its normal counterpart, belong to pathways related to the neoplasic phenomenon (HAHN and WEINBERG, 2002); about 52% of the genes associated with these pathways had a differential expression factor (in absolute value) greater than five. Finally, ten of the ranked genes were chosen for experimental validation of the findings by real-time quantitative PCR (qPCR) in samples of normal and neoplasic gastric tissue, and the results were compatible with the expression data inferred from the ORESTES fragments. / Master's degree in Medical Sciences (Biomedical Sciences)
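A rough sketch of Bayesian differential expression for tag counts in the spirit described above, assuming a generic Gamma-Poisson model rather than the exact method of Chen et al. (1998); the counts, library sizes and the 5-fold threshold are hypothetical.

```python
import numpy as np

def posterior_prob_diff(x, n1, y, n2, fold=1.0, draws=100_000, rng=None):
    """Bayesian comparison of one gene's ORESTES-style tag counts in two
    libraries. With Jeffreys Gamma(1/2, 0) priors on the per-library rates,
    the posterior rates are Gamma; we estimate P(rate2 > fold * rate1 | data)
    by Monte Carlo. A generic Gamma-Poisson sketch, not the exact method
    of Chen et al. (1998) used in the thesis.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    r1 = rng.gamma(x + 0.5, 1.0 / n1, draws)  # posterior rate, library 1
    r2 = rng.gamma(y + 0.5, 1.0 / n2, draws)  # posterior rate, library 2
    return np.mean(r2 > fold * r1)

# Hypothetical counts: 3 tags in 50k reads (normal) vs 19 in 60k (tumor),
# tested against the 5-fold differential expression factor used above.
print(posterior_prob_diff(3, 50_000, 19, 60_000, fold=5.0))
```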
|
28 |
Método Bootstrap na agricultura de precisão / Bootstrap method in precision farming / Dalposso, Gustavo Henrique, 15 February 2017 (links)
Funding: Fundação Araucária de Apoio ao Desenvolvimento Científico e Tecnológico do Estado do Paraná (FA) / Abstract: One issue in precision agriculture studies concerns the statistical methods used in inferential analyses, since they require assumptions that sometimes cannot be met. An alternative to traditional methods is the bootstrap method, which carries out inference by resampling the original data set with replacement. The bootstrap methodology can be applied to independent sample data as well as to cases of dependence, such as in spatial statistics; however, using the bootstrap method on spatial data requires adaptations of the resampling process. This work aimed to apply the bootstrap method in precision agriculture studies, resulting in three scientific papers. In the first paper, a dataset of soybean yield and soil attributes with few samples was used to determine a multiple linear regression model. Bootstrap methods were used to select variables, identify influential points and determine confidence intervals for the model parameters. The results showed that the bootstrap methods made it possible to select the attributes that were significant for building the model, to construct confidence intervals for the parameters, and to identify the points with great influence on the estimated parameters. In the second paper, the spatial dependence of soybean yield data and soil attributes was studied using the bootstrap method in geostatistical analysis. The spatial bootstrap method was used to quantify the uncertainties associated with the characterization of the spatial dependence structure, the estimators of the fitted model parameters, the values predicted by kriging, and the assumption of multivariate normality of the data, making it possible to quantify the uncertainties in all phases of the geostatistical analysis. In the third paper, a spatial linear regression model was used to model soybean yield as a function of soil attributes. Spatial bootstrap methods were used to determine point and interval estimators for the model parameters, hypothesis tests on the model parameters were carried out, and probability plots were constructed to assess the normality of the data. These methods made it possible to quantify the uncertainties associated with the spatial dependence structure, to evaluate the individual significance of the parameters associated with the mean of the spatial linear model, and to verify the assumption of multivariate normality of the data. It is concluded, therefore, that the bootstrap method is an effective alternative for statistical inference in precision agriculture studies.
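As a baseline for the methods in the three papers, here is the elementary case-resampling bootstrap for confidence intervals of regression coefficients, with invented soil and yield data; the papers' spatial bootstrap modifies the resampling step to respect spatial dependence, which this sketch does not do.

```python
import numpy as np

def bootstrap_coef_ci(X, y, n_boot=2000, level=0.95, rng=None):
    """Percentile bootstrap confidence intervals for OLS coefficients,
    resampling cases with replacement (the basic scheme for independent
    samples).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(y)
    boot = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                 # resample rows
        boot[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    q = (1 - level) / 2
    return np.quantile(boot, [q, 1 - q], axis=0)

# Hypothetical soil attribute (e.g. clay content, %) vs soybean yield (t/ha).
rng = np.random.default_rng(3)
clay = rng.uniform(20, 60, 80)
yield_ = 1.5 + 0.05 * clay + rng.normal(0, 0.4, 80)
X = np.column_stack([np.ones(80), clay])
print(bootstrap_coef_ci(X, yield_))                 # rows: lower, upper bounds
```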
|
29 |
Kvantilová regrese / Quantile Regression / Procházka, Jiří, January 2015 (links)
The thesis gives a brief introduction to quantile regression theory and is divided into three thematic parts. The first part gives a general introduction to quantile regression, its theoretical aspects, and basic approaches to estimating quantile regression parameters. The second part focuses on the general and asymptotic properties of quantile regression; its goal is to compare quantile regression with traditional OLS regression and outline possible applications. The third part describes statistical inference, the construction of confidence intervals, and the testing of statistical hypotheses about quantile regression parameters. Its goal is to introduce both the traditional approach and an approach based on resampling procedures, and ultimately to compare the different approaches and propose partial modifications.
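A direct numerical sketch of quantile regression estimation by minimizing the check (pinball) loss, on synthetic heteroscedastic data; production implementations typically solve the equivalent linear program instead, and all data here are invented.

```python
import numpy as np
from scipy.optimize import minimize

def quantile_regression(X, y, tau=0.5):
    """Fit a linear tau-th quantile regression by minimizing the check loss
    rho_tau(u) = u * (tau - 1{u < 0}); a direct numerical sketch rather than
    the linear-programming formulation usually used in practice.
    """
    def loss(beta):
        u = y - X @ beta
        return np.sum(u * (tau - (u < 0)))
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS warm start
    return minimize(loss, beta0, method="Nelder-Mead").x

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 300)
y = 2 + 0.5 * x + rng.standard_normal(300) * (0.5 + 0.2 * x)  # heteroscedastic
X = np.column_stack([np.ones_like(x), x])
for tau in (0.1, 0.5, 0.9):
    print(tau, quantile_regression(X, y, tau))      # slopes fan out with tau
```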
|
30 |
Modelos lineares generalizados mistos multivariados para caracterização genética de doenças / Multivariate generalized linear mixed models for genetic characterization of diseases / Baldoni, Pedro Luiz, 1989-, 24 August 2018 (links)
Advisor: Hildete Prisco Pinheiro / Master's thesis - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: Generalized Linear Mixed Models (GLMM) are a natural generalization of Linear Mixed Models (LMM) and Generalized Linear Models (GLM). The GLMM class relaxes the normality assumption on the data, allowing the use of several other probability distributions; it accommodates the overdispersion often observed, as well as the correlation among observations in longitudinal or repeated measures studies. However, the likelihood theory for GLMM is not straightforward, since the marginal likelihood function has no closed form and involves high-dimensional integrals. To solve this problem, several methodologies have been proposed in the literature, from classical techniques such as numerical quadrature to sophisticated methods involving the EM algorithm, MCMC methods and penalized quasi-likelihood. These methods have advantages and disadvantages that must be evaluated for each type of problem. In this work, the penalized quasi-likelihood method (BRESLOW and CLAYTON, 1993) was used to model disease occurrence data in a population of dairy cattle, because it proved robust to the likelihood problems posed by this dataset; moreover, the other methods proved computationally intractable given the complexity of the problems found in quantitative genetics. Additionally, simulation studies are presented to verify the robustness of this methodology. The stability of these estimators and the robustness theory for this problem are not completely developed in the literature. / Master's degree in Statistics
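A sketch of the inner step of penalized quasi-likelihood for a random-intercept logistic GLMM, with the variance component held fixed and all data simulated; a full PQL fit alternates this step with a variance-component update, and this is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def pql_inner(X, y, group, sigma2_b=1.0):
    """Inner step of penalized quasi-likelihood for a random-intercept
    logistic GLMM: jointly maximize the Bernoulli log-likelihood penalized
    by the random effects, with the variance component sigma2_b held fixed.
    """
    n_groups = group.max() + 1
    p = X.shape[1]

    def neg_penalized(par):
        beta, b = par[:p], par[p:]
        eta = X @ beta + b[group]
        # Bernoulli log-likelihood in numerically stable form.
        ll = np.sum(y * eta - np.logaddexp(0.0, eta))
        return -(ll - 0.5 * np.sum(b**2) / sigma2_b)  # penalty b'b/(2*sigma2_b)

    res = minimize(neg_penalized, np.zeros(p + n_groups), method="BFGS")
    return res.x[:p], res.x[p:]

# Hypothetical herd data: infection status vs one covariate, herd intercepts.
rng = np.random.default_rng(5)
group = np.repeat(np.arange(10), 30)               # 10 herds, 30 cows each
b_true = rng.normal(0, 1.0, 10)
x = rng.standard_normal(300)
y = rng.binomial(1, expit(-1.0 + 0.8 * x + b_true[group]))
X = np.column_stack([np.ones(300), x])
beta_hat, b_hat = pql_inner(X, y, group, sigma2_b=1.0)
print(beta_hat)                                    # near (-1.0, 0.8)
```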
|