121

[pt] EXPERIMENTOS COM MISTURA: UMA APLICAÇÃO COM RESPOSTAS NÃO-NORMAIS / [en] MIXTURE EXPERIMENTS: AN APPLICATION WITH NONNORMAL RESPONSES

LUIZ HENRIQUE ABREU DAL BELLO 03 January 2006
This dissertation presents a real practical case and brings together the statistical techniques needed to handle mixture experiments. Design of Experiments methods must be adapted for mixture problems because of the basic constraint of this type of experiment: the proportions of all mixture components must sum to 1 (100%). The delay compound experiment, the main object and motivation of this dissertation, is a mixture experiment in which the proportions of all three components have simultaneous upper and lower bounds. Under these constraints the restricted design space is heavily distorted relative to the simplex, so a D-optimal design had to be generated. Because there was an indication that the response variance is not constant for the delay compound, Generalized Linear Models were used, specifically the quasi-likelihood method, to fit an adequate model. With the adequate model, it was possible to determine the component proportions of the delay compound that meet the design specification.
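The mixture constraint and the simultaneous lower and upper bounds described in this abstract can be illustrated with a minimal sketch that enumerates candidate blends on a grid; a D-optimal design would typically be selected from a candidate set of this kind. The bounds, the grid resolution and the function name below are hypothetical and are not taken from the dissertation.

```python
from fractions import Fraction
from itertools import product


def constrained_mixture_grid(lower, upper, steps=20):
    """Enumerate q-component blends on a grid of width 1/steps.

    Every returned point sums exactly to 1 and respects the
    per-component bounds lower[i] <= x[i] <= upper[i].
    """
    q = len(lower)
    points = []
    # Choose the first q-1 components freely; the last one is forced
    # by the mixture constraint x_1 + ... + x_q = 1.
    for head in product(range(steps + 1), repeat=q - 1):
        last = steps - sum(head)
        if last < 0:
            continue
        point = tuple(Fraction(k, steps) for k in head + (last,))
        if all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper)):
            points.append(point)
    return points


# Hypothetical bounds for a three-component blend (not from the dissertation).
lower = (Fraction(1, 5), Fraction(1, 10), Fraction(3, 10))
upper = (Fraction(3, 5), Fraction(1, 2), Fraction(7, 10))
candidates = constrained_mixture_grid(lower, upper)
print(len(candidates), "candidate blends, e.g.", [float(x) for x in candidates[0]])
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.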
122

The Impact of Weather on Residential Fires in Sweden: A Regression Analysis / Väders Inverkan på Bostadsbränder i Sverige: En Regressionsanalys

Reineck, Viktor, Ulfsparre, Folke January 2019
The purpose of this report is to investigate possible relationships between the number of residential fires in Sweden and various weather parameters. The study is based on a hypothesis stated by MSB, the Swedish Civil Contingencies Agency, that behavioral factors related to weather can influence the number of residential fires. Generalized linear models were used, specifically Poisson and negative binomial regression. The aim was to map any such connection and to determine whether the analysis could be used as a tool to improve the emergency services in Sweden. Temperature, short-term differences in temperature, and precipitation were analyzed with residential fires as the dependent variable, which resulted in one model for each municipality in Sweden. The relationships between the weather parameters and residential fires, seen across the whole of Sweden, proved to be weak to non-existent, with one exception: the average temperature variable was significant in 117 out of 290 municipalities and indicated that the expected number of residential fires decreases as temperature increases. Because of the weak relationships, the model is not recommended as a prognostic tool at the national level. However, individual models could supplement current prognostic tools at the local level and be used for preventive purposes. The study therefore concludes that weather has some impact on the expected number of residential fires and has the potential to be used as a tool when forecasting them. As a complement to the regression analysis, an organizational analysis of the emergency services in Sweden was carried out. The analysis sought the optimal structure given the conditions and requirements of the emergency services, which were defined on the basis of basic organizational concepts and methods. The result was a more structured operation and organization in which methods and processes are managed at a centralized level.
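A Poisson regression of the kind named in this abstract can be sketched as follows: a log-linear model for the expected count with a single covariate, fitted by Newton-Raphson (which coincides with Fisher scoring for the canonical log link). The data are synthetic and noise-free, the coefficient values are hypothetical, and the function name is an illustrative assumption; none of it reproduces the report's models or results.

```python
import math


def fit_poisson_loglinear(x, y, iterations=50, tol=1e-12):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson on the Poisson likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Synthetic illustration: temperatures in degrees Celsius and noise-free
# expected counts generated with hypothetical coefficients (2.0, -0.05).
temperature = list(range(-10, 21, 2))
fires = [math.exp(2.0 - 0.05 * t) for t in temperature]
b0_hat, b1_hat = fit_poisson_loglinear(temperature, fires)
print(round(b0_hat, 6), round(b1_hat, 6))
```

A negative slope in such a model corresponds to an expected count that decreases as temperature increases; with real count data, overdispersion relative to the Poisson variance is one common reason to consider a negative binomial model instead.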
123

Supervised Learning for Prediction of Tumour Mutational Burden / Användning av statistisk inlärning för estimering av mutationsbörda

Hargell, Joanna January 2021
Tumour Mutational Burden (TMB) is a promising biomarker for predicting response to immunotherapy. In this thesis, three supervised learning methods were used to predict TMB: GLM, decision trees and SVM. Predictions were based on data from targeted DNA sequencing, using variants found in the exonic, intronic, UTR and intergenic regions of human DNA. The project was exploratory in nature and performed in a pan-cancer setting. Both regression and classification were considered. The purpose was to investigate whether variants found in these regions of the DNA sequence are useful when predicting TMB. Poisson regression and negative binomial regression were used within the GLM framework. The results indicated deficiencies in the model assumptions and that the use of GLM for this application is questionable. A single regression tree did not yield satisfactory prediction accuracy, but performance improved with variance-reducing methods such as bagging and random forests. Boosted regression trees did not yield any significant improvement in prediction accuracy. In the classification setting, binary as well as multiple classes were considered. The distinction between classes was based on thresholds commonly used in clinical care to decide eligibility for immunotherapy. SVM and classification trees yielded high prediction accuracy in the binary case: misclassification rates of 0.0242 and 0, respectively, on the independent test set. In the multi-class setting, bagging and random forests were implemented but did not improve performance over the single classification tree. SVM produced a misclassification rate of 0.103, and the corresponding figure for the single classification tree was 0.109. It was concluded that SVM and decision trees are suitable methods for predicting TMB based on targeted gene panels. However, to obtain reliable predictions, there is a need to move from a pan-cancer setting to a diagnosis-based setting. Furthermore, parameters affecting TMB, such as pre-analytical factors, need to be included in the statistical analysis.
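The misclassification rates quoted in this abstract are the share of test cases assigned to the wrong class. A minimal sketch of that calculation for a binary high/low split follows; the threshold of 10, the TMB values and the function names are hypothetical illustrations, not data or settings from the thesis.

```python
def tmb_class(tmb, threshold=10.0):
    """Label a TMB value as 'high' or 'low' (the threshold is a hypothetical choice)."""
    return "high" if tmb >= threshold else "low"


def misclassification_rate(y_true, y_pred):
    """Fraction of cases whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values, for illustration only.
true_tmb = [2.1, 15.3, 9.8, 30.0, 4.4, 11.2, 0.7, 12.5]
pred_tmb = [3.0, 14.1, 10.4, 25.2, 5.1, 9.5, 1.2, 13.0]
rate = misclassification_rate(
    [tmb_class(v) for v in true_tmb],
    [tmb_class(v) for v in pred_tmb],
)
print(rate)  # 2 of the 8 cases fall on the wrong side of the threshold -> 0.25
```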
124

廣義線性模式下處理比較之最適設計 / Optimal Designs for Treatment Comparisons under Generalized Linear Models

何漢葳, Ho, Han Wei Unknown Date
This thesis considers the problem of finding D- and A-optimal designs under generalized linear models for the zero-way elimination of heterogeneity (the completely randomized design, CRD) and the one-way elimination of heterogeneity (the randomized block design, RBD). Since designs for GLMs depend on the values of the parameters to be estimated, the strategy is to use locally optimal designs. For the zero-way elimination model, a theorem-based algorithm is proposed that quickly and substantially narrows the search for D-optimal exact designs. A formula is derived for constructing the D-optimal approximate design when the treatment variances fall into two groups, of sizes m and v-m. Analytic solutions for the exact counterpart, however, are restricted to the cases m=1 and m=v-1, and an example is given to explain why a general solution cannot be obtained. The upper and lower bounds on the optimal number of replicates per treatment are shown to depend on m, that is, on how many treatments share the same variance, rather than on the unknown parameters. Even a treatment with a very large variance must receive observations for the determinant to be maximized. These results imply that, when v is large, designs with replications as equal as possible across treatments are efficient under D-optimality. In addition, a D-optimal approximate design is obtained when the three treatment variances are all different, and a closed-form expression is given for an A-optimal approximate design for comparing v arbitrary treatments. For the one-way elimination model, the focus is on v=2 and v=3 with each block size given. For v=2, the determinant for each block is unaffected by the other blocks, so D- and A-optimality can be achieved by optimizing each block separately as in the completely randomized case: within each block, units are assigned to the treatment with the smaller variance in proportion to the square root of the ratio (larger than 1) of the two variances. Notably, if the variance ratio and the block size are the same in every block, rounding the optimal totals of the approximate design to the nearest integers and splitting them equally among the blocks does not necessarily give an optimal design. For v=3, the structure of the determinant is much more complicated even for two blocks, and it can only be shown that, when the treatment variances are equal within a block (they may differ between blocks), the design with replicates as equal as possible in each block is a D-optimal block design. When the variances within a block are not all equal, numerical examples with two blocks support the conjecture that a design in which the number of replicates decreases as the treatment variance increases is better in terms of D-optimality, provided the ordering of the treatment variances is the same across blocks, which is reasonable for the additive model assumed here.
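A locally D-optimal exact design for a small one-way layout can be found by exhaustive search, which helps to see how unequal weights pull the allocation away from equal replication. The sketch below assumes known information weights w_i per treatment (hypothetical values) and uses the standard result that, for a weighted one-way model with a_i = n_i * w_i, the determinant of the information matrix for the v-1 baseline contrasts equals prod(a_i) / sum(a_i). It is a brute-force illustration, not the theorem-based algorithm of the thesis.

```python
from itertools import product
from math import prod


def d_criterion(n, w):
    """Determinant of the information matrix for the v-1 baseline contrasts."""
    a = [ni * wi for ni, wi in zip(n, w)]
    return prod(a) / sum(a)


def best_exact_allocation(total, w):
    """Search all allocations with at least one unit per treatment."""
    v = len(w)
    best, best_value = None, -1.0
    for head in product(range(1, total), repeat=v - 1):
        last = total - sum(head)
        if last < 1:
            continue
        n = head + (last,)
        value = d_criterion(n, w)
        if value > best_value:
            best, best_value = n, value
    return best


# Hypothetical weights: the second treatment carries four times the
# information per unit, so it needs fewer units.
two_treatments = best_exact_allocation(30, (1.0, 4.0))
equal_weights = best_exact_allocation(9, (2.0, 2.0, 2.0))
print(two_treatments, equal_weights)
```

With two treatments, the continuous optimum under this criterion allocates units in the ratio sqrt(w_2) : sqrt(w_1), which the exhaustive search recovers when that ratio gives integer counts.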
125

APC模型估計方法的模擬與實證研究 / Simulation and empirical comparisons of estimation methods for the APC model

歐長潤, Ou, Chang Jun Unknown Date
Since the beginning of the 20th century, people have experienced longer life expectancy and lower mortality rates at all ages, which can be attributed to steady improvements in factors such as medical technology, economic conditions and the environment. Longer life expectancy, and the population ageing that comes with it, has dramatically changed life planning and lifestyle after retirement. The change would be even greater if mortality rates continue to fall, and the study of mortality has therefore become popular in recent years. Many models have been proposed to describe changes in mortality rates. Among them, the Age-Period-Cohort (APC) model, which splits mortality into age, period and cohort effects, is widely used in epidemiology to examine whether disease and mortality are related to these three factors, although usually only as a rough description of the data. This study evaluates the feasibility of using the APC model to analyse mortality. The main difficulty with the APC model is non-identification: age, period and cohort are collinear, and many estimation methods have been proposed in response. In the first part of this study, we use simulation to compare seven common estimation methods for the APC model: the Intrinsic Estimator (IE), constrained generalized linear models with the constraint placed on age, period or cohort (cglim_age, cglim_period and cglim_cohort), the sequential methods ACP and APC, and the autoregressive (AR) model. The simulation compares the estimation errors of the age-specific mortality rates and of the APC parameters, a Monte Carlo test is used to check the APC model assumptions, and cross-validation is used in the empirical analysis to find the best method for mortality prediction. Mortality data from Taiwan (source: Ministry of the Interior) are used to assess the validity and model assumptions of these methods. Preliminary results show that the methods perform about equally well in estimating age-specific mortality rates but lead to different interpretations of the age, period and cohort effects, and that the model assumptions are not well satisfied. In the second part, we apply a similar approach to the Lee-Carter model and its extensions and compare them with the APC model. In cross-validation, the Lee-Carter models have smaller prediction errors than the APC models.
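The non-identification problem mentioned in this abstract follows from the exact relation cohort = period - age. A minimal sketch, using linear effects and hypothetical coefficient values, shows that adding a trend t to the age and cohort slopes and subtracting it from the period slope leaves every fitted value unchanged, so the data cannot distinguish between the two parameter sets.

```python
def linear_predictor(age, period, a, p, c):
    """Linear APC predictor with slopes a (age), p (period) and c (cohort)."""
    cohort = period - age  # the exact linear dependency behind non-identification
    return a * age + p * period + c * cohort


ages = range(0, 5)
periods = range(0, 4)
base = (3, -2, 1)  # hypothetical slopes for age, period, cohort
t = 7              # arbitrary trend moved between the three effects
shifted = (base[0] + t, base[1] - t, base[2] + t)

differences = {
    linear_predictor(i, j, *shifted) - linear_predictor(i, j, *base)
    for i in ages
    for j in periods
}
print(differences)  # {0}: both parameter sets give identical predictions
```

Estimation methods such as those compared in the abstract differ in the extra constraint they impose to pick one solution from this family.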
126

L’arbre de régression multivariable et les modèles linéaires généralisés revisités : applications à l’étude de la diversité bêta et à l’estimation de la biomasse d’arbres tropicaux

Ouellette, Marie-Hélène 04 1900
In ecology, for example in studies of ecosystem services, descriptive, explanatory and predictive modelling each have their place. Specific circumstances call for one or another of these types of modelling, and the method must be chosen properly to ensure that the final model fits the goal of the study. In this thesis, we first explore the explanatory power of the multivariate regression tree (MRT). This modelling technique is based on a recursive bipartitioning algorithm: the tree is fully grown by successive bipartitions and then pruned by resampling to reveal the tree that provides the best predictions. This asymmetric analysis of two tables produces homogeneous groups of objects in the response table, with the divisions between groups corresponding to cut points in the explanatory variables that mark the most abrupt changes in the response. We show that, to calculate the explanatory power of an MRT, an adjusted coefficient of determination must be defined in which the degrees of freedom of the model are estimated by an algorithm. This estimate of the population coefficient of determination is practically unbiased. Since MRT rests on discontinuity premises whereas canonical redundancy analysis (RDA) models continuous linear gradients, comparing their explanatory powers makes it possible to distinguish which of the two patterns the response follows along the explanatory variables. The extensive use of RDA in the study of beta diversity motivated the comparison between its explanatory power and that of MRT. Still from an explanatory perspective, we define a new procedure called the cascade of multivariate regression trees (CMRT), which builds a model while imposing a hierarchical order on the hypotheses under study. CMRT provides a framework for studying the hierarchical effect of two sets of explanatory variables, one main and one subordinate, and for calculating their explanatory powers. The final model is interpreted as in a nested MANOVA. The analysis may reveal additional information about the relationship between the response and the explanatory variables, for example interactions between the two explanatory data sets that were not evidenced by the usual MRT model. We then study the predictive power of generalized linear models (GLM) by modelling the biomass of individual trees of different tropical species as a function of allometric measurements. In particular, we examine the capacity of Gaussian and gamma error structures to provide the most precise predictions. We show that, for one particular species, the gamma error structure is superior in terms of predictive power. This study has a practical purpose: it is meant as an example for managers who need to estimate precisely the amount of carbon captured by tropical tree plantations. Our conclusions could become part of a program for reducing carbon emissions through land use changes.
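The difference between the Gaussian and gamma error structures compared in this abstract can be seen in their deviances: the Gaussian deviance measures absolute errors (constant variance), while the gamma deviance measures relative errors (variance proportional to the squared mean). The sketch below uses made-up biomass values and hypothetical function names; it illustrates the two standard deviance formulas only and is not the model fitted in the thesis.

```python
import math


def gaussian_deviance(y, mu):
    """Gaussian deviance: sum of squared absolute errors."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))


def gamma_deviance(y, mu):
    """Gamma deviance: 2 * sum((y - mu) / mu - log(y / mu)), a relative-error measure."""
    return 2.0 * sum((yi - mi) / mi - math.log(yi / mi) for yi, mi in zip(y, mu))


# Made-up observed and predicted biomass values spanning two orders of magnitude.
observed = [1.2, 10.5, 98.0]
predicted = [1.0, 10.0, 100.0]

# Rescaling both by 10 (e.g. a change of units) leaves the gamma deviance
# unchanged but multiplies the Gaussian deviance by 100.
observed_x10 = [10 * v for v in observed]
predicted_x10 = [10 * v for v in predicted]
print(gaussian_deviance(observed, predicted), gaussian_deviance(observed_x10, predicted_x10))
print(gamma_deviance(observed, predicted), gamma_deviance(observed_x10, predicted_x10))
```

When errors grow with the size of the tree, a criterion based on relative error weights small and large individuals more evenly, which is one reason a gamma structure may predict better for such data.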
127

Modelagem simultânea de média e dispersão e aplicações na pesquisa agronômica / Joint modeling of mean and dispersion and applications to agricultural research

Vieira, Afrânio Márcio Corrêa 10 February 2009
Many of the experimental designs in current use originated in agricultural experiments. Such experimental data are usually analysed with statistical models that assume a constant (homogeneous) residual variance as a basic assumption. However, this assumption is hard to sustain when environmental or external factors exert a strong influence on the measurements. In this work, we study joint modelling of the mean and the variance, with the variance structured in two ways: (i) through a linear predictor, which allows the incorporation of external variables and noise factors, and (ii) through random effects, which accommodate both overdispersion and longitudinal dependence in the case of binary measurements repeated over time. The class of double generalized linear models (DGLM) was applied to an observational study in which the mortality of broiler chickens was measured at the end of the pre-slaughter waiting period (lairage). In this problem there is strong evidence that some factors influence the variability and consequently reduce the precision of the inferential analysis. Another relevant agricultural problem, related to horticulture, is that of plant tissue culture experiments, in which the number of explants that regenerate is counted. This kind of experiment usually has a large number of parameters to estimate relative to the sample size. Existing models rely on large samples for statistical inference and, under these conditions, can produce unreliable estimates or even lead to erroneous conclusions. A double generalized linear model for proportion data was proposed from a Bayesian perspective, aimed at statistical analysis with small samples and at incorporating expert knowledge into the parameter estimation process. A clinical problem involving binary data measured repeatedly over time is presented, and two models using random effects are proposed to accommodate overdispersion and the longitudinal dependence of the measurements. Satisfactory results were obtained in the three problems studied. The DGLM identified factors associated with poultry mortality, which will help to minimize losses and to bring handling, transport and slaughter processes (from catching to lairage at the slaughterhouse) into line with animal welfare criteria and European Community requirements. The Bayesian DGLM identified the genotype associated with the overdispersion effect, increasing the precision of inference in variety selection. The two combined models proposed, logit-normal-Bernoulli-beta and probit-normal-Bernoulli-beta, satisfactorily accommodated both the overdispersion and the longitudinal dependence of the binary measurements. These results reinforce the importance of modelling the mean and the variance jointly as a way to increase precision in agricultural research, in both experimental and observational studies.
128

"Modelos lineares generalizados para análise de dados com medidas repetidas" / "Generalized linear models for repeated measures regression analysis"

Venezuela, Maria Kelly 04 July 2003
In this work, we present the generalized estimating equations developed by Liang and Zeger (1986) from the viewpoint of the theory of estimating functions presented by Godambe (1991). These estimating equations are obtained for generalized linear models (GLMs) with repeated measurements. We also present an iterative procedure for estimating the regression parameters, as well as hypothesis tests for these parameters. For residual analysis, we generalize to repeated measurements some diagnostic methods commonly used for GLMs. The half-normal probability plot with a simulated envelope is proposed for assessing the adequacy of the model fit and detecting outliers. To construct this plot, we simulate correlated responses using algorithms, described in this work, that generate a set of nonnegatively correlated variables with a specified correlation structure. Finally, the theory is applied to real data sets.
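Generalized estimating equations in the Liang and Zeger formulation rely on a working correlation matrix for the repeated measurements within a subject, and the simulation algorithm mentioned in this abstract needs a specified correlation structure. Two common choices are sketched below; the function names and the correlation values are illustrative assumptions, and the abstract does not state which structures the dissertation used.

```python
def exchangeable(size, rho):
    """Working correlation with the same correlation rho between any two occasions."""
    return [[1.0 if i == j else rho for j in range(size)] for i in range(size)]


def ar1(size, rho):
    """First-order autoregressive working correlation: rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(size)] for i in range(size)]


# Hypothetical correlation values for four repeated measurements.
r_exch = exchangeable(4, 0.3)
r_ar1 = ar1(4, 0.5)
for row in r_ar1:
    print(row)
```

Under the exchangeable structure every pair of occasions is equally correlated, while under AR(1) the correlation decays with the time lag between occasions.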
129

Refinamentos assintóticos em modelos lineares generalizados heteroscedásticos / Asymptotic refinements in heteroskedastic generalized linear models

Barros, Fabiana Uchôa 07 March 2017
Nesta tese, desenvolvemos refinamentos assintóticos em modelos lineares generalizados heteroscedásticos (Smyth, 1989). Inicialmente, obtemos a matriz de covariâncias de segunda ordem dos estimadores de máxima verossimilhança corrigidos pelo viés de primeira ordem. Com base na matriz obtida, sugerimos modificações na estatística de Wald. Posteriormente, derivamos os coeficientes do fator de correção tipo-Bartlett para a estatística do teste gradiente. Em seguida, obtemos o coeficiente de assimetria assintótico da distribuição dos estimadores de máxima verossimilhança dos parâmetros do modelo. Finalmente, exibimos o coeficiente de curtose assintótico da distribuição dos estimadores de máxima verossimilhança dos parâmetros do modelo. Analisamos os resultados obtidos através de estudos de simulação de Monte Carlo. / In this thesis, we develop asymptotic refinements in heteroskedastic generalized linear models (Smyth, 1989). First, we obtain the second-order covariance matrix of the maximum likelihood estimators corrected for first-order bias. Based on this matrix, we suggest modifications to the Wald statistic. Next, we derive the coefficients of the Bartlett-type correction factor for the gradient test statistic. We then obtain the asymptotic skewness coefficient of the distribution of the maximum likelihood estimators of the model parameters. Finally, we present the asymptotic kurtosis coefficient of the distribution of the maximum likelihood estimators of the model parameters. We evaluate the results through Monte Carlo simulation studies.
130

Aperfeiçoamento de métodos estatísticos em modelos de regressão da família exponencial / Further statistical methods in regression models of the exponential family

Cavalcanti, Alexsandro Bezerra 03 August 2009
Neste trabalho, desenvolvemos três tópicos relacionados a modelos de regressão da família exponencial. No primeiro tópico, obtivemos a matriz de covariância assintótica de ordem $n^{-2}$, onde $n$ é o tamanho da amostra, dos estimadores de máxima verossimilhança corrigidos pelo viés de ordem $n^{-1}$ em modelos lineares generalizados, considerando o parâmetro de precisão conhecido. No segundo tópico calculamos o coeficiente de assimetria assintótico de ordem $n^{-1/2}$ para a distribuição dos estimadores de máxima verossimilhança dos parâmetros que modelam a média e dos parâmetros de precisão e dispersão em modelos não-lineares da família exponencial, considerando o parâmetro de dispersão desconhecido, porém o mesmo para todas as observações. Finalmente, obtivemos fatores de correção tipo-Bartlett para o teste escore em modelos não-lineares da família exponencial, considerando covariáveis para modelar o parâmetro de dispersão. Avaliamos os resultados obtidos nos três tópicos desenvolvidos por meio de estudos de simulação de Monte Carlo. / In this work, we develop three topics related to exponential family regression models. First, we obtain the asymptotic covariance matrix of order $n^{-2}$, where $n$ is the sample size, of the maximum likelihood estimators corrected for the bias of order $n^{-1}$ in generalized linear models, assuming the precision parameter is known. Second, we derive an asymptotic formula of order $n^{-1/2}$ for the skewness of the distribution of the maximum likelihood estimators of the mean parameters and of the precision and dispersion parameters in exponential family nonlinear models, assuming the dispersion parameter is unknown but the same for all observations. Finally, we obtain Bartlett-type correction factors for the score test in exponential family nonlinear models, assuming that the precision parameter is modelled by covariates. We evaluate the results of all three topics through Monte Carlo simulation studies.
