Global ETD Search

81	Discrepancy-based algorithms for best-subset model selection Zhang, Tao 01 May 2013 The selection of a best-subset regression model from a candidate family is a common problem that arises in many analyses. In best-subset model selection, we consider all possible subsets of regressor variables; thus, numerous candidate models may need to be fit and compared. One of the main challenges of best-subset selection arises from the size of the candidate model family: specifically, the probability of selecting an inappropriate model generally increases as the size of the family increases. For this reason, it is usually difficult to select an optimal model when best-subset selection is attempted based on a moderate to large number of regressor variables. Model selection criteria are often constructed to estimate discrepancy measures used to assess the disparity between each fitted candidate model and the generating model. The Akaike information criterion (AIC) and the corrected AIC (AICc) are designed to estimate the expected Kullback-Leibler (K-L) discrepancy. For best-subset selection, both AIC and AICc are negatively biased, and the use of either criterion will lead to overfitted models. To correct for this bias, we introduce a criterion AICi, which has a penalty term evaluated from Monte Carlo simulation. A multistage model selection procedure AICaps, which utilizes AICi, is proposed for best-subset selection. In the framework of linear regression models, the Gauss discrepancy is another frequently applied measure of proximity between a fitted candidate model and the generating model. Mallows' conceptual predictive statistic (Cp) and the modified Cp (MCp) are designed to estimate the expected Gauss discrepancy. For best-subset selection, Cp and MCp exhibit negative estimation bias. To correct for this bias, we propose a criterion CPSi that again employs a penalty term evaluated from Monte Carlo simulation. 
We further devise a multistage procedure, CPSaps, which selectively utilizes CPSi. In this thesis, we consider best-subset selection in two different modeling frameworks: linear models and generalized linear models. Extensive simulation studies are compiled to compare the selection behavior of our methods and other traditional model selection criteria. We also apply our methods to a model selection problem in a study of bipolar disorder. Best-subset model selection Gauss discrepancy Generalized linear models Kullback-Leibler discrepancy Linear models Multistage procedure Biostatistics
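The abstract above refers to AIC, AICc, and the size of the best-subset candidate family. A minimal sketch of those standard quantities follows; it does not reproduce the thesis's AICi, AICaps, CPSi, or CPSaps procedures, whose simulated penalty terms are not given in the abstract. The function names and the regressor labels x1, x2, x3 are illustrative assumptions.

```python
from itertools import combinations


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: -2 log L + 2k, with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Corrected AIC: AIC plus the small-sample term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)


def candidate_subsets(regressors):
    """Yield every subset of the regressors, including the empty subset."""
    regressors = list(regressors)
    for size in range(len(regressors) + 1):
        yield from combinations(regressors, size)


# Hypothetical regressor names: p regressors give 2**p candidate models,
# which is why the family grows quickly in best-subset selection.
family = list(candidate_subsets(["x1", "x2", "x3"]))
```

With p regressors the family has 2^p members, so even 10 regressors already give 1024 candidate models to fit and compare.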
82	Estimativa do custo da colheita mecanizada de cana-de-açúcar utilizando modelos de regressão / Estimated cost of mechanized harvesting of sugarcane using regression models Maekawa, Eduardo Shigueiti 22 August 2016 The mechanized harvesting of sugarcane is one of the most significant and costly operations in the production process, so it is important to understand the relationships that drive its cost. Current methods for estimating harvesting cost start from the concept of fixed and variable costs. However, given the complexity of the harvesting process, it is necessary to evaluate methods that can relate the operating parameters to the final cost. In this context, statistical modeling by regression makes it possible to handle such relationships and predict trends. 
The objective of this study was to develop an empirical model to calculate the cost of mechanized sugarcane harvesting. A generalized linear model (GLM) and a generalized linear mixed model (GLMM), both with a gamma distribution, were developed using operational indicators and cost data from 20 mills in the sugar and ethanol sector. The GLMM achieved a satisfactory fit when compared with the GLM, the null model (mean), and the linear model (assuming normality). The indicators that explained the cost were: productivity (t mach⁻¹), consumption (l t⁻¹), hourmeter (h), and number of operators per harvester (nop). Generalized linear mixed models Generalized linear models Operational cost Sugarcane harvester
83	Métodos estatísticos aplicados ao teste de Salmonella/microssoma: modelos, seleção e suas implicações / Statistical methods applied for Salmonella/microsome test data: models, selection and their entailments Butturi-Gomes, Davi 03 December 2015 The Salmonella/microsome test is a widely used biological assay for evaluating the mutagenic potential of substances that can endanger human health and environmental quality. The response variable is the count of revertant colonies per plate, which usually reflects the confounded effects of toxicity and mutagenicity. Although some statistical models for these data have been established in the literature, they do not always fit well, nor do they explicitly consider interaction terms. Moreover, few software platforms integrate these different approaches and provide criteria for proper model selection. It is also often difficult to compare the effects of different substances across the several bacterial strains used in the assay, so measures with a direct biological interpretation are required. 
In this work, the properties of the predictors of the traditional models were investigated, as well as the behavior of the sampling distributions of the parameter estimators of these models under different levels of overdispersion. In addition, experiments were performed with strains TA98 and TA100 exposed to two insecticides widely used in Brazil, Fipronil and Thiamethoxam, each both before and after metabolization. Several models were then fitted to the data: the traditional models, empirical regression models based on the Skellam distribution, and compound mechanistic-empirical models with explicit interaction terms. To use a single fitting framework, a new class of models was presented, the vector generalized nonlinear models, and an R package, entitled "ames", was developed for fitting, diagnosing, and selecting models. Finally, measures of biological interest based on the selected models were proposed for risk assessment and for evaluating damage to genetic material, with confidence intervals obtained from parametric bootstrap percentiles. 
Among the traditional models, those of Bernstein, Breslow and Myers had estimator sampling distributions with the best normal approximations. These results provided a practical criterion for model selection, particularly in situations where measures such as AIC and goodness of fit, likelihood ratio tests, and residual analysis are uninformative or simply cannot be applied. From the selected models, it was inferred that the interaction with the metabolization factor is significant for the TA98 strain exposed to Fipronil, for both toxic and mutagenic effects; that the mechanism of action of Thiamethoxam on the TA98 strain, and in particular the dynamics between mutagenicity and toxicity, differs when the product is metabolized; and that, for the TA100 strain, there was no evidence of a metabolization effect for either insecticide. Based on the proposed measures, it was concluded that Thiamethoxam poses the greater risk of residual contamination, whereas Fipronil shows the higher mutagenicity indices. Ames test Generalized nonlinear models R functions Skellam distribution Vector generalized linear models
84	Melhoramento do resíduo de Wald em modelos lineares generalizados / Improvement of Wald residual in generalized linear models Urbano, Mariana Ragassi 18 December 2008 The theory of generalized linear models is widely used in statistics, not only for modeling normally distributed data but mainly for modeling data whose distribution belongs to the exponential family. Examples include the binomial, gamma, and inverse Gaussian distributions, among others. After a model is fitted, diagnostic techniques and a residual analysis are applied to check the adequacy of the fit. The properties of residuals in generalized linear models are not well known, and asymptotic results are the only recourse. This work aimed to study the asymptotic properties of the Wald residual and to derive corrections that bring the distribution of the modified residual closer to the standard normal. 
The corrections were applied to five datasets. In two of them the response variable was a count, modeled with the Poisson distribution. Two others came from completely randomized designs with a continuous response, modeled with the normal distribution. In the last dataset the interest was in modeling a proportion, and the binomial distribution was used. A Monte Carlo simulation study showed that the corrections significantly improve the distribution of the Wald residual, with the corrected version being closer to the standard normal distribution. Normal distribution Generalized linear models Monte Carlo method
85	Biodiversidade e modelagem estatística da comunidade de poliquetas de fundos inconsolidados do complexo recifal Sebastião Gomes, Banco dos Abrolhos (BA, Brasil) / Biodiversity and statistical modeling of polychaete community in soft bottom of Sebastião Gomes reef complex, Abrolhos Bank (BA, Brazil) Silva, Michele Quesada da 21 August 2013 Although coral reefs are biodiversity hotspots for corals and fishes, it is not known whether they are also hotspots for small marine invertebrates. The present study aimed to verify whether the Sebastião Gomes reef complex is a polychaete biodiversity hotspot, and to characterize the structural and functional community of these organisms, which inhabit the sediment around the reef. 
Generalized linear models (GLMs), with sediment features and/or the position of sampling stations around the reef (transects perpendicular to the south, west, north, and east faces) as predictor variables, were used to understand the patterns of alpha diversity, total polychaete abundance, abundance of the most representative species, and abundance of the different trophic habits. A total of 2399 individuals belonging to 116 species were collected, indicating that Sebastião Gomes may be a hotspot. All community descriptors were higher near the reef, where coarse, carbonate sediments predominate. The position around the reef, however, was important only for some descriptors, such as total abundance and the abundance of carnivores and deposit feeders (detritivores), all of which were higher in the north and east transects, which are exposed to the wind. Polychaete abundance was lower along the entire south transect, which is more susceptible to sediment resuspension caused by the cold fronts that reach the region. Abrolhos Bank generalized linear models hotspot Polychaeta Sebastião Gomes reef
86	Modelos lineares generalizados mistos para dados longitudinais. / Generalized linear mixed models in longitudinal data. Costa, Silvano Cesar da 13 March 2003 Experiments in which the response variables are proportions or counts are very common in several research areas, especially agriculture. Such experiments are analyzed with the well-established theory of generalized linear models (McCullagh & Nelder, 1989; Demetrio, 2001), in which the responses are independent. If the estimated variance is greater than the expected variance, a dispersion parameter is estimated and included in the parameter estimation process. When the response variable is observed over time, the observations may be correlated, and this must be taken into account when estimating the parameters. One way of handling this correlation is the methodology of generalized estimating equations (GEEs), discussed by Liang & Zeger (1986), although in this case the interest lies in the estimates of the fixed effects, and the working correlation matrix is included to obtain a better fit. Another alternative is to include a latent effect in the linear predictor to capture variability not considered in the model that might influence the results. 
In this work, a random effect and a dispersion parameter are combined and included jointly in the parameter estimation. This methodology is applied to a data set from an experiment with camu-camu, carried out to evaluate, through the proportion of successful seedling grafts, which grafting methods and rootstock types are best suited for use. Several models are fitted, from the split-plot model (assuming independence) to the model that considers the dispersion parameter and the random effect jointly. There is evidence that the model including both the random effect and the dispersion parameter produces better parameter estimates. 
A second longitudinal data set comes from an experiment with MON810 transgenic corn, in which the response variable is the number of caterpillars (Spodoptera frugiperda). In this case, because of the excess of zero responses, the zero-inflated Poisson (ZIP) regression model is used, in addition to the standard Poisson model, in which the observations are considered independent, and the zero-inflated Poisson regression model with a random effect. The results show that the random effect included in the linear predictor was not significant, so the adopted model is the zero-inflated Poisson regression model. The results were obtained using the NLMIXED, GENMOD, and GPLOT procedures of SAS (Statistical Analysis System), version 8.2. longitudinal data analysis binomial distribution EM algorithm generalized linear mixed models generalized linear models Poisson distribution SAS (computer program)
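The zero-inflated Poisson model mentioned above mixes a point mass at zero with an ordinary Poisson count. The sketch below shows only that probability function for fixed parameters; it is not the regression model fitted in the thesis (which used SAS procedures and covariates), and the names zip_pmf, zip_mean, lam, and pi are illustrative assumptions.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson.

    lam is the Poisson rate and pi is the probability of a structural zero.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either structurally (pi) or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam: float, pi: float) -> float:
    """Mean of the zero-inflated Poisson: only the Poisson part contributes."""
    return (1.0 - pi) * lam
```

Setting pi to 0 recovers the standard Poisson model, which is why the two are natural competitors when the data contain many zeros.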
87	Dynamiques spatio-temporelles d'espèces démersales clés du golfe du Lion : bénéfices potentiels d’aires marines protégées / Spatio-temporal dynamics of demersal exploited species in the Gulf of Lions : potential usefulness of Marine Protected Areas Morfin, Marie 18 October 2013 Demersal species account for 50% of the catches of French fisheries in the Gulf of Lions, and most of them have been fully exploited, or even overexploited, for several decades. This thesis evaluates the relevance of marine protected areas (MPAs) as a tool for the management and conservation of these populations. So far such areas have been implemented only along the coast, to protect species with very low mobility. The problem is more complex for species living offshore, because their habitat is broader and more diffuse. To this end, the spatial distributions of 12 key exploited demersal species were studied from 1994 to 2010 using scientific observations and ad hoc statistical tools. A geostatistical approach detected spatial autocorrelation structures for all species and produced annual distribution maps for each species. These distributions appeared very stable over the 17 years, apart from an expansion/contraction phenomenon linked to the level of total abundance in the region. In addition, a generalized linear model approach revealed strong associations of these species with a temporally stable habitat. 
These results are consistent with MacCall's basin theory, according to which the association of a species with a habitat is density-dependent, so that an increase in the density of individuals in an area leads to the colonization of sub-optimal habitats. Protecting the optimal habitat of a species could thus create a "source" habitat, provided the area is carefully chosen; indeed, the displacement of fishing effort to outside the MPA can make the measure ineffective or even harmful. Adult populations generally occupied more concentrated areas that were included in the range of the juveniles. These shared areas of essential habitat (breeding and nursery grounds) may be worth protecting in a single-species framework. 
However, the heterogeneity of the distributions from one species to another implies setting up very scattered zones, which makes management difficult in a multi-species framework. An area of reasonable size was nevertheless identified, covering 20% of the population of each species and representative of the diversity of habitats in the region. Exploited demersal species Spatial statistics Habitat Generalized Linear Models Gulf of Lions Marine Protected Areas
88	Modelos lineares generalizados e modelos de dispersão aplicados à modelagem de sinistros agrícolas / Generalized linear models and model dispersion applied to modelling agricultural claims Sousa, Keliny Martins de Melo 12 February 2010 The main objective of this work is to use generalized linear models and dispersion models in the context of agricultural insurance. Generalized linear models (GLMs) are an extension of multiple linear regression models introduced by Nelder and Wedderburn (1972) that includes models whose response variable belongs to the exponential family of distributions. A GLM consists of a random component, whose distribution belongs to the exponential family, and a systematic component, connected by a link function. Jørgensen (1997) extends the use of GLMs to a broader class of probability models, called dispersion models. 
Parameter estimation was based on the maximum likelihood method and, because the sample was relatively small, the non-parametric bootstrap method was also used. Both approaches were applied to two claims data sets from 15 municipalities in the state of Rio Grande do Sul. The results showed that accumulated rainfall influences the occurrence of claims. However, no significant variable was found when modelling the claim amount. Using the bootstrap method, accumulated rainfall and average temperature were found to influence the number of claims. Agricultural insurance Bootstrap Dispersion models Generalized linear models Overdispersion Agricultural losses - Modeling Tweedie family Likelihood
89	THE USE OF 3-D HIGHWAY DIFFERENTIAL GEOMETRY IN CRASH PREDICTION MODELING Amiridis, Kiriakos 01 January 2019 The objective of this research is to introduce and evaluate a new methodology for rural highway safety. Current practice relies on crash prediction models that use specific explanatory variables, and the repository of knowledge from past research is the Highway Safety Manual (HSM). Most of the prediction models in the HSM identify the effect of individual geometric elements on crash occurrence and combine them multiplicatively, with each effect multiplied by the others to determine their combined influence. Three-dimensional (3-D) representations of the roadway surface have also been explored in the past, with the aim of modeling the highway structure and optimizing the roadway alignment. Differential geometry has recently been applied to the 3-D roadway surface to understand how new metrics can identify and express roadway geometric elements, and the results indicate that this may be a new way of representing the combined effects of all geometric features in single variables. This research further explores that potential and examines whether 3-D differential geometry can represent the roadway surface and whether its associated metrics can capture the combined effect of roadway features on crashes. It is anticipated that a series of single metrics combining horizontal and vertical alignment features could eventually predict roadway crashes more robustly. It should also be noted that the main purpose of this research is not simply to suggest predictive crash models, but to show in a statistically rigorous manner that 3-D metrics of differential geometry, e.g. Gaussian curvature and mean curvature, can assist in analyzing highway design and safety. 
Therefore, the value of this research lies in the proof of concept of the link between 3-D geometry in highway design and safety. This thesis presents the steps and rationale of the procedure followed to complete the proposed research. Finally, the results of the suggested methodology are compared with those derived from the state-of-the-art Interactive Highway Safety Design Model (IHSDM), the software currently in use, which is based on the findings of the HSM. 3-D Highway Geometric Design Differential Geometry Gaussian Curvature Mean Curvature Generalized Linear Models Geometry and Topology Statistical Methodology Statistical Models Transportation Engineering
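The Gaussian and mean curvature named above have closed forms when a surface is written as a height function z = f(x, y). The sketch below assumes that representation for illustration only; the abstract does not say how the thesis parameterizes the roadway surface or computes its metrics, and the function name monge_curvatures is hypothetical.

```python
def monge_curvatures(fx, fy, fxx, fxy, fyy):
    """Return (K, H) for the surface z = f(x, y) at one point.

    Inputs are the first and second partial derivatives of f at that point.
    K is the Gaussian curvature, H the mean curvature (upward-pointing normal).
    """
    w = 1.0 + fx * fx + fy * fy
    gaussian = (fxx * fyy - fxy * fxy) / (w * w)
    mean = (
        (1.0 + fy * fy) * fxx
        - 2.0 * fx * fy * fxy
        + (1.0 + fx * fx) * fyy
    ) / (2.0 * w ** 1.5)
    return gaussian, mean


# Saddle z = x*y at the origin: opposite bending in the two principal directions.
saddle = monge_curvatures(0.0, 0.0, 0.0, 1.0, 0.0)
# Bowl z = (x**2 + y**2) / 2 at the origin: equal bending in every direction.
bowl = monge_curvatures(0.0, 0.0, 1.0, 0.0, 1.0)
```

Both quantities depend on the horizontal and vertical shape of the surface at once, which is the sense in which a single curvature value can combine several alignment features.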
90	Second-order least squares estimation in regression models with application to measurement error problems Abarin, Taraneh 21 January 2009 This thesis studies the Second-order Least Squares (SLS) estimation method in regression models with and without measurement error. Applications of the methodology in general quasi-likelihood and variance function models, censored models, and linear and generalized linear models are examined and strong consistency and asymptotic normality are established. To overcome the numerical difficulties of minimizing an objective function that involves multiple integrals, a simulation-based SLS estimator is used and its asymptotic properties are studied. Finite sample performances of the estimators in all of the studied models are investigated through simulation studies. / February 2009
