461

Análise de diagnóstico em modelos semiparamétricos normais / Diagnostic analysis in semiparametric normal models

Gleyce Rocha Noda, 18 April 2013
In this master's dissertation we present diagnostic methods for semiparametric models under normal errors, especially semiparametric models with one nonparametric explanatory variable, also known as partial linear models. Cubic splines are used for the nonparametric fit, and penalized likelihood functions are applied to obtain the maximum likelihood estimators together with their approximate standard errors. The properties of the hat matrix are also derived for this kind of model, with the aim of using it as a tool for diagnostic analysis. Normal probability plots with simulated envelopes were also adapted to assess model adequacy. Finally, two illustrative examples are presented in which the fits are compared with the usual normal linear models, both in the context of the simple normal additive model and in the context of the partial linear model.
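A minimal numerical sketch of the machinery this abstract describes: a penalized least-squares spline fit and the associated hat matrix, whose diagonal yields the leverages used in diagnostics. The truncated-power basis, the identity-style penalty and the fixed smoothing parameter are simplifying assumptions for illustration, not the dissertation's exact formulation.

```python
import numpy as np

def cubic_tp_basis(x, knots):
    """Truncated-power cubic spline basis: [1, x, x^2, x^3, (x - k)_+^3]."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)  # normal errors

B = cubic_tp_basis(x, knots=np.linspace(0.1, 0.9, 9))
lam = 1.0                                  # smoothing parameter (illustrative)
P = np.eye(B.shape[1])
P[:4, :4] = 0.0                            # leave the polynomial part unpenalized

# Penalized least squares: beta = (B'B + lam*P)^{-1} B'y
A = np.linalg.solve(B.T @ B + lam * P, B.T)
beta = A @ y
H = B @ A                                  # hat (smoother) matrix
leverage = np.diag(H)                      # case-level leverage diagnostics
df_fit = np.trace(H)                       # effective degrees of freedom
print(df_fit, leverage.max())
```

The trace of the hat matrix gives the effective degrees of freedom of the smoother, so the same object drives both model complexity and case-level leverage diagnostics.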
462

Modelos semiparamétricos com resposta binomial negativa / Semiparametric models with negative binomial response

Fabio Hideto Oki, 14 May 2015
The main aim of this work is to discuss estimation and diagnostics in semiparametric models with a negative binomial response; more specifically, negative binomial regression models in which one of the continuous explanatory variables is modeled nonparametrically. We begin with an illustrative example and a brief review of parametric negative binomial regression models. The semiparametric models are then introduced, and aspects of estimation, inference and model selection are discussed. A chapter is devoted to diagnostic procedures, including the development of leverage and influence measures from both the case-deletion and local influence viewpoints, as well as residual analysis. The illustrative example is then reanalyzed from the semiparametric viewpoint and some conclusions are presented.
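A hedged sketch of fitting such a model in Python, assuming statsmodels' GAM interface (GLMGam with a B-spline smoother and a negative binomial family); the smoothing penalty alpha, the NB dispersion and the data are all illustrative, not taken from the thesis:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam, BSplines

rng = np.random.default_rng(1)
n = 300
x_lin = rng.normal(size=(n, 1))                # covariate entering linearly
x_sm = rng.uniform(0, 1, size=(n, 1))          # covariate modeled nonparametrically
mu = np.exp(0.5 * x_lin[:, 0] + np.sin(2 * np.pi * x_sm[:, 0]))
y = rng.negative_binomial(2, 2.0 / (2.0 + mu))  # NB response with mean mu

bs = BSplines(x_sm, df=[10], degree=[3])        # cubic B-spline smoother
model = GLMGam(y, exog=sm.add_constant(x_lin), smoother=bs,
               alpha=[1.0],                     # smoothing penalty (illustrative)
               family=sm.families.NegativeBinomial(alpha=1.0))
res = model.fit()
print(res.summary())
```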
463

Avaliação da espirometria de gestantes expostas à poluição atmosférica da Região Metropolitana de São Paulo / Spirometric evaluation of pregnant women exposed to air pollution in the metropolitan region of São Paulo

Luciana Duzolina Manfré Pastro, 11 March 2015
Introduction: Air pollution can lead to alterations in the respiratory system, particularly among certain groups, such as pregnant women, who are more vulnerable to the effects of air pollutants. Pregnancy is a period involving functional and anatomical changes in a woman's body, including changes in pulmonary function, which can be assessed by spirometry, a simple, inexpensive and effective method. Objectives: The aims of this study were to use spirometry to evaluate pulmonary function in pregnant women in the first trimester (T1) and third trimester (T3) of pregnancy, and to analyze the influence of air pollution exposure on the spirometric parameters. Methodology: The study was carried out at the Obstetrics Clinic of Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (HCFMUSP) between May 2011 and August 2013. The following inclusion criteria were applied: singleton pregnancy, gestational age below 13.86 weeks on the day of the first spirometry, no preexisting maternal diseases, adequate preparation for the spirometry test, and individual passive samplers (IPAs) suitable for analysis. The exclusion criteria were change of address, abortion, inadequate spirometry testing and withdrawal from the project. Exposure to pollutants prior to the spirometry tests was assessed in T1 and T3. The passive samplers, containing two cellulose filters soaked with an absorbent solution to capture NO2 levels and two other filters soaked with an indigo blue solution to measure O3 levels, were given to the pregnant women roughly 12 days before each spirometry test. Data from the São Paulo State Environmental Company (CETESB) annual report for the same period as the passive samplers were also used. For the spirometry, a Koko spirometer was used, and the two best curves were taken to assess lung function. Statistical analysis: Mann-Whitney tests were used for independent groups and Wilcoxon tests for dependent ones. Because of the small variation in pollution exposure, exposure in the first quartile (Q1) and fourth quartile (Q4) was compared for each pollutant in T1 and T3 through nonparametric analysis for repeated measures. Results: There was a statistically significant reduction from T1 to T3 in both absolute and predicted values of forced vital capacity (FVC) (absolute, T1: 3.690 L, T3: 3.475 L; predicted, T1: 101%, T3: 97.5%) and of forced expiratory volume in the first second (FEV1) (absolute, T1: 3.080 L, T3: 2.950 L; predicted, T1: 99%, T3: 96%), p < 0.0001. Exposure to pollution was similar in both trimesters, except for NO2 exposure in the passive sampler, which was lower in T3 (p = 0.001). Regardless of the trimester (T1 or T3), the group of women in Q4 of NO2 (T1: 97.5%; T3: 98.5%) had statistically higher predicted forced expiratory flow between 25% and 75% of FVC (FEF25-75%) than the group in Q1 (T1: 80%; T3: 92%). The Q1 group for NO2 had a significant increase in this parameter from T1 to T3 (p = 0.042), and also showed a statistically significant rise in the absolute FEV1/FVC ratio from T1 (0.810) to T3 (0.840) (p = 0.026). In T3, absolute and predicted FVC values were statistically higher for the Q4 group of NO2 (3.535 L; 100.5%) than for the Q1 group (3.345 L; 92%). The Q4 group of O3 displayed statistically higher predicted FEV1 values in T1 (102.5%) than in T3 (95.5%) (p < 0.001). Regardless of trimester, the Q4 group of PM10 had absolute FVC values (T1: 3.520 L; T3: 3.265 L) and FEV1 values (T1: 2.915 L; T3: 2.840 L) that were statistically lower than those of the Q1 group (FVC, T1: 3.780 L, T3: 3.580 L; FEV1, T1: 3.180 L, T3: 3.065 L), p = 0.040 and p = 0.035, respectively. Absolute and predicted peak expiratory flow (PEF) values in T1 for women in Q4 of PM10 (5.995 L; 80%) were statistically lower than for women in Q1 (6.675 L; 85%) (p = 0.006 and p = 0.041, respectively). Women in Q4 of PM10 (0.835) displayed statistically lower absolute FEV1/FVC values than women in Q1 (0.850) (p = 0.029). Conclusion: Exposure to NO2 and O3 was associated with increases in some spirometric parameters, suggesting a possible pulmonary defense or compensatory mechanism in pregnant women exposed to these pollutants. PM10 was associated with reductions in some spirometric parameters during pregnancy, indicating the harmful effects of this pollutant on the lung function of pregnant women.
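As an illustration of the statistical design described above (Wilcoxon signed-rank for the paired T1 vs. T3 comparison, Mann-Whitney U for independent exposure groups), a sketch with synthetic numbers standing in for the study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100
fvc_t1 = rng.normal(3.7, 0.4, n)              # synthetic FVC at T1 (L)
fvc_t3 = fvc_t1 - rng.normal(0.2, 0.1, n)     # synthetic paired FVC at T3 (L)

# Dependent (paired) samples: Wilcoxon signed-rank test, T1 vs. T3
print(stats.wilcoxon(fvc_t1, fvc_t3))

# Independent groups (e.g. Q1 vs. Q4 exposure quartiles): Mann-Whitney U test
q1, q4 = fvc_t3[:50], fvc_t3[50:] - 0.15
print(stats.mannwhitneyu(q1, q4, alternative="two-sided"))
```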
464

Abordagem não-paramétrica para cálculo do tamanho da amostra com base em questionários ou escalas de avaliação na área de saúde / Non-parametric approach for calculation of sample size based on questionnaires or scales of assessment in the health care

Euro de Barros Couto Junior, 01 October 2009
This text suggests how to calculate a sample size based on the use of a data collection instrument consisting of categorical items. The arguments for this suggestion are based on the theories of Combinatorics and Paraconsistency. The purpose is to suggest a practical and simple calculation procedure to obtain an acceptable sample size for collecting information, organizing it and analyzing data from an application of an instrument for collecting medical data based exclusively on discrete (categorical) items, i.e., each item of the instrument is considered as a nonparametric variable with a finite number of categories. In health care it is very common to use survey instruments built on such items: clinical protocols, hospital registers, questionnaires, scales and other inquiry tools consist of an organized sequence of categorical items. A formula for calculating the sample size was proposed for populations of unknown size, and an adjusted formula was proposed for populations of known size. Practical examples showed that both formulas are usable, demonstrating their practicality in cases where little or no information is available about the population from which the sample will be collected.
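The thesis's combinatorial/paraconsistent formula is not reproduced in the abstract. Purely as a point of reference, the standard sample-size calculation for estimating a category proportion, with the usual finite-population correction when the population size N is known, can be sketched as follows (worst-case proportion p = 0.5 assumed):

```python
import math

def sample_size(e=0.05, conf_z=1.96, p=0.5, N=None):
    """Classical sample size for estimating a category proportion.

    e: margin of error; conf_z: normal quantile (1.96 ~ 95% confidence);
    p: assumed proportion; N: population size (None = unknown/infinite).
    """
    n0 = conf_z**2 * p * (1 - p) / e**2          # unknown population size
    if N is None:
        return math.ceil(n0)
    return math.ceil(n0 / (1 + (n0 - 1) / N))    # finite population correction

print(sample_size())          # unknown population: 385
print(sample_size(N=1000))    # known population of 1000: 278
```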
465

Obtenção dos níveis de significância para os testes de Kruskal-Wallis, Friedman e comparações múltiplas não-paramétricas. / Obtaining significance levels for Kruskal-Wallis, Friedman and nonparametric multiple comparisons tests.

Pontes, Antonio Carlos Fonseca, 29 June 2000
One of the main difficulties researchers face in using nonparametric methods is obtaining reliable results. The Kruskal-Wallis and Friedman tests are the most widely used for the completely randomized one-way layout and for randomized blocks, respectively. The tables available for these tests are not very comprehensive, so researchers must resort to approximations. These approximations differ depending on the author consulted and can lead to contradictory results. Furthermore, these tables do not take tied observations into account, even in the case of small samples. For multiple comparisons this is even more evident, especially when ties occur or, in completely randomized designs, when the number of replications differs between treatments. Moreover, the most widely used software packages (e.g., SAS, STATISTICA, S-Plus, MINITAB) generally resort to approximations to provide significance levels and do not present results for multiple comparisons. Thus, the aim of this work is to present a program, in the C language, that runs the Kruskal-Wallis and Friedman tests and multiple comparisons among all treatments (two-tailed) and between treatments and a control (one- and two-tailed), considering either all systematic rank configurations or 1,000,000 random configurations, depending on the total number of possible permutations. Two significance levels are presented: the DW (or MaxDif) level, based on comparison with the maximum difference within each configuration, and the "Geral" (overall) level, based on comparison with all differences in each configuration. The "Geral" significance levels are very similar to those provided by the normal approximation. The results obtained with the program also show that tests using random permutations can be good substitutes when the number of systematic permutations is very large, since the probability levels are very close.
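A minimal Monte Carlo analogue of the program's random-configuration mode, sketched with scipy's tie-corrected Kruskal-Wallis statistic; the group sizes, data and number of permutations are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
groups = [rng.normal(0.0, 1, 6), rng.normal(0.5, 1, 6), rng.normal(1.0, 1, 6)]
sizes = [len(g) for g in groups]
pooled = np.concatenate(groups)

h_obs = stats.kruskal(*groups).statistic       # tie-corrected H statistic

# Random-configuration significance level: permute the pooled data,
# recompute H, and count configurations at least as extreme.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    parts = np.split(rng.permutation(pooled), np.cumsum(sizes)[:-1])
    if stats.kruskal(*parts).statistic >= h_obs:
        count += 1
print("permutation significance level:", (count + 1) / (n_perm + 1))
```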
466

非固定權重因子之試題難易度模糊統計評估 / Fuzzy statistical evaluation with non-fixed weighted factors in the item difficulty parameter

許家源, Unknown Date
Traditionally, item difficulty analysis treats the hit rate or the scoring rate as the index of question difficulty. However, these rates cannot represent how test takers actually perceive the difficulty of a question. Previous research identifies three main factors that affect the difficulty of mathematics items: the mathematical content being assessed, the thinking strategies used in solving the problem, and the number of steps required to solve it. This research analyzes these three major factors from the perspective of fuzzy statistics, proposing a two-dimensional fuzzy number with non-fixed weighted factors. The absolute-distance concept for fuzzy numbers is applied to exclude extreme values within each group, and difficulty indices for different groups are then analyzed and tested.
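The thesis's exact two-dimensional fuzzy construction is not given in the abstract. As an assumed stand-in, one common definition of the absolute distance between triangular fuzzy numbers integrates the endpoint differences of their alpha-cuts; the sketch below uses that definition together with a crude two-standard-deviation screen for extreme responses (both choices are assumptions for illustration):

```python
import numpy as np

def alpha_cut(tfn, alphas):
    """Left/right endpoints of the alpha-cuts of a triangular fuzzy number (a, b, c)."""
    a, b, c = tfn
    return a + alphas * (b - a), c - alphas * (c - b)

def fuzzy_distance(A, B, m=101):
    """Integrated absolute distance between alpha-cut endpoints (one common definition)."""
    alphas = np.linspace(0.0, 1.0, m)
    la, ra = alpha_cut(A, alphas)
    lb, rb = alpha_cut(B, alphas)
    return np.mean(np.abs(la - lb) + np.abs(ra - rb))   # integral over [0, 1]

# Each respondent's perceived difficulty as a triangular fuzzy number (a, b, c)
responses = [(0.2, 0.4, 0.6), (0.3, 0.5, 0.7), (0.25, 0.45, 0.6), (0.7, 0.9, 1.0)]
center = tuple(np.mean([r[i] for r in responses]) for i in range(3))

dists = np.array([fuzzy_distance(r, center) for r in responses])
keep = dists <= dists.mean() + 2 * dists.std()   # crude extreme-value screen (assumption)
print([r for r, k in zip(responses, keep) if k])
```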
467

Adaptive methods for modelling, estimating and forecasting locally stationary processes

Van Bellegem, Sébastien, 16 December 2003
In time series analysis, most models are based on the assumption of covariance stationarity. However, many time series in the applied sciences show a time-varying second-order structure. That is, variance and covariance, or equivalently the spectral structure, are likely to change over time. Examples may be found in a growing number of fields, such as biomedical time series analysis, geophysics, telecommunications, or financial data analysis, to name but a few. In this thesis, we are concerned with the modelling of such nonstationary time series, and with the subsequent questions of how to estimate their second-order structure and how to forecast these processes. We focus on univariate, discrete-time processes with zero mean, arising, for example, when the global trend has been removed from the data. The first chapter presents a simple model for nonstationarity, where only the variance is time-varying. This model follows the approach of "local stationarity" introduced by [1]. We show that our model satisfactorily explains the nonstationary behaviour of several economic data sets, among which are U.S. stock returns and exchange rates. This chapter is based on [5]. In the second chapter, we study more complex models, where not only the variance is evolutionary. A typical example of these models is given by time-varying ARMA(p,q) processes, that is, ARMA(p,q) processes with time-varying coefficients. Our aim is to fit such semiparametric models to nonstationary data. Our data-driven estimator is constructed by minimising a penalised contrast function, where the contrast function is an approximation to the Gaussian likelihood of the model. The theoretical performance of the estimator is analysed via non-asymptotic risk bounds for the quadratic risk. In our results, we do not assume that the observed data follow the semiparametric structure; that is, our results hold in the misspecified case. The third chapter introduces a fully nonparametric model for local nonstationarity. This model is a wavelet-based model of local stationarity which enlarges the class of models defined by Nason et al. [3]. A notion of time-varying "wavelet spectrum" is uniquely defined as a wavelet-type transform of the autocovariance function with respect to so-called "autocorrelation wavelets". This leads to a natural representation of the autocovariance which is localised on scales. One particularly interesting subcase arises when this representation is sparse, meaning that the nonstationary autocovariance may be decomposed in the autocorrelation wavelet basis using few coefficients. We present a new test of sparsity for the wavelet spectrum in Chapter 4. It is based on a non-asymptotic result on the deviations of a functional of a periodogram. In this chapter, we also present another application of this result, namely the pointwise adaptive estimation of the wavelet spectrum. Chapters 3 and 4 are based on [6]. Computational aspects of the test of sparsity and of the pointwise adaptive estimator are considered in Chapter 5. We give a description of a full algorithm, and an application in biostatistics. In this chapter, we also derive a new test of covariance stationarity, applied to another case study in biostatistics. This chapter is based on [7]. Finally, Chapter 6 addresses the problem of how to forecast the general nonstationary process introduced in Chapter 3. We present a new predictor and derive the prediction equations as a generalisation of the Yule-Walker equations. We propose an automatic computational procedure for choosing the parameters of the forecasting algorithm, and we apply the prediction algorithm to a meteorological data set. This chapter is based on [2,4].
References:
[1] Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist., 25, 1-37.
[2] Fryzlewicz, P., Van Bellegem, S. and von Sachs, R. (2003). Forecasting non-stationary time series by wavelet process modelling. Annals of the Institute of Statistical Mathematics, 55, 737-764.
[3] Nason, G.P., von Sachs, R. and Kroisandt, G. (2000). Wavelet processes and adaptive estimation of evolutionary wavelet spectra. Journal of the Royal Statistical Society, Series B, 62, 271-292.
[4] Van Bellegem, S., Fryzlewicz, P. and von Sachs, R. (2003). A wavelet-based model for forecasting non-stationary processes. In J-P. Gazeau, R. Kerner, J-P. Antoine, S. Metens and J-Y. Thibon (Eds.), GROUP 24: Physical and Mathematical Aspects of Symmetries. Bristol: IOP Publishing (in press).
[5] Van Bellegem, S. and von Sachs, R. (2003). Forecasting economic time series with unconditional time-varying variance. International Journal of Forecasting (in press).
[6] Van Bellegem, S. and von Sachs, R. (2003). Locally adaptive estimation of sparse, evolutionary wavelet spectra (submitted).
[7] Van Bellegem, S. and von Sachs, R. (2003). On adaptive estimation for locally stationary wavelet processes and its applications (submitted).
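To make the Chapter 1 setting concrete (only the variance is time-varying), a natural estimator smooths the squared observations over rescaled time. A minimal kernel-smoothing sketch; the Gaussian kernel and bandwidth are illustrative choices, not those of the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
t = np.arange(T) / T                           # rescaled time in [0, 1]
sigma = 0.5 + 0.4 * np.sin(2 * np.pi * t)      # true time-varying std deviation
x = sigma * rng.normal(size=T)                 # zero-mean locally stationary series

def local_variance(x, t, bandwidth=0.05):
    """Nadaraya-Watson smoothing of x_t^2 over rescaled time (Gaussian kernel)."""
    u = t[:, None] - t[None, :]
    w = np.exp(-0.5 * (u / bandwidth) ** 2)
    return (w @ x**2) / w.sum(axis=1)

sigma2_hat = local_variance(x, t)
print(np.mean((np.sqrt(sigma2_hat) - sigma) ** 2))  # rough accuracy check
```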
468

Modelling dependence in actuarial science, with emphasis on credibility theory and copulas

Purcaru, Oana, 19 August 2005
One basic problem in the statistical sciences is to understand the relationships among multivariate outcomes. Although it remains an important and widely applicable tool, regression analysis is limited by a basic setup that requires identifying one dimension of the outcomes as the primary measure of interest (the "dependent" variable) and the other dimensions as supporting it (the "explanatory" variables). There are situations where this relationship is not of primary interest. For example, in actuarial science one might be interested in the dependence between the annual claim numbers of a policyholder and its impact on the premium, or in the dependence between claim amounts and the expenses related to them. In such cases the normality hypothesis fails, so Pearson's correlation and other concepts based on linearity are no longer the best tools to use. To quantify the dependence between non-normal outcomes one therefore needs different statistical tools, such as dependence concepts and copulas. This thesis is devoted to modelling dependence with applications in actuarial science and is divided in two parts: the first concerns dependence in frequency credibility models and the second dependence between continuous outcomes. In each part of the thesis we resort to different tools: stochastic orderings (which arise from the dependence concepts) and copulas, respectively. During the last decade of the 20th century, the world of insurance was confronted with important developments in a posteriori tarification, especially in the field of credibility. This was due to the easing of insurance markets in the European Union, which gave rise to advanced segmentation. The first important contribution is due to Dionne & Vanasse (1989), who proposed a credibility model that integrates a priori and a posteriori information on an individual basis. These authors introduced a regression component into the Poisson counting model in order to use all available information in the estimation of accident frequency. The unexplained heterogeneity was then modeled by introducing a latent variable representing the influence of hidden policy characteristics. The vast majority of papers in the actuarial literature consider time-independent (or static) heterogeneous models. Noticeable exceptions include the pioneering papers by Gerber & Jones (1975), Sundt (1988) and Pinquet, Guillén & Bolancé (2001, 2003). Allowing for an unknown underlying random parameter that develops over time is justified, since the unobservable factors influencing driving ability are not constant. One might consider either shocks (induced by events like divorce or nervous breakdown, for instance) or continuous modifications (e.g. due to a learning effect). In the first part we study recently introduced models in frequency credibility theory, which can be seen as time series models for count data adapted to actuarial problems. More precisely, we examine the kind of dependence induced among annual claim numbers by the introduction of random effects accounting for unexplained heterogeneity, both when these random effects are static and when they are time-dependent. We also make precise the effect of reported claims on the a posteriori distribution of the random effect, by establishing a stochastic monotonicity property of the a posteriori distribution with respect to the claims history. We end this part by considering different models for the random effects and computing the a posteriori corrections of the premiums on the basis of a real data set from a Spanish insurance company. Whereas dependence concepts are very useful for describing the relationship between multivariate outcomes, in practice (think, for instance, of the computation of reinsurance premiums) one needs a statistical tool that is easy to implement and incorporates the structure of the data. Such a tool is the copula, which allows the construction of multivariate distributions for given marginals. Because copulas characterize the dependence structure of random vectors once the effect of the marginals has been factored out, identifying and fitting a copula to data is not an easy task. In practice, it is often preferable to restrict the search for an appropriate copula to some reasonable family, like the Archimedean one. It is then extremely useful to have simple graphical procedures to select the best-fitting model among competing alternatives for the data at hand. In the second part of the thesis we propose a new nonparametric estimator for the generator, which takes into account the particularities of the data, namely censoring and truncation. This nonparametric estimation then serves as a benchmark to select an appropriate parametric Archimedean copula. The selection procedure is illustrated on a real data set.
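As background for the copula part, a sketch of the textbook moment-type fit of an Archimedean copula, inverting Kendall's tau for the Clayton family (for which tau = theta/(theta+2)); this standard alternative is shown for orientation only, not as the thesis's nonparametric generator estimator:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, theta_true = 500, 2.0

# Simulate from a Clayton copula via the Marshall-Olkin (gamma frailty) method
g = rng.gamma(1.0 / theta_true, 1.0, n)               # frailty variable
e = rng.exponential(size=(n, 2))
u = (1.0 + e / g[:, None]) ** (-1.0 / theta_true)     # uniform margins

# Moment fit: for Clayton, tau = theta / (theta + 2)  =>  theta = 2*tau/(1-tau)
tau, _ = stats.kendalltau(u[:, 0], u[:, 1])
theta_hat = 2 * tau / (1 - tau)
print(f"true theta = {theta_true}, estimated theta = {theta_hat:.2f}")
```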
469

Mean preservation in censored regression using preliminary nonparametric smoothing

Heuchenne, Cédric, 18 August 2005
In this thesis, we consider the problem of estimating the regression function in location-scale regression models. This model assumes that the random vector (X,Y) satisfies Y = m(X) + s(X)e, where m(.) is an unknown location function (e.g. conditional mean, median, truncated mean, ...), s(.) is an unknown scale function, and e is independent of X. The response Y is subject to random right censoring, and the covariate X is completely observed. In the first part of the thesis, we assume that m(x) = E(Y|X=x) follows a polynomial model. A new estimation procedure for the unknown regression parameters is proposed, which extends the classical least squares procedure to censored data. The proposed method is inspired by the method of Buckley and James (1979) but, unlike the latter, is non-iterative thanks to preliminary nonparametric estimation. The asymptotic normality of the estimators is established. Simulations are carried out for both methods, and they show that the proposed estimators usually have smaller variance and smaller mean squared error than the Buckley-James estimators. In the second part, we suppose that m(.) = E(Y|.) belongs to some parametric class of regression functions. A new estimation procedure for the true, unknown vector of parameters is proposed, which extends the classical least squares procedure for nonlinear regression to the case where the response is subject to censoring. The proposed technique uses new 'synthetic' data points constructed from a nonparametric relation between Y and X. The consistency and asymptotic normality of the proposed estimator are established, and the estimator is compared via simulations with an estimator proposed by Stute in 1999. In the third part, we study the nonparametric estimation of the regression function m(.). It is well known that the completely nonparametric estimator of the conditional distribution F(.|x) of Y given X=x suffers from inconsistency problems in the right tail (Beran, 1981), and hence the location function m(x) cannot be estimated consistently in a completely nonparametric way whenever m(x) involves the right tail of F(.|x) (as is the case, e.g., for the conditional mean). We propose two alternative estimators of m(x) that do not share this inconsistency problem. The idea is to make use of the assumed location-scale model in order to improve the estimation of F(.|x), especially in the right tail. We obtain the asymptotic properties of the two proposed estimators of m(x). Simulations show that the proposed estimators outperform the completely nonparametric estimator in many cases.
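The thesis's synthetic-data construction is not spelled out in the abstract. For orientation, a sketch of the classical synthetic-data transform of Koul, Susarla and Van Ryzin, which rescales uncensored responses by a Kaplan-Meier estimate of the censoring survival function and then applies ordinary least squares; the simulation settings are arbitrary:

```python
import numpy as np

def km_survival(times, events):
    """Kaplan-Meier estimator; returns the left-continuous step function S(t-)."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = np.arange(len(t), 0, -1)
    s = np.cumprod(1.0 - d / at_risk)
    def S(x):
        idx = np.searchsorted(t, x, side="left") - 1   # last jump strictly before x
        return np.where(idx < 0, 1.0, s[np.clip(idx, 0, len(s) - 1)])
    return S

rng = np.random.default_rng(6)
n = 400
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)   # latent response
c = rng.exponential(4.0, n)                          # censoring times
obs = np.minimum(y, c)                               # observed response
delta = (y <= c).astype(float)                       # 1 = uncensored

# Synthetic responses: Y* = delta * Z / G_hat(Z-), with G_hat the KM estimate
# of the censoring survival function (reversed event indicator).
G = km_survival(obs, 1.0 - delta)
y_star = delta * obs / np.maximum(G(obs), 1e-6)

# Ordinary least squares on the synthetic responses
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y_star, rcond=None)[0]
print(beta)   # roughly recovers (1.0, 2.0)
```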
470

Testing for spatial correlation and semiparametric spatial modeling of binary outcomes with application to aberrant crypt foci in colon carcinogenesis experiments

Apanasovich, Tatiyana Vladimirovna, 01 November 2005
In an experiment to understand colon carcinogenesis, all animals were exposed to a carcinogen while half the animals were also exposed to radiation. Spatially, we measured the existence of aberrant crypt foci (ACF), namely morphologically changed colonic crypts that are known to be precursors of colon cancer development. The biological question of interest is whether the locations of these ACF are spatially correlated: if so, this indicates that damage to the colon due to carcinogens and radiation is localized. Statistically, the data take the form of binary outcomes (corresponding to the existence of an ACF) on a regular grid. We develop score-type methods based upon the Matérn and conditional autoregressive (CAR) correlation models to test for spatial correlation in such data, while allowing for nonstationarity. Because of a technical peculiarity of the score-type test, we also develop robust versions of the method. The methods are compared to a generalization of Moran's test for continuous outcomes, and are shown via simulation to have the potential for increased power. When applied to our data, the methods indicate the existence of spatial correlation, and hence indicate localization of damage. Assuming that there are correlations in the locations of the ACF, the questions are how great these correlations are, and whether the correlation structures differ when an animal is exposed to radiation. To understand the extent of the correlation, we cast the problem as a spatial binary regression, where binary responses arise from an underlying Gaussian latent process. We model the marginal probabilities of ACF semiparametrically, using fixed-knot penalized regression splines and single-index models. We fit the models using pairwise pseudolikelihood methods. Assuming that the underlying latent process is strongly mixing, known to be the case for many Gaussian processes, we prove asymptotic normality of the methods. The penalized regression splines have penalty parameters that must converge to zero asymptotically: we derive rates for these parameters that do and do not lead to an asymptotic bias, and we derive the optimal rate of convergence for them. Finally, we apply the methods to the data from our experiment.
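As a concrete point of reference for the tests discussed above, a sketch of Moran's I (the classical statistic the score-type tests are compared against) for binary outcomes on a regular grid with rook-neighbour weights; the permutation calibration is a generic choice, not the paper's asymptotic one:

```python
import numpy as np

rng = np.random.default_rng(7)
grid = (rng.uniform(size=(20, 30)) < 0.3).astype(float)  # binary ACF indicators

def morans_i(z):
    """Moran's I with rook (4-neighbour) contiguity on a regular grid."""
    zc = z - z.mean()
    num = (zc[:-1, :] * zc[1:, :]).sum() + (zc[:, :-1] * zc[:, 1:]).sum()
    w_sum = zc[:-1, :].size + zc[:, :-1].size    # one weight per adjacent pair
    return (z.size / w_sum) * num / (zc**2).sum()

i_obs = morans_i(grid)

# Permutation reference distribution under spatial randomness
perms = np.array([morans_i(rng.permutation(grid.ravel()).reshape(grid.shape))
                  for _ in range(2000)])
p_val = (np.sum(perms >= i_obs) + 1) / (len(perms) + 1)
print(f"Moran's I = {i_obs:.3f}, permutation p = {p_val:.3f}")
```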
