51 |
Modelling dependence in actuarial science, with emphasis on credibility theory and copulas. Purcaru, Oana, 19 August 2005
One basic problem in statistical science is to understand the relationships among multivariate outcomes. Although regression analysis remains an important and widely applicable tool, it is limited by a setup that requires identifying one dimension of the outcomes as the primary measure of interest (the "dependent" variable) and the other dimensions as supporting it (the "explanatory" variables). There are situations where this asymmetric relationship is not of primary interest. In actuarial science, for example, one might be interested in the dependence between a policyholder's annual claim numbers and its impact on the premium, or in the dependence between claim amounts and the expenses related to them. In such cases the normality hypothesis fails, so Pearson's correlation and other concepts based on linearity are no longer the most appropriate. To quantify dependence between non-normal outcomes one therefore needs different statistical tools, such as dependence concepts and copulas.
This thesis is devoted to modelling dependence, with applications in actuarial science, and is divided into two parts: the first concerns dependence in frequency credibility models and the second dependence between continuous outcomes. The two parts rely on different tools: stochastic orderings (which arise from dependence concepts) and copulas, respectively.
During the last decade of the 20th century, the insurance world saw important developments in a posteriori ratemaking, especially in the field of credibility. This was due to the deregulation of insurance markets in the European Union, which gave rise to advanced segmentation. The first important contribution is due to Dionne & Vanasse (1989), who proposed a credibility model that integrates a priori and a posteriori information on an individual basis. These authors introduced a regression component into the Poisson counting model in order to use all available information in the estimation of accident frequency. The unexplained heterogeneity was then modelled by introducing a latent variable representing the influence of hidden policy characteristics. The vast majority of papers that have appeared in the actuarial literature consider time-independent (or static) heterogeneity models. Noticeable exceptions include the pioneering papers by Gerber & Jones (1975), Sundt (1988) and Pinquet, Guillén & Bolancé (2001, 2003). Allowing for an unknown underlying random parameter that evolves over time is justified because the unobservable factors influencing driving ability are not constant. One might consider either shocks (induced by events such as divorce or nervous breakdown) or continuous modifications (e.g. due to learning effects).
In the first part we study recently introduced models in frequency credibility theory, which can be seen as time-series models for count data adapted to actuarial problems. More precisely, we examine the kind of dependence induced among annual claim numbers by the introduction of random effects capturing unexplained heterogeneity, both when these random effects are static and when they are time-dependent. We also make precise the effect of reported claims on the a posteriori distribution of the random effect, by establishing stochastic monotonicity properties of this distribution with respect to the claims history. We end this part by considering different models for the random effects and computing the a posteriori premium corrections on the basis of a real data set from a Spanish insurance company.
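As an illustration of the a posteriori correction mechanism described above, here is a minimal numerical sketch assuming a static Gamma-distributed random effect with unit mean (a standard choice in this literature, not a detail taken from the abstract); the heterogeneity parameter `a` and the example frequencies are hypothetical.

```python
import numpy as np

def posterior_correction(n_claims, a_priori_means, a=1.5):
    """A posteriori premium correction for a mixed Poisson model with a
    static Gamma(a, a) random effect (mean 1, variance 1/a).

    n_claims       : observed annual claim counts n_1, ..., n_T
    a_priori_means : a priori annual frequencies lambda_1, ..., lambda_T
                     (e.g. from a Poisson regression on policy characteristics)
    Returns the posterior mean of the random effect, i.e. the multiplicative
    correction applied to next year's a priori premium.
    """
    n = np.asarray(n_claims, dtype=float)
    lam = np.asarray(a_priori_means, dtype=float)
    # Conjugacy: Theta | N_1, ..., N_T ~ Gamma(a + sum(n), a + sum(lam))
    return (a + n.sum()) / (a + lam.sum())

# Example: a priori frequency 0.10 per year, one claim reported in 3 years
print(posterior_correction([0, 1, 0], [0.10, 0.10, 0.10], a=1.5))  # > 1: malus
```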
Whereas dependence concepts are very useful for describing the relationship between multivariate outcomes, in practice (think, for instance, of the computation of reinsurance premiums) one needs a statistical tool that is easy to implement and that incorporates the structure of the data. Such a tool is the copula, which allows the construction of multivariate distributions with given marginals. Because copulas characterize the dependence structure of random vectors once the effect of the marginals has been factored out, identifying and fitting a copula to data is not an easy task. In practice, it is often preferable to restrict the search for an appropriate copula to some reasonable family, such as the Archimedean one. It is then extremely useful to have simple graphical procedures for selecting the best-fitting model among competing alternatives for the data at hand.
In the second part of the thesis we propose a new nonparametric estimator of the Archimedean generator that takes into account the particularities of the data, namely censoring and truncation. This nonparametric estimate then serves as a benchmark for selecting an appropriate parametric Archimedean copula. The selection procedure is illustrated on a real data set.
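As an illustration of restricting the search to the Archimedean family, the sketch below fits two candidate Archimedean copulas (Clayton and Gumbel) by inverting Kendall's tau; it is not the estimator proposed in the thesis (which additionally handles censoring and truncation), and the simulated loss/expense data are purely illustrative.

```python
import numpy as np
from scipy.stats import kendalltau

def fit_archimedean_by_tau(x, y):
    """Moment-type fit of two Archimedean copulas by inverting the known
    tau-parameter relationships:
        Clayton: tau = theta / (theta + 2)  =>  theta = 2*tau / (1 - tau)
        Gumbel : tau = 1 - 1/theta          =>  theta = 1 / (1 - tau)
    Valid for positive dependence (tau > 0)."""
    tau, _ = kendalltau(x, y)
    if tau <= 0:
        raise ValueError("tau-inversion sketch assumes positive dependence")
    return {"tau": tau,
            "clayton_theta": 2.0 * tau / (1.0 - tau),
            "gumbel_theta": 1.0 / (1.0 - tau)}

# Illustrative simulated data: positively dependent claim amounts and expenses
rng = np.random.default_rng(0)
z = rng.normal(size=500)
losses = np.exp(z + 0.5 * rng.normal(size=500))
expenses = np.exp(0.8 * z + 0.6 * rng.normal(size=500))
print(fit_archimedean_by_tau(losses, expenses))
```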
|
52 |
A Simulation Study On The Comparison Of Methods For The Analysis Of Longitudinal Count Data. Inan, Gul, 01 July 2009
The longitudinal nature of the measurements and the counting process generating the responses motivate regression models for longitudinal count data (LCD) that take into account phenomena such as within-subject association and overdispersion. One common problem in longitudinal studies is missing data, which adds further difficulty to the analysis. Missingness can be handled with missing data techniques, but the amount of missingness and the missingness mechanism affect how well these techniques perform. In this thesis, among the regression models for LCD, the Log-Log-Gamma marginalized multilevel model (Log-Log-Gamma MMM) and the random-intercept model are considered. The performance of the models is compared via a simulation study under three missing data mechanisms (missing completely at random, missing at random conditional on observed data, and missing not at random), two missingness percentages (10% and 20%), and four missing data techniques (complete case analysis, and subject, occasion and conditional mean imputation). The simulation study shows that, although the mean absolute error and mean squared error values of the Log-Log-Gamma MMM are larger than those of the random-intercept model, the two regression models yield parallel results. The results confirm that the amount of missingness and the missingness mechanism strongly influence the performance of the missing data techniques under both regression models. Furthermore, while occasion mean imputation generally displays the worst performance, conditional mean imputation is superior to occasion and subject mean imputation and gives results parallel to complete case analysis.
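A minimal simulation sketch of the kind of comparison described above, assuming a Poisson random-intercept data-generating process with MCAR missingness; the coefficients and sample sizes are hypothetical, and a plain Poisson GLM from statsmodels stands in for the thesis's marginalized multilevel and random-intercept models.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_subj, n_occ = 200, 5

# Simulate longitudinal counts with a subject-level random intercept
b = rng.normal(0.0, 0.5, size=n_subj)                # random intercepts
time = np.tile(np.arange(n_occ), n_subj)
subj = np.repeat(np.arange(n_subj), n_occ)
mu = np.exp(0.5 + 0.1 * time + b[subj])
y = rng.poisson(mu).astype(float)

# Impose 20% missingness completely at random (MCAR)
miss = rng.random(y.size) < 0.20
y_obs = y.copy()
y_obs[miss] = np.nan

X = sm.add_constant(time.astype(float))

# (1) Complete case analysis
cc = ~np.isnan(y_obs)
fit_cc = sm.GLM(y_obs[cc], X[cc], family=sm.families.Poisson()).fit()

# (2) Occasion mean imputation: replace missing values by the occasion mean
y_imp = y_obs.copy()
for t in range(n_occ):
    idx = time == t
    y_imp[idx & np.isnan(y_obs)] = np.nanmean(y_obs[idx])
fit_imp = sm.GLM(y_imp, X, family=sm.families.Poisson()).fit()

print("complete case :", fit_cc.params)
print("occasion mean :", fit_imp.params)
```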
|
53 |
Disease Cluster Detection Methods and Power Comparison. Wang, Tai-Ci, Unknown Date
Spatial cluster analysis has been applied in epidemiology for many years, but research on this topic in Taiwan is still scarce, particularly on detecting areas with higher disease incidence. This thesis proposes a cluster detection method suited to Taiwan township-level data. The method uses a two-stage computer simulation procedure, which makes it easy to apply in practice; besides identifying the most likely (most significant) cluster, it can also detect multiple clusters. Computer simulations are used to compare the proposed method with widely used alternatives, including Kulldorff's (1995) spatial scan statistic and Tango's (2005) flexible scan statistic, with Type I error, Type II error and error rate as evaluation criteria. Finally, the method is applied to Taiwan cancer mortality data and National Health Insurance visit-count data to examine spatial clusters of cancer in Taiwan and changes in medical visits.
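A minimal sketch of a Kulldorff-style circular scan over areal count data with a Monte Carlo p-value for the most likely cluster; it follows the standard Poisson scan-statistic construction rather than the two-stage procedure proposed in the thesis, and the example data are simulated.

```python
import numpy as np

def scan_llr(c, e, C):
    """Poisson log-likelihood ratio of a candidate cluster with observed
    count c and expected count e, given total observed count C."""
    if c <= e or e <= 0:
        return 0.0
    return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

def most_likely_cluster(coords, cases, pop, n_sim=999, seed=0):
    """Scan circular windows centred at each area, grown by nearest
    neighbours up to half the total population; Monte Carlo inference
    conditional on the total number of cases."""
    rng = np.random.default_rng(seed)
    C, P = cases.sum(), pop.sum()
    expected = C * pop / P

    def best_llr(case_vec):
        best = 0.0
        for i in range(len(coords)):
            order = np.argsort(np.linalg.norm(coords - coords[i], axis=1))
            c = e = p = 0.0
            for j in order:
                c += case_vec[j]; e += expected[j]; p += pop[j]
                if p > 0.5 * P:
                    break
                best = max(best, scan_llr(c, e, C))
        return best

    observed = best_llr(cases)
    sims = [best_llr(rng.multinomial(int(C), pop / P)) for _ in range(n_sim)]
    p_value = (1 + sum(s >= observed for s in sims)) / (n_sim + 1)
    return observed, p_value

# Illustrative areal data: 30 areas with populations and case counts
rng0 = np.random.default_rng(1)
coords = rng0.uniform(0, 10, size=(30, 2))
pop = rng0.uniform(1_000, 20_000, size=30)
cases = rng0.poisson(0.001 * pop)
print(most_likely_cluster(coords, cases, pop, n_sim=99))
```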
|
54 |
Quantitative models of establishments location choices: spatial effects and strategic interactions. Buczkowska, Sabina, 28 March 2017
This thesis breathes new life into establishment location choice models. Its key motivation is the need for methodological advances that model more realistically the complexity of establishment decision-making processes, such as optimal location choices. First, location choice models use geo-referenced data, for which choice sets have an explicit spatial component, so it is critical to understand how to represent space in these models. The final decision of an establishment appears to be related to the surrounding economic landscape, and accounting for the linkage between neighboring observations requires a decision about the specification of the spatial weight matrix. Yet researchers overwhelmingly apply the Euclidean metric without questioning its underlying assumptions or considering alternatives. This representation was originally proposed because of scarce data and low computing power rather than because of its universality. In areas such as the Paris region, where severe congestion and uncrossable physical barriers clearly arise, distances based purely on topography may not be the most appropriate for studying intra-urban location. There are insights to be gained by reconsidering and measuring distance according to the problem being analyzed. Rather than locking researchers into a restrictive structure of the weight matrix, this thesis proposes a flexible approach to identify which distance metric is more likely to correctly account for nearby markets, depending on the sector considered. In addition to the standard Euclidean distance, six alternative metrics are tested: travel times by car (for peak and off-peak periods) and by public transit, and the corresponding network distances. Second, what makes these location choices particularly interesting and challenging to analyze is that the decisions of a particular establishment are interrelated with the choices of other players. The thorny problems posed by this interdependence generally cannot be assumed away without altering the authenticity of the model of establishment decision making. Conventional approaches to location selection fail by providing only a set of systematic steps for problem solving, without considering strategic interactions between establishments in the market. One goal of the thesis is therefore to explore how to correctly adapt location choice models to study establishments' discrete choices when they are interrelated. Finally, a firm can open a number of units and serve the market from multiple locations. Once again, traditional theory and methods may not be suitable for situations in which individual establishments, instead of locating independently of each other, form a large organization, such as a chain facing fierce competition from other chains; interactions between units within the same firm and across competing firms must be incorporated. In addition, the need to distinguish clearly between the daytime and nighttime population is emphasized. Demand is represented by pedestrian and car flows: the crowd of potential clients passing through commercial centers, train and subway stations, airports, and highly touristic sites. The Enquête Globale Transport 2010 (EGT 2010), among other sources, serves this objective. Location choice models designed more realistically, accounting for spatial spillovers and strategic interaction and using a more appropriate definition of distance and demand, can become a powerful and flexible tool for finding a suitable site. An appropriately chosen location can, in turn, make a meaningful difference for a newly created business. The thesis provides useful recommendations for transport analysts, city planners, plan developers, business owners, and shopping center investors.
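A minimal sketch of the kind of discrete location-choice model discussed above: a conditional logit whose accessibility covariate can be built under alternative distance metrics (Euclidean versus a hypothetical travel-time matrix), estimated by maximum likelihood. All zone attributes and data are illustrative, and the strategic-interaction component of the thesis is not modelled here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def fit_conditional_logit(X, choices):
    """X: (n_firms, n_zones, n_vars) alternative-specific attributes;
    choices: index of the chosen zone for each firm."""
    n, J, k = X.shape

    def neg_loglik(beta):
        v = X @ beta                                  # systematic utilities (n, J)
        return -(v[np.arange(n), choices] - logsumexp(v, axis=1)).sum()

    return minimize(neg_loglik, np.zeros(k), method="BFGS").x

# Illustrative zones: rents, employment, and two distance matrices
rng = np.random.default_rng(1)
n_firms, n_zones = 300, 15
zone_xy = rng.uniform(0, 20, size=(n_zones, 2))
d_euclid = np.linalg.norm(zone_xy[:, None, :] - zone_xy[None, :, :], axis=2)
d_time = d_euclid * rng.uniform(1.0, 2.5, size=d_euclid.shape)  # congestion factor
rent = rng.uniform(10, 30, size=n_zones)
emp = rng.uniform(100, 1000, size=n_zones)
access_euclid = (emp / (1 + d_euclid)).sum(axis=1)   # gravity-type accessibility
access_time = (emp / (1 + d_time)).sum(axis=1)

def build_X(access):
    X = np.zeros((n_firms, n_zones, 2))
    X[:, :, 0] = rent
    X[:, :, 1] = np.log(access)
    return X

# Simulate choices under the travel-time accessibility, then compare fits
true_beta = np.array([-0.15, 1.0])
util = build_X(access_time) @ true_beta + rng.gumbel(size=(n_firms, n_zones))
choices = util.argmax(axis=1)
print("time metric  :", fit_conditional_logit(build_X(access_time), choices))
print("euclid metric:", fit_conditional_logit(build_X(access_euclid), choices))
```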
|
55 |
SOCCER CHAMPIONSHIP PROBABILITIES ESTIMATION. EDUARDO LIMA CAMPOS, 26 October 2001
In this thesis we develop a methodology for obtaining the qualification and relegation probabilities of teams in soccer championships. The methodology consists of four steps. In the first step, we fit time series models for count data to the series of goals scored and conceded by each team in successive matches of the championship, using explanatory variables to account for the effects of playing at home, the participation of particular players and changes of coach; confidence intervals and hypothesis tests for the hyperparameters of these models are obtained via bootstrap. In the second step, we obtain probability distributions for the results of the future matches of the championship by combining the predictive distributions of the fitted models through the Maximum Entropy Principle. In the third step, we use these distributions to simulate scenarios for the championship, and in the fourth and final step we estimate the qualification and relegation probabilities of the teams from the relative frequency of these events in a large number of generated scenarios. The methodology was applied to the Brazilian Championship/1999 and the João Havelange Cup/2000.
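A minimal sketch of the scenario-simulation step: given per-team goal rates (hypothetical constants with a home-advantage factor standing in for the thesis's time-series and maximum-entropy predictive distributions), simulate a round-robin championship repeatedly and estimate qualification probabilities by relative frequency.

```python
import numpy as np

def simulate_championship(attack, defense, home_adv=1.3, n_sim=10_000, seed=0):
    """Simulate a double round-robin with independent Poisson goal counts.
    attack/defense: per-team multiplicative strengths (hypothetical inputs
    standing in for model-based predictive rates). Returns the estimated
    probability that each team finishes in the top 4."""
    rng = np.random.default_rng(seed)
    n = len(attack)
    top4 = np.zeros(n)
    for _ in range(n_sim):
        points = np.zeros(n)
        for i in range(n):            # home team
            for j in range(n):        # away team
                if i == j:
                    continue
                g_home = rng.poisson(home_adv * attack[i] * defense[j])
                g_away = rng.poisson(attack[j] * defense[i])
                if g_home > g_away:
                    points[i] += 3
                elif g_home < g_away:
                    points[j] += 3
                else:
                    points[i] += 1
                    points[j] += 1
        top4[np.argsort(-points)[:4]] += 1
    return top4 / n_sim

attack = np.array([1.6, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8])
defense = np.array([0.7, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3])
print(simulate_championship(attack, defense))
```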
|
56 |
Nonlinear models for longitudinal count data. Ana Maria Souza de Araujo, 16 February 2007
Experiments in which measurements are taken repeatedly on the same experimental unit are common in agronomy. The statistical techniques used to analyse data from such experiments are called repeated measures analyses; a particular case is the longitudinal study, in which the same response variable is observed at several occasions over time. The longitudinal behaviour can be nonlinear, which occurs frequently in growth studies, and it is also common for the response variable to be a count. This work addresses the modelling of count data obtained from experiments with repeated measurements over time in which the longitudinal behaviour of the response is nonlinear. The multivariate Poisson distribution, with equal covariances between measurements, is used to account for the dependence between the components of the repeated measures observation vector in each experimental unit. The proposal of Karlis and Meligkotsidou (2005) is extended to longitudinal data from completely randomized experiments, and models for randomized block experiments, assuming fixed or random block effects, are also proposed. Overdispersion is considered and modelled through a mixed multivariate Poisson distribution. Parameter estimation is carried out by maximum likelihood via the EM algorithm. The methodology is applied to simulated data for each of the situations studied and to a data set from a randomized block experiment in which the number of bromeliad leaves was observed at six time points. The method proved efficient for estimating the parameters of the completely randomized design, including under overdispersion, and of the randomized block design with fixed block effects without overdispersion and with random block effects. The estimation for the model with fixed block effects in the presence of overdispersion, and for the variance parameter of the random block effect, still needs to be improved.
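A minimal sketch of the equal-covariance multivariate Poisson structure used above, via the common-shock construction Y_t = X_t + Z with independent Poisson components; the thesis estimates the parameters by maximum likelihood via EM, whereas this sketch only simulates the vector and recovers the common covariance by the method of moments.

```python
import numpy as np

def rmultipoisson(n, lambdas, lambda0, rng):
    """Multivariate Poisson with equal covariances: Y_t = X_t + Z, where
    X_t ~ Poisson(lambdas[t]) and Z ~ Poisson(lambda0) are independent.
    Then E[Y_t] = lambdas[t] + lambda0 and Cov(Y_s, Y_t) = lambda0 for s != t."""
    T = len(lambdas)
    x = rng.poisson(lambdas, size=(n, T))
    z = rng.poisson(lambda0, size=(n, 1))
    return x + z

rng = np.random.default_rng(7)
y = rmultipoisson(5000, lambdas=[2.0, 3.0, 4.0], lambda0=1.5, rng=rng)

# Method-of-moments check: off-diagonal covariances all estimate lambda0
cov = np.cov(y, rowvar=False)
off_diag = cov[np.triu_indices(3, k=1)]
print("estimated lambda0:", off_diag.mean())                 # close to 1.5
print("estimated lambdas:", y.mean(axis=0) - off_diag.mean())
```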
|
57 |
Essay in health economics: analysis of demand in the supplementary (private) health insurance market using an econometric count-data model. Heck, Joaquim, 31 August 2012
This thesis discusses aspects of the demand for healthcare in the Brazilian supplementary (private) health sector. Using econometric count-data regression models, we investigate which monetary and non-monetary factors may influence the quantity of healthcare demanded, and we examine whether an informational-asymmetry effect such as moral hazard is present in the determination of this demand, in a case study of specialist physician visits.
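A minimal illustrative sketch of a count-data demand regression of the kind described: a negative binomial regression of visit counts on monetary and non-monetary covariates, where the rate ratio on a coverage dummy is sometimes read as suggestive of moral hazard. All variable names and the simulated data are hypothetical, not taken from the thesis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
copay = rng.uniform(0, 50, size=n)            # monetary factor
age = rng.uniform(20, 80, size=n)             # non-monetary factor
full_coverage = rng.integers(0, 2, size=n)    # plan with no copayment

# Hypothetical data-generating process with overdispersion (gamma frailty)
mu = np.exp(0.2 - 0.015 * copay + 0.01 * age + 0.4 * full_coverage)
visits = rng.poisson(mu * rng.gamma(2.0, 0.5, size=n))

X = sm.add_constant(np.column_stack([copay, age, full_coverage]))
nb = sm.NegativeBinomial(visits, X).fit(disp=False)
print(nb.summary())
print("rate ratio for full coverage:", np.exp(nb.params[3]))
```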
|
58 |
Statistical models for an MTPL portfolio. Pirozhkova, Daria, January 2017
In this thesis, we consider several statistical techniques applicable to claim frequency models for a motor third-party liability (MTPL) portfolio, with a focus on overdispersion. The practical part of the work applies and compares these models on real data from an MTPL portfolio, with the comparison based on goodness-of-fit measures. Furthermore, the predictive power of selected models is tested on the given dataset using simulation. The thesis thus combines an analysis of goodness of fit with an assessment of the predictive power of the models.
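A minimal sketch of the goodness-of-fit comparison described above: Poisson and negative binomial claim-frequency models are fitted to the same simulated, overdispersed data and compared by AIC and a likelihood-ratio test for the dispersion parameter; the covariates and figures are hypothetical.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(11)
n = 5000
driver_age = rng.uniform(18, 70, size=n)
urban = rng.integers(0, 2, size=n)

# Overdispersed claim counts: Poisson mixed with a gamma random effect
mu = np.exp(-2.0 - 0.01 * driver_age + 0.3 * urban)
claims = rng.poisson(mu * rng.gamma(1.0, 1.0, size=n))

X = sm.add_constant(np.column_stack([driver_age, urban]))
pois = sm.Poisson(claims, X).fit(disp=False)
nb = sm.NegativeBinomial(claims, X).fit(disp=False)

print("AIC Poisson :", pois.aic)
print("AIC NegBin  :", nb.aic)
# Likelihood-ratio test for the dispersion parameter (boundary test, so the
# chi-square p-value below is conservative)
lr = 2 * (nb.llf - pois.llf)
print("LR stat:", lr, "p-value:", chi2.sf(lr, df=1))
```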
|
59 |
An Empirical Comparison of Static Count Panel Data Models: the Case of Vehicle Fires in Stockholm County. Pihl, Svante; Olivetti, Leonardo, January 2020
In this paper we study the occurrences of outdoor vehicle fires recorded by the Swedish Civil Contingencies Agency (MSB) for the period 1998-2019, and build static panel data models to predict future occurrences of fire in Stockholm County. By comparing the performance of different models, we examine how distributional assumptions for the dependent variable affect predictive performance. Our study concludes that treating the dependent variable as continuous does not hamper performance, with the exception of models meant to predict more uncommon occurrences of fire. Furthermore, we find that assuming the dependent variable follows a negative binomial distribution, rather than a Poisson distribution, does not lead to substantial gains in performance, even in cases of overdispersion. Finally, we observe a slight increase in the number of vehicle fires in the data and reflect on whether this could be related to population growth.
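A minimal sketch of the kind of comparison reported above: a Poisson GLM and an OLS model that treats the count as continuous are fitted to simulated panel-like fire counts and compared by out-of-sample mean absolute error; the data-generating process and variables are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2020)
n_areas, n_months = 40, 120
area = np.repeat(np.arange(n_areas), n_months)
month = np.tile(np.arange(n_months), n_areas)
pop = rng.uniform(5_000, 80_000, size=n_areas)

# Hypothetical fire counts driven by population size and a mild time trend
mu = np.exp(-8.5 + np.log(pop[area]) + 0.001 * month)
fires = rng.poisson(mu)

X = sm.add_constant(np.column_stack([np.log(pop[area]), month]))
train = month < 100            # fit on the first 100 months, predict the rest

pois = sm.GLM(fires[train], X[train], family=sm.families.Poisson()).fit()
ols = sm.OLS(fires[train], X[train]).fit()

mae = lambda pred: np.mean(np.abs(fires[~train] - pred))
print("MAE Poisson GLM      :", mae(pois.predict(X[~train])))
print("MAE OLS (continuous) :", mae(ols.predict(X[~train])))
```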
|
60 |
Properties of Hurdle Negative Binomial Models for Zero-Inflated and Overdispersed Count Data. Bhaktha, Nivedita, January 2018
No description available.
|