41 |
Analysis of Correlated Data with Measurement Error in Responses or Covariates
Chen, Zhijian January 2010
Correlated data frequently arise from epidemiological studies, especially familial and longitudinal studies. Longitudinal design has been used by researchers to investigate the changes of certain characteristics over time at the individual level as well as how potential factors influence the changes. Familial studies are often designed to investigate the dependence of health conditions among family members. Various models have been developed for this type of multivariate data, and a wide variety of estimation techniques have been proposed. However, data collected from observational studies are often far from perfect, as measurement error may arise from different sources such as defective measuring systems, diagnostic tests without gold references, and self-reports. Under such scenarios only rough surrogate variables are measured. Measurement error in covariates in various regression models has been discussed extensively in the literature. It is well known that naive approaches ignoring covariate error often lead to inconsistent estimators for model parameters.
In this thesis, we develop inferential procedures for analyzing correlated data with response measurement error. We consider three scenarios: (i) likelihood-based inferences for generalized linear mixed models when the continuous response is subject to nonlinear measurement errors; (ii) estimating equations methods for binary responses with misclassifications; and (iii) estimating equations methods for ordinal responses when the response variable and categorical/ordinal covariates are subject to misclassifications.
The first problem arises when the continuous response variable is difficult to measure. When the true response is defined as the long-term average of measurements, a single measurement is considered as an error-contaminated surrogate. We focus on generalized linear mixed models with nonlinear response error and study the induced bias in naive estimates. We propose likelihood-based methods that can yield consistent and efficient estimators for both fixed-effects and variance parameters. Results of simulation studies and analysis of a data set from the Framingham Heart Study are presented.
Marginal models have been widely used for correlated binary, categorical, and ordinal data. The regression parameters characterize the marginal mean of a single outcome, without conditioning on other outcomes or unobserved random effects. The generalized estimating equations (GEE) approach, introduced by Liang and Zeger (1986), models only the first two moments of the responses, with associations treated as nuisance characteristics. For some clustered studies, especially familial studies, however, the association structure may be of scientific interest. For binary data, Prentice (1988) proposed additional estimating equations that allow one to model pairwise correlations. We consider marginal models for correlated binary data with misclassified responses. We develop “corrected” estimating equations approaches that can yield consistent estimators for both mean and association parameters. The idea is related to that of Nakamura (1990), which was originally developed for correcting bias induced by additive covariate measurement error under generalized linear models. Our approaches can also handle correlated misclassifications, rather than only a simple misclassification process as considered by Neuhaus (2002) for clustered binary data under generalized linear mixed models. We extend our methods and further develop marginal approaches for the analysis of longitudinal ordinal data with misclassification in both responses and categorical covariates. Simulation studies show that our proposed methods perform very well under a variety of scenarios. Results from applying the proposed methods to real data are presented.
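The bias from ignoring response misclassification, and the idea of correcting the observed response so that its expectation matches the true one, can be illustrated with a much simpler setting than the thesis's estimating equations. The sketch below assumes a single binary response with known sensitivity and specificity; the function names, the parameter values, and the independent-misclassification mechanism are illustrative assumptions, not the author's method.

```python
import random


def corrected_response(y_obs, sensitivity, specificity):
    """Return a surrogate whose expectation equals the true response.

    Uses E[Y_obs | Y] = (1 - specificity) + (sensitivity + specificity - 1) * Y,
    which holds when misclassification depends only on the true value.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (y_obs - (1.0 - specificity)) / denom


def simulate(n=200_000, p_true=0.30, sensitivity=0.85, specificity=0.90, seed=1):
    """Compare the naive and corrected estimates of P(Y = 1) on simulated data."""
    rng = random.Random(seed)
    naive_sum = 0.0
    corrected_sum = 0.0
    for _ in range(n):
        y = 1 if rng.random() < p_true else 0
        if y == 1:
            y_obs = 1 if rng.random() < sensitivity else 0
        else:
            y_obs = 0 if rng.random() < specificity else 1
        naive_sum += y_obs
        corrected_sum += corrected_response(y_obs, sensitivity, specificity)
    return naive_sum / n, corrected_sum / n


if __name__ == "__main__":
    naive, corrected = simulate()
    print(f"naive mean: {naive:.3f}, corrected mean: {corrected:.3f}")
```

With these assumed values the naive mean converges to 0.30 * 0.85 + 0.70 * 0.10 = 0.325 rather than the true 0.30, while the corrected mean is unbiased. Extending the idea to regression and association parameters is what the thesis's corrected estimating equations address.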
Measurement error can be coupled with many other features in the data, e.g., complex survey designs, that can complicate inferential procedures. We explore combining survey weights and misclassification in ordinal covariates in logistic regression analyses. We propose an approach that incorporates survey weights into estimating equations to yield design-based unbiased estimators.
In the final part of the thesis we outline some directions for future work, such as transition models and semiparametric models for longitudinal data with both incomplete observations and measurement error. Missing data is another common feature in applications. Developing novel statistical techniques for dealing with both missing data and measurement error can be beneficial.
|
42 |
Socioeconomic, environmental and personal correlates of asthma in a community population of men and women
Kydd, Robyn Marie 09 July 2010
Asthma is a multifactorial chronic disease that has shown a marked increase in prevalence over the past few decades, both in Canada and worldwide. Basic knowledge gaps remain about the pathways through which risk factors influence adult asthma. More adult women than men have asthma, and a growing body of research suggests that associations between certain risk factors and asthma may differ by sex. The aim of this thesis was to investigate the socioeconomic, environmental and personal correlates of asthma in men and women.
Data for this thesis were obtained from a cross-sectional study conducted in 2003 in the rural Canadian town of Humboldt, Saskatchewan. The survey response rate was 71% of the resident target population, with 1177 females and 913 males aged 18 to 79 participating in the study. Researchers collected objective data on atopy (skin prick test) and body mass index. Exposures and history of physician-diagnosed asthma in the past year (current asthma) and during the participant's lifetime (ever asthma) were self-reported. Multivariable logistic regression models adjusted for age, atopy, and parental asthma history were used to evaluate associations of correlates with asthma. The model building process was based on a conceptual framework of three categories: socioeconomic variables, home and work environment, and personal factors.
The prevalence of asthma was higher in women than men (ever asthma: 10.2% of women versus 5.8% of men; current asthma: 6.2% of women versus 2.8% of men). The logistic regression models for ever asthma and current asthma showed several sex differences. The sequential addition of each category of socioeconomic, environmental, and personal variables contributed significantly to model fit in women, but not in men. Living in a mobile, attached or multiple-family home, household dampness, and overweight/obesity were strong risk factors for female asthma, while farm living, occupational grain dust exposure, and regular alcohol use emerged as protective factors. Male models revealed a strong significant association between household dampness and current asthma. A significant interaction between home type and age was found only in females. Women living in homes other than single-family detached dwellings were more likely to have asthma, an association that decreased in strength with increasing age.
These results suggest that several risk factors for adult asthma may be sex-specific, which underscores the importance of considering sex as a potential effect modifier in future adult asthma epidemiology studies.
|
43 |
Derivative pricing based on time series models of default probabilities
Chang, Kai-hsiang 02 August 2006
In recent years, derivative pricing subject to credit risk has attracted much attention. In this paper, we propose an autoregressive time series model of log odds ratios to price derivatives. Examples of the proposed model are given via the structural and reduced form approaches. Pricing formulae under the proposed time series models are derived for bonds and options. Furthermore, simulation studies are performed to confirm the accuracy of the derived formulae.
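The abstract does not give the model specification, so the following is only a sketch of how an autoregressive model on the log odds of default could feed a bond price. It assumes an AR(1) process for the per-period log odds, a constant risk-free rate, and a recovery fraction paid at maturity; all names and parameters are hypothetical, and Monte Carlo stands in for the closed-form formulae the paper derives.

```python
import math
import random


def logistic(x):
    """Map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def price_defaultable_bond(x0, mu, phi, sigma, periods, rate,
                           recovery=0.0, n_paths=20_000, seed=7):
    """Monte Carlo price of a zero-coupon bond with face value 1.

    Assumed dynamics: x_t = mu + phi * (x_{t-1} - mu) + sigma * eps_t,
    where x_t is the log odds of default in period t, given survival so far.
    """
    rng = random.Random(seed)
    discount = math.exp(-rate * periods)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        survival = 1.0
        for _ in range(periods):
            x = mu + phi * (x - mu) + sigma * rng.gauss(0.0, 1.0)
            survival *= 1.0 - logistic(x)
        # Expected payoff along this path: 1 if no default, recovery otherwise.
        total += survival + recovery * (1.0 - survival)
    return discount * total / n_paths


if __name__ == "__main__":
    price = price_defaultable_bond(x0=-3.0, mu=-3.0, phi=0.5, sigma=0.4,
                                   periods=5, rate=0.02)
    print(f"bond price: {price:.4f}")
```

Setting `sigma=0` with `x0 == mu` gives a constant default probability, so the price reduces to the discounted survival probability; this is a convenient sanity check on the simulation.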
|
44 |
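Konsumtion av kosttillskott bland träningsaktiva : En kvantitativ undersökning om köns- och åldersskillnader och samband med träningsform / Consumption of dietary supplements among people who exercise: A quantitative study of sex and age differences and associations with type of exercise
Strandberg Keijser, Alina January 2015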
The consumption of dietary supplements is increasing in today's society. Examples of dietary supplements are energy supplements, performance-enhancing supplements, and vitamins and minerals. A compilation of surveys in Sweden shows that 61% of men and 41% of women consume dietary supplements. People who exercise report that the products are beneficial and have positive effects on training, even though the risks of consuming dietary supplements are debated and only partly researched. Doping substances are also consumed among people who do strength training. Doping means affecting or changing performance with various substances. There are different types of doping substances, of which anabolic androgenic steroids (AAS) are the most common. The prevalence of AAS use has increased during the 2000s. This thesis presents the extent of consumption of dietary supplements and doping substances in Västmanland County (Västmanlands län), along with sex and age differences and the association between consumption and type of exercise. The analyses were performed on survey data from the Västmanland Sports Federation (Västmanlands Idrottsförbund), collected as part of its anti-doping work in Västmanland. The results show that consuming dietary supplements, and consuming them a few times a week or more often, is most common among men aged 17 to 30. The likelihood of high consumption of dietary supplements increases with strength training. Social cognitive theory is used to interpret the social aspect of consumption and provides a deeper understanding of how the behavior of consuming dietary supplements and/or doping substances can arise in an individual.
|
45 |
Prediktion av matchresultat i engelska Premier League / Prediction of match results in the English Premier League
Palmberg, Billy January 2015
Most people have at some point tried to predict in advance which team will win a soccer match. Guessing and actually analyzing both teams' prospects are two very different ways of arriving at a prediction. As computing power has improved greatly in recent years, more mathematical models for estimating match outcomes have appeared, computationally heavy ones in particular. This thesis uses the Pi-rating system, in which each team receives a home rating and an away rating that describe how good or bad it is compared with the average competing team. As an extension of the original Pi-rating model, three time series methods are used to predict the teams' future ratings: a simple moving average, simple exponential smoothing, and an ARIMA model. A way of handling new teams that did not play in the league the previous year is also proposed. Finally, different investment methods that can be used to apply the model results on the sports betting market are discussed and tested. The results show that the greatest return on the calibration data is achieved when Kelly's formula is used as the investment method with an ARIMA(0,1,1) model. When this strategy is applied to matches outside the calibration years, however, the profit is very low and, above all, negative for a long period, which is undesirable from an investment point of view. In conclusion, this method is not in itself good enough to provide a safe return, but it is a good foundation that can be extended to take more factors into account and thereby allow more stable, longer-term profits.
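Two of the building blocks named in this abstract, simple exponential smoothing and Kelly's formula, are compact enough to sketch. The code below is a generic illustration under stated assumptions: the rating series, the smoothing constant, the win probability, and the odds are made-up inputs, and the thesis's actual Pi-rating update and probability model are not reproduced.

```python
def exponential_smoothing(values, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]
    for value in values[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


def kelly_fraction(prob_win, decimal_odds):
    """Return the Kelly stake as a fraction of the bankroll.

    For decimal odds o and win probability p the fraction is
    (p * o - 1) / (o - 1); it is floored at 0 when the bet has no edge.
    """
    net_odds = decimal_odds - 1.0
    if net_odds <= 0.0:
        raise ValueError("decimal odds must exceed 1")
    edge = prob_win * decimal_odds - 1.0
    return max(0.0, edge / net_odds)


if __name__ == "__main__":
    # Hypothetical home ratings for one team over recent matches.
    ratings = [0.42, 0.47, 0.45, 0.51, 0.55]
    print("forecast rating:", round(exponential_smoothing(ratings, alpha=0.3), 4))
    # Hypothetical model probability and bookmaker odds.
    print("Kelly stake:", round(kelly_fraction(prob_win=0.5, decimal_odds=2.2), 4))
```

The Kelly fraction is only as good as the probability fed into it, which is consistent with the abstract's finding that returns looked strong on calibration data but not outside it.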
|
47 |
Bivariate meta-analysis of sensitivity and specificity of radiographers' plain radiograph reporting in clinical practice
Brealey, S., Hewitt, C., Scally, Andy J., Hahn, S., Godfrey, C., Thomas, N. January 2009
Studies of diagnostic accuracy often report paired results for sensitivity and specificity that can be pooled separately to produce summary estimates in a meta-analysis. This was done recently for a systematic review of radiographers' reporting accuracy of plain radiographs. The problem with pooling sensitivities and specificities separately is that it does not acknowledge any possible (negative) correlation between these two measures. A possible cause of this negative correlation is that different thresholds are used in studies to define abnormal and normal radiographs, because of implicit variations in thresholds that occur when radiographers report plain radiographs. The bivariate model is a method that uses a random effects approach to allow for the correlation that can exist between pairs of sensitivity and specificity within a study. When estimates of accuracy were pooled separately using a fixed-effects model, radiographers reported plain radiographs in clinical practice at 93% (95% confidence interval (CI) 92-93%) sensitivity and 98% (95% CI 98-98%) specificity. The bivariate model produced the same summary estimates of sensitivity and specificity but with wider confidence intervals (93% (95% CI 91-95%) and 98% (95% CI 96-98%), respectively) that take into account the heterogeneity beyond chance between studies. This method also allowed us to calculate a 95% confidence ellipse around the mean values of sensitivity and specificity and a 95% prediction ellipse for individual values of sensitivity and specificity. The bivariate model is an improvement on pooling sensitivity and specificity separately when there is a threshold effect, and it is the preferred method.
|
48 |
Testes de superioridade para modelos de chances proporcionais com e sem fração de cura / Superiority test for proportional odds model with and without cure fraction
Juliana Cecilia da Silva Teixeira 24 October 2017
Studies that demonstrate the superiority of a drug over others already on the market are of great interest in clinical practice. On the basis of such studies, Brazil's Agência Nacional de Vigilância Sanitária (ANVISA) grants registration to new products that can cure faster or increase patients' probability of cure compared with the standard treatment. It is of the utmost importance that the hypothesis tests control the probability of type I error, that is, the probability that a non-superior treatment is approved for use, and that they also achieve the regulated test power with as few individuals as possible. Existing hypothesis tests for this purpose either disregard the time until the event of interest occurs (allergic reaction, positive effect, etc.) or are based on the proportional hazards model. In practice, however, the proportional hazards assumption may not always be satisfied, as in trials in which the hazards of the different study groups become equal over time. In this situation, the proportional odds model is more adequate for fitting the data. In this work we develop and investigate two hypothesis tests for superiority clinical trials, based on the comparison of survival curves under the assumption that the data follow the proportional survival odds model, one without and one with the incorporation of a cure fraction. Several simulation studies are conducted to analyze the ability to control the probability of type I error and the power of the tests when the data do or do not satisfy the test assumption, for various sample sizes and two methods of estimating the quantities of interest. We conclude that the probability of type I error is underestimated when the data do not satisfy the test assumption and is controlled when they do, as expected. In general, we conclude that it is indispensable to satisfy the assumptions of superiority tests.
|
49 |
A Bayesian approach to predict the number of soccer goals : Modeling with Bayesian Negative Binomial regression
Bäcklund, Joakim, Nils, Johdet January 2018
This thesis focuses on a well-known topic in sports betting: predicting the number of goals in soccer games. The data set comes from the top English soccer league, the Premier League, and consists of games played in the seasons 2015/16 to 2017/18. The thesis approaches the prediction with the auxiliary support of odds from the betting exchange Betfair. The purpose is to find a model that can create an accurate goal distribution. The methods used are Bayesian Negative Binomial regression and Bayesian Poisson regression. The results indicate that Poisson regression is the better model because of the presence of underdispersion. We argue that the methods can be used to compare the accuracy of different sportsbooks and may help in creating better models.
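The role of dispersion in choosing between the two models can be checked with a simple diagnostic: the ratio of sample variance to sample mean, which is about 1 for Poisson data, above 1 when a Negative Binomial may help, and below 1 under underdispersion. The sketch below is a plain frequentist check on made-up goal counts, not the Bayesian regression fitted in the thesis.

```python
import math


def dispersion_index(counts):
    """Return sample variance divided by sample mean for count data."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(counts) / n
    if mean <= 0.0:
        raise ValueError("mean must be positive")
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


def poisson_pmf(k, rate):
    """Return P(X = k) for a Poisson variable with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


if __name__ == "__main__":
    # Hypothetical total goals per match; not data from the thesis.
    goals = [2, 3, 1, 2, 4, 2, 3, 2, 1, 3, 2, 3]
    rate = sum(goals) / len(goals)
    print("dispersion index:", round(dispersion_index(goals), 3))
    print("P(3 goals):", round(poisson_pmf(3, rate), 3))
```

A Negative Binomial model can only add variance relative to the Poisson, so an index below 1 gives it nothing to capture, which matches the abstract's conclusion.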
|
50 |
Fatores de risco para agressões por cães a pessoas / Risk factors for dog aggression toward people
Buso, Daniel Sartore [UNESP] 10 December 2010
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Millions of people are bitten by dogs each year in Brazil and worldwide. Through a case analysis and a case-control study, this work aimed to characterize canine aggression and to establish risk factors for dog attacks on people in the municipality of Araçatuba, SP. The chi-square test was used for categorical variables and the t test for numerical variables, followed by binary logistic regression to establish the odds ratio (OR) for selected variables. Most dogs (71%) were received as gifts, and the search for companionship was the main reason for acquisition. Among the victims, males predominated among children and females among the elderly. The attacking dog having escaped (18.7%) was the main situation involved in the attacks. The risk factors identified were the number of children in the household (OR = 1.70; 95% CI 1.03-2.82), the sex of the dog (males, OR = 3.08; 95% CI 1.41-6.73), reproductive status (not sterilized, OR = 4.28; 95% CI 1.05-17.45), receiving the animal as a gift (OR = 3.99; 95% CI 1.85-8.64), acquisition to protect the home (OR = 9.23; 95% CI 2.25-37.81), and the number of situations resulting in aggressiveness (OR = 1.35; 95% CI 1.16-1.57). The number of adults in the household (OR = 0.65; 95% CI 0.47-0.91) was negatively associated with the occurrence of bites. It was possible to express the joint influence of these variables on the probability of aggression in an equation. These results support preventive and responsible-ownership programs that teach safer ways of interacting with dogs, the risks involved, and how to avoid them.
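For readers unfamiliar with how an odds ratio and its 95% confidence interval arise in a case-control design, the sketch below computes an unadjusted OR with a Wald interval from a 2x2 table. The counts are hypothetical, and the study itself reports ORs from binary logistic regression, which adjusts for other variables and will generally differ from this crude calculation.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using a Wald interval."""
    cells = (exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: male dogs among biting (cases) and non-biting (controls).
    or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {or_:.2f}; 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as for every risk factor listed in the abstract, indicates an association that is statistically significant at the 5% level.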
|