Global ETD Search

41	Robust Methods for Interval-Censored Life History Data Tolusso, David January 2008 Interval censoring arises frequently in life history data, as individuals are often only observed at a sequence of assessment times. This leads to a situation where we do not know when an event of interest occurs, only that it occurred somewhere between two assessment times. Here, the focus will be on methods of estimation for recurrent event data, current status data, and multistate data, subject to interval censoring. With recurrent event data, the focus is often on estimating the rate and mean functions. Nonparametric estimates are readily available, but are not smooth. Methods based on local likelihood and the assumption of a Poisson process are developed to obtain smooth estimates of the rate and mean functions without specifying a parametric form. Covariates and extra-Poisson variation are accommodated by using a pseudo-profile local likelihood. The methods are assessed by simulations and applied to a number of datasets, including data from a psoriatic arthritis clinic. Current status data is an extreme form of interval censoring that occurs when each individual is observed at only one assessment time. If current status data arise in clusters, this must be taken into account in order to obtain valid conclusions. Copulas offer a convenient framework for modelling the association separately from the margins. Estimating equations are developed for estimating marginal parameters as well as association parameters. Efficiency and robustness to the choice of copula are examined for first and second order estimating equations. The methods are applied to data from an orthopedic surgery study as well as data on joint damage in psoriatic arthritis. Multistate models can be used to characterize the progression of a disease as individuals move through different states.
Considerable attention is given to a three-state model to characterize the development of a back condition known as spondylitis in psoriatic arthritis, along with the associated risk of mortality. Robust estimates of the state occupancy probabilities are derived based on a difference in distribution functions of the entry times. A five-state model which differentiates between left-side and right-side spondylitis is also considered, which allows us to characterize what effect spondylitis on one side of the body has on the development of spondylitis on the other side. Covariate effects are considered through multiplicative time homogeneous Markov models. The robust state occupancy probabilities are also applied to data on CMV infection in patients with HIV. multistate interval censoring robust estimation local likelihood recurrent events current status data generalized estimating equations piecewise constant Statistics (Biostatistics)
43	Fine-scale distribution, habitat use, and movements of sperm whales Milligan, Marina 06 August 2013 Sperm whales (Physeter macrocephalus) are a nomadic species typically studied across broad (>100 km) spatial scales. In this study, I model fine-scale (or submesoscale) habitat preferences, determine how organization into distinctive units of associating females and juveniles influences habitat use, and describe how movements change across the 24-hour cycle. This study concerns a well-studied population of sperm whales off Dominica in the Eastern Caribbean. Statistical models suggest that overall habitat use is rather homogeneous, and social behaviour is best predicted by the presence of mature males. Variation among social units in the amount of time spent, and space occupied, within the study area indicates habitat preferences at the level of the social unit. Finally, movements are influenced by the diurnal cycle, as whales tend to move from inshore to offshore at dusk. This study betters our understanding of sperm whale habitat decisions over fine scales, and has implications for conservation and management strategies. sperm whale Physeter macrocephalus habitat movement nomad generalized estimating equations social unit social behaviour defecation rate anthropogenic disturbance diurnal cycle
44	The Effect of ambient air quality on lung function, respiratory symptoms and bronchodilator use among symptomatic children Fryer, Jayne Louise January 2006 Masters Research - Master of Medical Science / Numerous overseas studies have linked both short and long-term exposures to outdoor air pollution to a range of health effects. The differences in air pollution sources, climate and geography in Australia challenged the generalisability of these overseas findings to the Australian setting. In response, the Hunter Illawarra Study of Airways and Air Pollution (HISAAP) was undertaken. The aim of Phase II of HISAAP was to assess the short-term effects of particulates on respiratory health amongst symptomatic children. This thesis presents the results of an analysis of the 345 primary school children eligible for Phase II of the Hunter component of HISAAP. There were multiple daily diary measures on each child, different types of outcomes such as continuous, dichotomous and count variables, as well as several sources of exposure data on pollutants. Because of the complex and hierarchical nature of the data, several methods of analysis could be used. The thesis begins with a description of the sampling methods used in the study. Next comes an overview of the literature on the relationship between air pollution and respiratory health, followed by a review of the methods of analysis appropriate for longitudinal diary studies of this nature. The methods and results are then presented for the analyses of the association between the three main outcomes of interest (evening peak flow, day cough and bronchodilator use) and air quality variables: particulates (PM10 and TSP), sulphur dioxide, pollens and fungi, using three modelling approaches. These are a data reduction method (Aggregate analysis), a subject-specific or mixed-model method (Korn-Whittemore analysis) and a marginal method (Generalised Estimating Equations).
All estimates were adjusted for climate-related covariates and trend. The final chapter discusses the advantages and disadvantages of the various methods of analysis and recommends analytic techniques for further studies. air pollution environmental monitoring PM10 TSP panel study longitudinal study epidemiology asthma respiratory symptoms lung function generalized estimating equations children
45	Development and validation of clinical prediction models to diagnose acute respiratory infections in children and adults from Canadian Hutterite communities. Vuichard Gysin, Danielle January 2016 Acute respiratory infections (ARI) caused by influenza and other respiratory viruses affect millions of people annually. Although ARI are usually self-limiting, a more complicated or severe course may occur in previously healthy people but is more likely in individuals with underlying illnesses. The most common viral agent is rhinovirus, whereas influenza is less frequent but is well known to cause winter epidemics. In primary care, rapid diagnosis of influenza virus infections is essential in order to provide treatment. Clinical presentations vary among the different pathogens but may overlap and may also depend on host factors. Predictive models have been developed for influenza, but study results may be biased because only individuals presenting with fever were included. Most of these models have not been adequately validated and their predictive power, therefore, is likely overestimated. The main objective of this thesis was to compare different mathematical models for the derivation of clinical prediction rules in individuals presenting with symptoms of ARI to better distinguish between influenza, influenza A subtypes and entero-/rhinovirus-related illness in children and adults and to evaluate model performance by using data-splitting for internal validation. Data from a completed prospective cluster-randomized trial for the indirect effect of influenza vaccination in children of Hutterite communities served as the basis of my thesis. There were a total of 3288 first episodes per season of ARI in 2202 individuals and 321 (9.8%) influenza positive events over three influenza seasons (2008-2011). The data set was divided into children under 18 years and adults.
Both data sets were randomly split by subjects into a derivation (2/3 of the dataset) and a validation population (1/3 of the dataset). All predictive models were developed in the derivation sets. Demographic factors and the classical symptoms of ARI were evaluated with logistic regression and Cox proportional hazards models, using forward stepwise selection and robust estimators to account for non-independent data, and by means of recursive partitioning. The beta coefficients of the independent predictors were used to develop different point scores. These scores were then tested in the validation groups and performance between validation and derivation set was compared using receiver operating characteristic (ROC) curves. We determined sensitivities and specificities, positive and negative predictive values, and likelihood ratios at different cut-points which could reflect test and treatment thresholds. Fever, chills, and cough were the most important predictors in children whereas chills and cough but not fever were most predictive of influenza virus infection in adults. Performance of the individual models was moderate with areas under the receiver operating characteristic curves between 0.75 and 0.80 for the main outcome influenza A or B virus infection. There was no statistically significant difference in performance between the derivation and validation sets for the main outcome. The results show that various mathematical models have similar discriminative ability to distinguish influenza from other respiratory viruses. The scores could assist clinicians in their decision-making. However, performance of the models was slightly overestimated due to potential clustering of data and the results would first need to be validated in a different population before application in clinical practice. / Thesis / Master of Science (MSc) / Every year, millions of people are attacked by "the flu" or the common cold.
Certain signs and symptoms apparently are more discriminative between the common cold and the flu. However, the decision between starting a simple symptom-oriented treatment, treating empirically for influenza or ordering a rapid diagnostic test that has only moderate sensitivity and specificity can be challenging. This thesis, therefore, aims to help physicians in their decision-making process by developing simple scores and decision trees for the diagnosis of influenza versus non-influenza respiratory infections. Data from a completed trial for the indirect effect of influenza vaccination in children of Hutterite communities served as the basis of my thesis. There were a total of 3288 first seasonal episodes of ARI in 2202 individuals and 321 (9.8%) influenza positive events over three influenza seasons (2008-2011). The data set was divided into children under 18 years and adults. Both data sets were split into a derivation and a validation set (holdout group). Different mathematical models were applied to the derivation set and demographic factors as well as the classical symptoms of ARI were evaluated. The scores generated from the most important factors that remained in the model were then tested in the validation group and performance between validation and derivation set was compared. Accuracy was determined at different cut-points which could reflect test and treatment thresholds. Fever, chills, and cough were the most important predictors in children whereas chills and cough but not fever were most predictive of influenza virus infection in adults. Performance of the individual models was moderate for the main outcome influenza A or B virus infection. There was no statistically significant difference in performance between the derivation and validation sets for the main outcome. The results show that various mathematical models have similar discriminative ability to distinguish influenza from other respiratory viruses.
The scores could assist clinicians in their decision-making. However, the results would first need to be validated in a different population before application in clinical practice. Prediction models Recursive partitioning Generalized Estimating Equations Cox Proportional Hazard models Acute respiratory infections Score Influenza Decision trees
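The subject-level data split described in entry 45 (roughly 2/3 derivation, 1/3 validation, split by subject so that repeated episodes from one person never land in both sets) might look like the following minimal sketch. The record layout, the field name `subject_id`, the helper `split_by_subject`, and the toy data are all hypothetical and are not taken from the thesis.

```python
import random


def split_by_subject(records, derivation_fraction=2 / 3, seed=0):
    """Split episode records into derivation and validation sets by subject.

    Hypothetical helper: each record is a dict with a 'subject_id' key.
    All episodes of a given subject end up in the same set.
    """
    subjects = sorted({r["subject_id"] for r in records})
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(subjects)
    n_derivation = round(len(subjects) * derivation_fraction)
    derivation_ids = set(subjects[:n_derivation])
    derivation = [r for r in records if r["subject_id"] in derivation_ids]
    validation = [r for r in records if r["subject_id"] not in derivation_ids]
    return derivation, validation


# Toy data (invented): 30 subjects, one first episode in each of three seasons.
records = [
    {"subject_id": s, "season": season, "influenza": (s + season) % 5 == 0}
    for s in range(30)
    for season in (1, 2, 3)
]
derivation, validation = split_by_subject(records)
```

Splitting on subjects rather than on episodes keeps correlated observations from the same person out of the validation set, which is the usual motivation for this design.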
46	Generalized Estimating Equations for Mixed Models Alnaji, Lulah A. 23 July 2018 No description available. Statistics Mathematics Generalized Estimating Equations Mixed Models Working Correlation Matrix Clustered Data Longitudinal Pearson Residual Score Function
47	Analysis of Zero-Heavy Data Using a Mixture Model Approach Wang, Shin Cheng 30 March 1998 The problem of a high proportion of zeroes has long been of interest in data analysis and modeling; however, there are no unique solutions to this problem. The solution to the individual problem really depends on its particular situation and the design of the experiment. For example, different biological, chemical, or physical processes may follow different distributions and behave differently. Different mechanisms may generate the zeroes and require different modeling approaches. So it would be quite impossible and inflexible to come up with a unique or a general solution. In this dissertation, I focus on cases where zeroes are produced by mechanisms that create distinct sub-populations of zeroes. The dissertation is motivated by problems of chronic toxicity testing which has a data set that contains a high proportion of zeroes. The analysis of chronic test data is complicated because there are two different sources of zeroes: mortality and non-reproduction in the data. So researchers have to separate zeroes from mortality and fecundity. The use of a mixture model approach which combines the two mechanisms to model the data here is appropriate because it can incorporate the mortality kind of extra zeroes. A zero-inflated Poisson (ZIP) model is used for modeling the fecundity in the Ceriodaphnia dubia toxicity test. A generalized estimating equation (GEE) based ZIP model is developed to handle longitudinal data with zeroes due to mortality. A joint estimate of inhibition concentration (ICx) is also developed as potency estimation based on the mixture model approach. It is found that the ZIP model would perform better than the regular Poisson model if the mortality is high. This kind of toxicity testing also involves longitudinal data where the same subject is measured for a period of seven days.
The GEE model allows the flexibility to incorporate the extra zeroes and a correlation structure among the repeated measures. The problem of zero-heavy data also exists in environmental studies in which the growth or reproduction rates of multiple species are measured. This gives rise to multivariate data. Since the inter-relationships between different species are embedded in the correlation structure, the study of the information in the correlation of the variables, which is often accessed through principal component analysis, is one of the major interests in multivariate data. In the case where mortality influences the variables of interest, but mortality is not the subject of interest, the mixture approach can be applied to recover the information of the correlation structure. In order to investigate the effect of zeroes on multivariate data, simulation studies on principal component analysis are performed. A method that recovers the information of the correlation structure is also presented. / Ph. D. Principal Component Analysis Longitudinal Data Inhibition Concentration Generalized Estimating Equations Chronic toxicity testing Ceriodaphnia Dubia Zero-inflated Poisson
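As an illustration of the zero-inflated Poisson (ZIP) mixture named in entry 47, here is a minimal sketch of the standard textbook ZIP probability mass function. The parameter names `lam` (Poisson mean) and `pi` (probability of a structural zero, which one might read as the mortality component) are assumptions; the abstract does not give the dissertation's own parameterisation.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson mixture.

    With probability pi the observation is a structural zero;
    otherwise it is drawn from Poisson(lam). Names are illustrative.
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # Zeroes come from both components of the mixture.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def zip_mean(lam, pi):
    """Mean of the ZIP mixture: only the Poisson component contributes."""
    return (1 - pi) * lam


# Example values (invented): 30% structural zeroes, Poisson mean of 2.
p_zero = zip_pmf(0, lam=2.0, pi=0.3)
```

Compared with a plain Poisson with the same `lam`, the mixture puts extra mass at zero, which is why it can fit count data where a separate mechanism generates additional zeroes.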
48	Rezervování škod v rámci panelových dat / Claims reserving within the panel data framework Gerthofer, Michal January 2015 In the presented thesis the issue of dependency between response variables within the subjects in the generalized linear models framework is investigated. Reserving in non-life insurance is a key factor for the financial position of a company. The text introduces the basic actuarial notation, terminology and methods. The main part is focused on the panel data framework, especially Generalized Linear Mixed Models (GLMM) as well as Generalized Estimating Equations (GEE), and their application to claims reserving. The aim of this thesis is to show the advantages, disadvantages and limitations of these approaches and to compare them on representative datasets, which were chosen according to results obtained from an analysis of the whole database. Significant focus is on model selection and the diagnostics used for this purpose. Finally, the obtained results are summarized in tables and figures, and a comparison of the methods is provided.
49	Modelos estatísticos para dados politômicos nominais em estudos longitudinais com uma aplicação à área agronômica / Statistical models for nominal polytomous data in longitudinal studies with an application to agronomy Menarin, Vinicius 14 January 2016 Studies where the response is a categorical variable are quite common in many fields of science. In many situations this response is composed of more than two unordered categories, characterizing a nominal polytomous outcome, and, in general, the aim of the study is to associate the probability of occurrence of each category with the effects of explanatory variables. Furthermore, there are special types of study where many measurements are taken over time for the same sampling unit, called longitudinal studies. Such studies require statistical models that include some kind of structure that supports the dependence that tends to arise among repeated measurements on the same sampling unit. This work focuses on two extensions of the baseline-category logit model usually employed when there is a nominal polytomous response with independent observations. The first one consists of a modification of the well-known generalized estimating equations for longitudinal data, based on local odds ratios, to describe the dependence between the levels of the response over the repeated measurements.
This type of model is also known as a marginal model. The second approach adds random effects to the linear predictor of the baseline-category logit model, which also accounts for dependence between the observations. This characterizes a baseline-category mixed model. There are substantial differences in interpretation between marginal and mixed models, which should be considered when choosing the most appropriate approach for each situation. Both methodologies are applied to the data of an agronomic field experiment conducted under a randomized block design with a factorial arrangement for the treatments. It was carried out over six seasons, characterizing the longitudinal structure, and the response is the type of vegetation observed in the field (tussocks, weeds or regions with bare ground). The results are satisfactory, even if the dependence found in the data is not so strong, and likelihood-ratio and Wald tests point to several differences between treatments. Moreover, due to methodological differences between the two approaches, the marginal model based on generalized estimating equations seems to be more appropriate for these data. generalized estimating equations generalized linear mixed models nominal categorical data repeated measurements over time
50	Improved Methods and Selecting Classification Types for Time-Dependent Covariates in the Marginal Analysis of Longitudinal Data Chen, I-Chen 01 January 2018 Generalized estimating equations (GEE) are widely used for the marginal analysis of longitudinal data. In order to obtain consistent regression parameter estimates, these estimating equations must be unbiased. However, when certain types of time-dependent covariates are present, these equations can be biased unless an independence working correlation structure is employed. Moreover, in this case regression parameter estimation can be very inefficient because not all valid moment conditions are incorporated within the corresponding estimating equations. Therefore, approaches using the generalized method of moments or quadratic inference functions have been proposed for utilizing all valid moment conditions. However, we have found that such methods will not always provide valid inference and can also be improved upon in terms of finite-sample regression parameter estimation. Therefore, we propose a modified GEE approach and a selection method that will both ensure the validity of inference and improve regression parameter estimation. In addition, these modified approaches assume the data analyst knows the type of time-dependent covariate, although this likely is not the case in practice. Whereas hypothesis testing has been used to determine covariate type, we propose a novel strategy to select a working covariate type in order to avoid potentially high type II error rates with these hypothesis testing procedures. Parameter estimates resulting from our proposed method are consistent and have overall improved mean squared error relative to hypothesis testing approaches. Finally, for some real-world examples the use of mean regression models may be sensitive to skewness and outliers in the data.
Therefore, we extend our approaches from their use with marginal quantile regression to modeling the conditional quantiles of the response variable. Existing and proposed methods are compared in simulation studies and application examples. Generalized Estimating Equations Time-Dependent Covariate Empirical Covariance Matrix Working Correlation Structure Mean Squared Error Marginal Quantile Regression Applied Statistics Biostatistics Statistical Models
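For orientation, the generalized estimating equations referred to in entry 50 (and in several entries above) are usually written in the following standard form. The notation is the common textbook one and is an assumption here, since the abstract itself gives no formulas: $Y_i$ is the response vector for subject $i$, $\mu_i(\beta)$ its marginal mean, $A_i$ the diagonal matrix of variance functions, $R_i(\alpha)$ the working correlation matrix, and $\phi$ a dispersion parameter.

```latex
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i(\beta)}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) \, A_i^{1/2}
```

Under an independence working correlation, $R_i(\alpha) = I$, so $V_i$ is diagonal and each residual is paired only with the covariate values from the same time point. This is one way to see why the abstract singles out that structure when time-dependent covariates are involved.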
