41 |
On statistical analysis of vehicle time-headways using mixed distribution models. Yu, Fu. January 2014
For decades, vehicle time-headway distribution models have been studied by many researchers and traffic engineers. A good time-headway model benefits traffic studies and management in many ways; e.g. with a better understanding of road traffic patterns and road user behaviour, researchers and engineers can give better estimations and predictions under given road traffic conditions and hence make better decisions on traffic management and control. The models also help to implement high-quality microscopic traffic simulation studies, which seek good solutions to traffic problems with minimal interruption of the real traffic environment and at minimum cost. Among previously studied models, the mixed (SPM and GQM) models, especially those using the gamma or lognormal distribution to describe followers' headways, are probably the most widely recognized by researchers in statistical studies of headway data. These mixed models are reported to fit well, as indicated by goodness-of-fit (GoF) tests, and some of them have lower computational costs than others. The gamma-SPM and gamma-GQM models are often reported to have similar fitting quality, and they often outperform the lognormal-GQM model in terms of computational cost. A lognormal-SPM model cannot be formed analytically, as no explicit Laplace transform is available for the lognormal distribution. The major downside of mixed models is that fitting is more difficult and more flexible, since they have more parameters than single models; this sometimes leads to unsuccessful fitting or unreasonable fitted parameters despite passing GoF tests. Furthermore, it is difficult to relate model parameters to realistic traffic situations or environments, and these parameters have to be estimated from headway samples. Hence, it is almost impossible to explain any traffic phenomenon with the parameters of a model.
Moreover, with the gamma distribution as the only common well-known followers' headway model, it is hard to judge whether it describes the headway process appropriately. This creates a barrier to better understanding how drivers follow their preceding vehicles. This study first proposes a framework developed in MATLAB, which helps researchers quickly implement any headway distribution of interest. The framework uses common methods to manage and prepare headway samples to meet the requirements of data analysis. It also provides common structures and methods for implementing existing or new models, fitting models, testing their performance and reporting results. This simplifies the development work involved in headway analysis, avoids unnecessary repetition of work done by others and provides results in formats that are more comparable with those reported by others. Secondly, this study implements the existing mixed models, i.e. the gamma-SPM, gamma-GQM and lognormal-GQM, using the proposed framework. The lognormal-SPM is also tested for the first time, using a recently developed method for approximating the Laplace transform of lognormal distributions. The parameters of these mixed models are discussed in particular, as a means of imposing restrictions that simplify the fitting process. Three ways of pre-determining parameters are attempted for the gamma-SPM and gamma-GQM models. The later part of this study focuses on two response-time (RT) distributions. Two RT models, i.e. the Ex-Gaussian (EMG) and inverse Gaussian (IVG), are used for the first time as single models to describe headway data. Their fitting performance is comparable to that of the best known single model, the lognormal. Extending this work further, these two models are tested as followers' headway distributions in both SPM and GQM mixed models. The test results show excellent fitting performance.
These results give researchers more alternatives when using mixed models in headway analysis, and will help to compare the behaviours of different models when they are used to describe followers' headway data. Again, similar parameter restrictions are attempted for these new mixed models; the results show acceptable performance and also correct some unreasonable fittings caused by the excessive flexibility of 4- or 5-parameter models.
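To make the idea of a mixed headway model concrete, the sketch below evaluates a gamma-GQM density under one common formulation, which is assumed here and not taken from the thesis: with probability phi a vehicle is a follower whose headway has a gamma density g, and otherwise its headway is a follower headway plus an independent exponential gap with rate lam. The function names, parameter names and numerical values are hypothetical, and the convolution is approximated with a midpoint rule.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density with the given shape and rate (hypothetical follower-headway model)."""
    if x <= 0:
        return 0.0
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def gqm_pdf(t, phi, shape, rate, lam, steps=400):
    """Assumed GQM form: phi * g(t) + (1 - phi) * (g convolved with Exp(lam))(t)."""
    if t <= 0:
        return 0.0
    h = t / steps
    conv = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoint of the i-th sub-interval of [0, t]
        conv += gamma_pdf(u, shape, rate) * lam * math.exp(-lam * (t - u))
    conv *= h
    return phi * gamma_pdf(t, shape, rate) + (1.0 - phi) * conv


if __name__ == "__main__":
    # Hypothetical parameters: 40% followers, mean follower headway 4 / 2.5 = 1.6 s.
    for t in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(t, gqm_pdf(t, phi=0.4, shape=4.0, rate=2.5, lam=0.2))
```

Under this assumed form the model has four parameters (phi, shape, rate, lam), which illustrates why fitting can be more flexible and less stable than for a single-distribution model.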
|
42 |
New Statistical Transfer Learning Models for Health Care Applications. January 2018
abstract: Transfer learning is a sub-field of statistical modeling and machine learning. It refers to methods that integrate the knowledge of other domains (called source domains) and the data of the target domain in a mathematically rigorous and intelligent way, to develop a better model for the target domain than a model using the data of the target domain alone. While transfer learning is a promising approach in various application domains, my dissertation research focuses on the particular application in health care, including telemonitoring of Parkinson’s Disease (PD) and radiomics for glioblastoma.
The first topic is a Mixed Effects Transfer Learning (METL) model that can flexibly incorporate mixed effects and a general-form covariance matrix to better account for similarity and heterogeneity across subjects. I further develop computationally efficient procedures to handle unknown parameters and large covariance structures. Domain relations, such as domain similarity and domain covariance structure, are automatically quantified in the estimation steps. I demonstrate METL in an application of smartphone-based telemonitoring of PD.
The second topic focuses on an MRI-based transfer learning algorithm for non-invasive surgical guidance of glioblastoma patients. Limited biopsy samples per patient make it challenging to build a patient-specific model for glioblastoma. A transfer learning framework helps to leverage other patients' knowledge to build a better predictive model. When modeling a target patient, not every patient's information is helpful. Deciding the subset of other patients from which to transfer information to the modeling of the target patient is an important task in building an accurate predictive model. I define the subset of “transferrable” patients as those who have a positive rCBV-cell density correlation, because a positive correlation is confirmed by imaging theory and the respective literature.
The last topic is a Privacy-Preserving Positive Transfer Learning (P3TL) model. Although negative transfer has been recognized as an important issue by the transfer learning research community, there is a lack of theoretical studies in evaluating the risk of negative transfer for a transfer learning method and identifying what causes the negative transfer. My work addresses this issue. Driven by the theoretical insights, I extend Bayesian Parameter Transfer (BPT) to a new method, i.e., P3TL. The unique features of P3TL include intelligent selection of patients to transfer in order to avoid negative transfer and maintain patient privacy. These features make P3TL an excellent model for telemonitoring of PD using an At-Home Testing Device. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2018
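The selection rule stated in the abstract (keep only patients whose rCBV and cell density are positively correlated) can be sketched as follows. This is a minimal illustration and not the dissertation's algorithm: the data layout, the function names and the use of the Pearson correlation are assumptions made here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def transferable_patients(samples):
    """samples maps a patient id to a list of (rcbv, cell_density) biopsy pairs (hypothetical layout)."""
    keep = []
    for patient_id, pairs in samples.items():
        rcbv = [p[0] for p in pairs]
        density = [p[1] for p in pairs]
        if pearson(rcbv, density) > 0:
            keep.append(patient_id)
    return keep


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    data = {
        "A": [(1.0, 10.0), (2.0, 20.0), (3.0, 35.0)],
        "B": [(1.0, 30.0), (2.0, 20.0), (3.0, 5.0)],
    }
    print(transferable_patients(data))
```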
|
43 |
Métodos de diagnóstico para modelos lineares mistos / Diagnostic methods for linear mixed models. Nobre, Juvencio Santos. 04 March 2004
Many phenomena can be represented satisfactorily by statistical models. To validate such models it is necessary to verify whether the underlying assumptions are satisfied and whether the model is sensitive to small perturbations; this is the objective of diagnostic analysis. In this work we present, discuss and propose diagnostic techniques for linear mixed models and illustrate them with a practical example.
|
44 |
Site- and Location-Adjusted Approaches to Adaptive Allocation Clinical Trial Designs. Di Pace, Brian S. 01 January 2019
Response-Adaptive (RA) designs are used to adaptively allocate patients in clinical trials. These methods have been generalized to include Covariate-Adjusted Response-Adaptive (CARA) designs, which adjust treatment assignments for a set of covariates while maintaining features of the RA designs. Challenges may arise in multi-center trials if differential treatment responses and/or effects among sites exist. We propose Site-Adjusted Response-Adaptive (SARA) approaches to account for inter-center variability in treatment response and/or effectiveness, including either a fixed site effect or both random site and treatment-by-site interaction effects to calculate conditional probabilities. These success probabilities are used to update assignment probabilities for allocating patients between treatment groups as subjects accrue. Both frequentist and Bayesian models are considered. Treatment differences could also be attributed to differences in social determinants of health (SDH) that often manifest, especially if unmeasured, as spatial heterogeneity amongst the patient population. In these cases, patient residential location can be used as a proxy for these difficult to measure SDH. We propose the Location-Adjusted Response-Adaptive (LARA) approach to account for location-based variability in both treatment response and/or effectiveness. A Bayesian low-rank kriging model will interpolate spatially-varying joint treatment random effects to calculate the conditional probabilities of success, utilizing patient outcomes, treatment assignments and residential information. We compare the proposed methods with several existing allocation strategies that ignore site for a variety of scenarios where treatment success probabilities vary.
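The step "success probabilities are used to update assignment probabilities" can be illustrated with a deliberately simple sketch. It is not the SARA or LARA procedure: it assumes binary outcomes, independent Beta(1, 1) priors per treatment within a single site, and a square-root target allocation of the kind used in the response-adaptive randomization literature. All names and counts are hypothetical.

```python
import math


def posterior_mean(successes, failures, a=1.0, b=1.0):
    """Posterior mean success probability under a Beta(a, b) prior."""
    return (successes + a) / (successes + failures + a + b)


def allocation_prob_a(site_counts):
    """Probability of assigning the next patient at this site to treatment A.

    site_counts: {"A": (successes, failures), "B": (successes, failures)}
    Uses sqrt(pA) / (sqrt(pA) + sqrt(pB)) with posterior means plugged in.
    """
    p_a = posterior_mean(*site_counts["A"])
    p_b = posterior_mean(*site_counts["B"])
    return math.sqrt(p_a) / (math.sqrt(p_a) + math.sqrt(p_b))


if __name__ == "__main__":
    # Hypothetical accrual at one site: A has 8/10 successes, B has 2/10.
    print(allocation_prob_a({"A": (8, 2), "B": (2, 8)}))
```

Recomputing this probability separately for each site as outcomes accrue is the simplest way to let allocation differ across centres; the approaches in the abstract instead model site effects jointly.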
|
45 |
Analysis of Correlated Data with Measurement Error in Responses or Covariates. Chen, Zhijian. January 2010
Correlated data frequently arise from epidemiological studies, especially familial and longitudinal studies. Longitudinal design has been used by researchers to investigate the changes of certain characteristics over time at the individual level as well as how potential factors influence the changes. Familial studies are often designed to investigate the dependence of health conditions among family members. Various models have been developed for this type of multivariate data, and a wide variety of estimation techniques have been proposed. However, data collected from observational studies are often far from perfect, as measurement error may arise from different sources such as defective measuring systems, diagnostic tests without gold references, and self-reports. Under such scenarios only rough surrogate variables are measured. Measurement error in covariates in various regression models has been discussed extensively in the literature. It is well known that naive approaches ignoring covariate error often lead to inconsistent estimators for model parameters.
In this thesis, we develop inferential procedures for analyzing correlated data with response measurement error. We consider three scenarios: (i) likelihood-based inferences for generalized linear mixed models when the continuous response is subject to nonlinear measurement errors; (ii) estimating equations methods for binary responses with misclassifications; and (iii) estimating equations methods for ordinal responses when the response variable and categorical/ordinal covariates are subject to misclassifications.
The first problem arises when the continuous response variable is difficult to measure. When the true response is defined as the long-term average of measurements, a single measurement is considered as an error-contaminated surrogate. We focus on generalized linear mixed models with nonlinear response error and study the induced bias in naive estimates. We propose likelihood-based methods that can yield consistent and efficient estimators for both fixed-effects and variance parameters. Results of simulation studies and analysis of a data set from the Framingham Heart Study are presented.
Marginal models have been widely used for correlated binary, categorical, and ordinal data. The regression parameters characterize the marginal mean of a single outcome, without conditioning on other outcomes or unobserved random effects. The generalized estimating equations (GEE) approach, introduced by Liang and Zeger (1986), only models the first two moments of the responses, with associations being treated as nuisance characteristics. For some clustered studies, especially familial studies, however, the association structure may be of scientific interest. For binary data, Prentice (1988) proposed additional estimating equations that allow one to model pairwise correlations. We consider marginal models for correlated binary data with misclassified responses. We develop “corrected” estimating equations approaches that can yield consistent estimators for both mean and association parameters. The idea is related to that of Nakamura (1990), which was originally developed for correcting bias induced by additive covariate measurement error under generalized linear models. Our approaches can also handle correlated misclassifications rather than a simple misclassification process as considered by Neuhaus (2002) for clustered binary data under generalized linear mixed models. We extend our methods and further develop marginal approaches for analysis of longitudinal ordinal data with misclassification in both responses and categorical covariates. Simulation studies show that our proposed methods perform very well under a variety of scenarios. Results from application of the proposed methods to real data are presented.
Measurement error can be coupled with many other features in the data, e.g., complex survey designs, that can complicate inferential procedures. We explore combining survey weights and misclassification in ordinal covariates in logistic regression analyses. We propose an approach that incorporates survey weights into estimating equations to yield design-based unbiased estimators.
In the final part of the thesis we outline some directions for future work, such as transition models and semiparametric models for longitudinal data with both incomplete observations and measurement error. Missing data is another common feature in applications. Developing novel statistical techniques for dealing with both missing data and measurement error can be beneficial.
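Why a naive analysis of misclassified binary responses is biased, and how a correction can remove the bias, can be seen in the simplest possible case. The sketch below is not the thesis's corrected estimating equations: it assumes a single proportion, independent observations, and known sensitivity and specificity, and the function names and numbers are hypothetical.

```python
def observed_rate(true_rate, sensitivity, specificity):
    """Expected proportion of recorded positives when responses are misclassified."""
    return sensitivity * true_rate + (1.0 - specificity) * (1.0 - true_rate)


def corrected_rate(observed, sensitivity, specificity):
    """Invert the relation above; requires sensitivity + specificity > 1."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed + specificity - 1.0) / denom


if __name__ == "__main__":
    naive = observed_rate(0.30, sensitivity=0.90, specificity=0.80)
    print("naive estimate:", naive)
    print("corrected estimate:", corrected_rate(naive, 0.90, 0.80))
```

With these made-up values the naive proportion overstates the true rate, and the inversion recovers it; the corrected estimating equations in the abstract extend this kind of adjustment to regression and association parameters.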
|
46 |
An Additive Bivariate Hierarchical Model for Functional Data and Related Computations. Redd, Andrew Middleton. August 2010
The work presented in this dissertation centers on the theme of regression and computation methodology. Functional data is an important class of longitudinal data, and principal component analysis is an important approach to regression with this type of data. Here we present an additive hierarchical bivariate functional data model employing principal components to identify random effects. This additive model extends the univariate functional principal component model. These models are implemented in the pfda package for R. To fit the curves from this class of models, orthogonalized spline bases are used to reduce the dimensionality of the fit but retain flexibility. Methods for handling spline basis functions in a purely analytical manner, including the orthogonalizing process and the computation of penalty matrices used to fit the principal component models, are presented. The methods are implemented in the R package orthogonalsplinebasis.
The projects discussed involve complicated coding for the implementations in R. To facilitate this I created the NppToR utility to add R functionality to the popular Windows code editor Notepad++. A brief overview of the use of the utility is also included.
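The idea of orthogonalizing basis functions analytically, mentioned in the abstract, can be sketched on a much simpler basis than splines. The example below is an illustration only and does not reproduce the orthogonalsplinebasis package: it assumes a monomial basis on the single interval [0, 1], represents each function by its coefficient list, computes inner products exactly from the coefficients, and applies Gram-Schmidt. All function names are hypothetical.

```python
import math


def inner(p, q):
    """Exact integral over [0, 1] of p(x) * q(x) for coefficient lists p and q.

    Uses the identity: integral of x**(i + j) over [0, 1] equals 1 / (i + j + 1).
    """
    return sum(a * b / (i + j + 1) for i, a in enumerate(p) for j, b in enumerate(q))


def orthonormalize(basis):
    """Modified Gram-Schmidt on equal-length coefficient lists."""
    ortho = []
    for p in basis:
        r = list(p)
        for q in ortho:
            c = inner(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(inner(r, r))
        ortho.append([ri / norm for ri in r])
    return ortho


if __name__ == "__main__":
    monomials = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # 1, x, x**2
    for coeffs in orthonormalize(monomials):
        print(coeffs)
```

For a piecewise polynomial (spline) basis the same principle applies interval by interval, since products of polynomial pieces can still be integrated in closed form.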
|
47 |
Bayesian Methods in Nutrition Epidemiology and Regression-based Predictive Models in Healthcare. Zhang, Saijuan. December 2010
This dissertation has two main parts. In the first part, we propose a bivariate nonlinear measurement error model to understand the distribution of dietary intake and extend it to a multivariate model to capture dietary patterns in nutrition epidemiology. In the second part, we propose regression-based predictive models to accurately predict surgery duration in healthcare.
Understanding the distribution of episodically consumed dietary components is an important problem in public health. Short-term measurements of episodically consumed dietary components have zero-inflated, skewed distributions. So-called two-part models have been developed for such data. However, there is much greater public health interest in the usual intake adjusted for caloric intake. Recently a nonlinear mixed effects model has been developed and fit by maximum likelihood using nonlinear mixed effects programs. However, the fitting is slow and unstable. We develop a Monte-Carlo-based fitting method in Chapter II. We demonstrate numerically that our methods lead to increased speed of computation, converge to reasonable solutions, and have the flexibility to be used in either a frequentist or a Bayesian manner. Diet consists of numerous foods, nutrients and other components, each of which has distinctive attributes. Increasingly, nutritionists are interested in exploring them collectively to capture overall dietary patterns. We thus extend the bivariate model described in Chapter III to the multivariate level. We use survey-weighted MCMC computations to fit the model, with uncertainty estimation coming from balanced repeated replication. The methodology is illustrated through an application of estimating the population distribution of the Healthy Eating Index-2005 (HEI-2005), a multi-component dietary quality index, among children aged 2-8 in the United States.
The second part of this dissertation is to accurately predict surgery duration. Prior research has identified the current procedural terminology (CPT) codes as the most important factor when predicting surgical case durations, but there has been little reporting of a general predictive methodology that uses them effectively. In Chapter IV, we propose two regression-based predictive models. However, the naively constructed design matrix is singular. We thus devise a systematic procedure to construct a full-rank design matrix. Using surgical data from a central Texas hospital, we compare the proposed models with a few benchmark methods and demonstrate that our models lead to a remarkable reduction in prediction errors.
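The singularity problem mentioned above arises whenever indicator columns are linearly dependent, for example an intercept together with a full set of dummy variables. The abstract does not describe the author's systematic procedure, so the sketch below shows only a generic alternative, stated as an assumption: scan the columns left to right and keep a column only if it is not a linear combination of those already kept. The function name and the toy matrix are hypothetical.

```python
def independent_columns(rows, tol=1e-10):
    """Return indices of a maximal set of linearly independent columns.

    rows: design matrix as a list of equal-length rows.
    Each column is orthogonalized against the columns kept so far
    (Gram-Schmidt); a column with a near-zero residual is dropped.
    """
    ncols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(ncols)]
    kept, basis = [], []
    for j, col in enumerate(columns):
        resid = [float(x) for x in col]
        for b in basis:
            c = sum(x * y for x, y in zip(resid, b))
            resid = [x - c * y for x, y in zip(resid, b)]
        norm = sum(x * x for x in resid) ** 0.5
        if norm > tol:
            kept.append(j)
            basis.append([x / norm for x in resid])
    return kept


if __name__ == "__main__":
    # Intercept plus two dummy columns that sum to the intercept: rank 2, not 3.
    design = [
        [1, 1, 0],
        [1, 1, 0],
        [1, 0, 1],
        [1, 0, 1],
    ]
    print(independent_columns(design))
```

Which columns are dropped depends on the column order, so in practice the order would be chosen so that the retained columns remain interpretable.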
|
49 |
Ganho genético para produtividade de grãos de milho na região sul do Brasil / Genetic gain for corn grain yield in southern Brazil. Silva, Éder David Borges da. 11 February 2015
The objectives of this study were to calculate the genetic gain for corn grain yield in southern Brazil and to calculate the genetic gain in two altitude classes. The methods used were that of Vencovsky et al. (1988), based on linear models with least-squares inference (ML/MQ), and that of Borges et al. (2009), based on mixed models with inference by residual likelihood (MM/REML). We used a database of 30,292 corn grain yield observations from 135 genotypes evaluated in 2,826 trials conducted over 13 years, from 2001 to 2013, where the year refers to the planting date of the trial. To stratify the trials into altitude classes, we used altitude below and above 700 m as the criterion. With the ML/MQ methodology the mean annual genetic gain was 121 kg ha⁻¹ year⁻¹, with a 95% confidence interval of [11; 232] kg ha⁻¹ year⁻¹, and with the MM/REML methodology it was 79 kg ha⁻¹ year⁻¹, with a 95% confidence interval of [70; 98] kg ha⁻¹ year⁻¹, for the entire southern region of Brazil. At altitudes above 700 m the genetic gain was 94 kg ha⁻¹ year⁻¹, and at altitudes below 700 m it was 74 kg ha⁻¹ year⁻¹. The MM/REML methodology provided more precise confidence intervals and lower genetic gain values than the ML/MQ methodology. In all cases analyzed, the genetic gain was positive and significant between 2001 and 2013.
|
50 |
Aspectos genéticos da produção de leite e de seus constituintes em búfalas mestiças. Ramírez Díaz, Johanna [UNESP]. 18 January 2010
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / We analyzed 1,842 records of milk production and milk constituents (fat, protein and total solids) from dairy buffaloes of different genetic groups (GG) in order to evaluate the effect of genetic group on total milk production (Pltotal) and on the total production of fat, protein and total solids, in kilograms. Using mixed-model methodology with repeated measures over time, three models were analyzed. Breed composition (CR) was defined by the percentage of Murrah in each buffalo, expressed as a deviation from that breed. Model one (M1) included CR as a fixed effect; model two (M2) included, in addition to CR, heterozygosity as a covariate; and model three (M3) omitted CR and used only heterozygosity. The effect of contemporary group (GC), defined by herd-year and calving season, was treated as fixed, and lactation length (DL; linear effect) and age of the cow at calving (IVP; linear and quadratic effects) were included as covariates in all three models. Significant effects (P<0.05) of GC, IVP and DL were observed for all traits analyzed, whereas the effect of CR was not significant for any trait, regardless of the model used. Heritability estimates (h2) were also calculated for Pltotal (kg) and for the production of protein (Prot), fat (Gord) and total solids (ST) in the first lactations, in single-trait analyses using restricted maximum likelihood in the MTDFREML program (Boldman et al., 1995), under M1 with the random effects of animal, permanent environment and residual. The estimated heritabilities for Pltotal, Gord/kg, Prot/kg and ST/kg were 0.14 ± 0.05, 0.37 ± 0.07, 0.5 ± 0.14 and 0.46 ± 0.06, respectively.
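For readers unfamiliar with how a heritability coefficient follows from the random effects named in the abstract (animal, permanent environment and residual), the sketch below shows the standard ratio of variance components for a repeatability animal model. The variance values are made up for illustration and are not estimates from the thesis; the function names are hypothetical.

```python
def heritability(var_animal, var_perm_env, var_residual):
    """h2: share of phenotypic variance due to additive genetic (animal) effects."""
    return var_animal / (var_animal + var_perm_env + var_residual)


def repeatability(var_animal, var_perm_env, var_residual):
    """Share of phenotypic variance due to all permanent effects of the animal."""
    total = var_animal + var_perm_env + var_residual
    return (var_animal + var_perm_env) / total


if __name__ == "__main__":
    # Hypothetical variance components chosen only to give round ratios.
    print(heritability(1.4, 2.0, 6.6))
    print(repeatability(1.4, 2.0, 6.6))
```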
|