41 |
Robust mixture regression modeling with Pearson type VII distribution
Zhang, Jingyi, January 1900
Master of Science / Department of Statistics / Weixing Song / This work proposes a robust estimation procedure for parametric regression models in which the error terms are assumed to follow a Pearson type VII distribution. The procedure is implemented with an EM algorithm, exploiting the fact that a Pearson type VII distribution is a scale mixture of normal distributions with a Gamma mixing distribution. A trimmed version of the proposed procedure, which successfully removes high-leverage points from the data, is also discussed. The finite-sample performance of the proposed algorithm is evaluated through extensive simulation studies and compared with other existing procedures in the literature.
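As a hedged illustration of the scale-mixture idea, the sketch below fits a single linear regression with Student-t errors by EM. The Student-t family is a reparametrization of the Pearson type VII family, so this is only a special case; the degrees of freedom `nu` are held fixed, and the function name `t_regression_em` is hypothetical, not the author's code. Each E-step computes the conditional mean of the latent Gamma precision, and each M-step is a weighted least-squares fit, so observations with large residuals receive small weights.

```python
import numpy as np


def t_regression_em(X, y, nu=4.0, max_iter=500, tol=1e-8):
    """EM for linear regression with Student-t errors and fixed nu (sketch).

    Latent representation: e_i | u_i ~ N(0, sigma2 / u_i),
    u_i ~ Gamma(shape=nu/2, rate=nu/2).
    Returns (beta, sigma2, weights), where weights are E[u_i | y_i]
    from the last E-step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    sigma2 = np.mean((y - X @ beta) ** 2)
    w = np.ones(n)
    for _ in range(max_iter):
        r = y - X @ beta
        # E-step: conditional expectation of the latent precision u_i
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        # M-step: weighted least squares for beta ...
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        # ... and a weighted mean of squared residuals for the scale
        r_new = y - X @ beta_new
        sigma2_new = np.sum(w * r_new ** 2) / n
        done = (np.max(np.abs(beta_new - beta)) < tol
                and abs(sigma2_new - sigma2) < tol)
        beta, sigma2 = beta_new, sigma2_new
        if done:
            break
    return beta, sigma2, w
```

Gross outliers end up with weights far below those of the bulk of the data, which is the source of the robustness; the trimming step described above is not included in this sketch.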
|
42 |
Robust mixture regression model fitting by Laplace distribution
Xing, Yanru, January 1900
Master of Science / Department of Statistics / Weixing Song / A robust estimation procedure for mixture linear regression models is proposed in this report by assuming that the error terms follow a Laplace distribution. The estimation is carried out with an EM algorithm that treats the latent quantities as missing information, exploiting the fact that the Laplace distribution is a scale mixture of a normal distribution and a latent mixing distribution. The finite-sample performance of the proposed algorithm is evaluated through extensive simulation studies and compared with other existing procedures in the literature. A sensitivity study based on a real data example is also conducted to illustrate the application of the proposed method.
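A minimal sketch of an EM-style fit for a K-component mixture of linear regressions with Laplace errors, under stated assumptions. It uses the standard representation of a Laplace variable with scale b as a normal variable whose variance V is exponentially distributed with mean 2b^2, so that E[1/V | r] = 1/(b|r|) and the coefficient update becomes a weighted least-squares step; the scale update uses the closed-form weighted mean absolute residual (a conditional-maximization shortcut). The function names, starting values, and the guard `eps` are illustrative assumptions, not the report's implementation.

```python
import numpy as np


def laplace_pdf(r, b):
    """Laplace density with location 0 and scale b."""
    return np.exp(-np.abs(r) / b) / (2.0 * b)


def laplace_mixreg_em(X, y, beta0, b0, pi0, max_iter=500, tol=1e-10, eps=1e-6):
    """EM-style fit of a mixture of linear regressions with Laplace errors.

    beta0: (K, p) starting coefficients; b0, pi0: (K,) starting scales/weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.array(beta0, dtype=float)
    b = np.array(b0, dtype=float)
    pi = np.array(pi0, dtype=float)
    loglik_old = -np.inf
    for _ in range(max_iter):
        R = y[:, None] - X @ beta.T            # (n, K) component residuals
        dens = pi * laplace_pdf(R, b)          # (n, K) weighted densities
        total = dens.sum(axis=1, keepdims=True)
        loglik = np.log(total).sum()
        # E-step 1: posterior probability that point i belongs to component k
        P = dens / total
        # E-step 2: E[1/V_ik | r_ik] = 1/(b_k |r_ik|), guarded near zero
        inv_v = 1.0 / (b * np.maximum(np.abs(R), eps))
        # M-step: mixing weights, then per-component WLS and scale updates
        pi = P.mean(axis=0)
        for k in range(len(pi)):
            w = P[:, k] * inv_v[:, k]
            Xw = X * w[:, None]
            beta[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            resid = np.abs(y - X @ beta[k])
            b[k] = np.sum(P[:, k] * resid) / np.sum(P[:, k])
        if abs(loglik - loglik_old) < tol * abs(loglik):
            break
        loglik_old = loglik
    return beta, b, pi, loglik
```

Because the weights are proportional to 1/|r|, each coefficient update behaves like one step of iteratively reweighted least absolute deviations, which is why large residuals have bounded influence. For very extreme outliers a log-sum-exp version of the E-step would be numerically safer.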
|
43 |
Minimum Hellinger distance estimation in a semiparametric mixture model
Xiang, Sijia, January 1900
Master of Science / Department of Statistics / Weixin Yao / In this report, we introduce the minimum Hellinger distance (MHD) estimation method and review its history. We examine the use of the Hellinger distance to obtain a new efficient and robust estimator for a class of semiparametric mixture models in which one component has a known distribution while the other component and the mixing proportion are unknown. Such semiparametric mixture models have been used in biology and in sequential clustering algorithms. Our new estimator is based on the MHD, which has been shown to have good efficiency and robustness properties. We use simulation studies to illustrate the finite-sample performance of the proposed estimator and compare it with other existing approaches. Our empirical studies demonstrate that the proposed minimum Hellinger distance estimator (MHDE) works at least as well as some existing estimators for most of the examples considered and outperforms them when the data are contaminated. A real data application is also provided to illustrate the effectiveness of the proposed methodology.
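The squared Hellinger distance between densities f and g is the integral of (sqrt(f) - sqrt(g))^2 (some authors include a factor of 1/2), and an MHD estimator minimizes it between the model density and a kernel density estimate of the data. The sketch below applies this idea to a deliberately simplified, fully parametric version of the model: the known component is taken to be N(0, 1) and the unknown component is assumed to be N(mu, 1), whereas the report leaves the second component unspecified. The grid search, the Gaussian kernel with Silverman's rule-of-thumb bandwidth, and all function names are illustrative assumptions.

```python
import numpy as np


def normal_pdf(x, mu=0.0, sd=1.0):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def kde(grid, data, bw):
    """Gaussian kernel density estimate of `data`, evaluated on `grid`."""
    return normal_pdf(grid[:, None], data[None, :], bw).mean(axis=1)


def mhd_fit(data, pis, mus, grid):
    """Grid-search MHD estimate of (pi, mu) in pi*N(0,1) + (1-pi)*N(mu,1)."""
    data = np.asarray(data, dtype=float)
    dx = grid[1] - grid[0]
    bw = 1.06 * data.std() * len(data) ** (-0.2)  # Silverman's rule of thumb
    root_g = np.sqrt(kde(grid, data, bw))
    known = normal_pdf(grid)                       # known component f0
    best = (np.inf, None, None)
    for pi in pis:
        for mu in mus:
            model = pi * known + (1.0 - pi) * normal_pdf(grid, mu)
            # squared Hellinger distance, approximated by a Riemann sum
            h2 = np.sum((np.sqrt(model) - root_g) ** 2) * dx
            if h2 < best[0]:
                best = (h2, pi, mu)
    return best  # (distance, pi_hat, mu_hat)
```

Because square-root densities are bounded, a small cluster of gross outliers can change the distance only by a bounded amount, which is the intuition behind the robustness of the MHDE under contamination.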
|
44 |
Predicting Hearing Loss Using Auditory Steady-State Responses
Li, Yiwen, 14 January 2009
Auditory steady-state response (ASSR) testing is a promising tool for detecting hearing loss. In this project, we analyzed hearing threshold data obtained from two ASSR methods and from the gold standard, pure tone audiometry (PTA), applied to both normal-hearing and hearing-impaired subjects. We constructed a repeated measures linear model to identify factors that produce significant differences in the mean response. The analysis shows significant differences due to hearing status (normal or impaired) and ASSR method, and a significant interaction between hearing status and test signal frequency. The second task of this project was to predict the PTA threshold (gold standard) from the ASSR-A and ASSR-B thresholds separately at each frequency, in order to measure how accurate the ASSR measurements are and to obtain a "correction function" that corrects the bias in the ASSR measurements. We used two approaches. In the first, we modeled the relation of the PTA responses to the ASSR values for the two hearing status groups as a mixture model and tried two prediction methods; the mixture modeling was successful, but the predictions were disappointing. The second approach, which used logistic regression to predict group membership from the ASSR value and then used those predictions to obtain a predictor of the PTA value, gave successful results.
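The second approach can be sketched as follows, under explicit assumptions: a logistic regression gives the probability p(x) that a subject with ASSR threshold x is hearing-impaired, a separate straight-line fit of PTA on ASSR is made in each hearing-status group, and the PTA prediction is the probability-weighted average of the two group predictions. The abstract does not say exactly how the group-membership predictions were combined, so this weighting rule and all function names are hypothetical.

```python
import numpy as np


def fit_logistic(x, z, iters=30):
    """Newton-Raphson fit of P(z=1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        grad = X.T @ (z - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(hess, grad)
    return coef


def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def predict_pta(x_new, logit_coef, line_normal, line_impaired):
    """Probability-weighted PTA prediction (hypothetical combination rule)."""
    p = 1.0 / (1.0 + np.exp(-(logit_coef[0] + logit_coef[1] * x_new)))
    pred_normal = line_normal[0] + line_normal[1] * x_new
    pred_impaired = line_impaired[0] + line_impaired[1] * x_new
    return p * pred_impaired + (1.0 - p) * pred_normal
```

In practice this would be fitted separately at each test frequency and for each ASSR method, as the abstract describes.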
|
45 |
Modeling nonlinear time series with a tree-structured mixture of Gaussian models
Mendes, Eduardo Fonseca, 20 March 2007
In this work a new mixture-of-distributions model is proposed in which the mixing structure is determined by a smooth transition tree architecture. Models based on mixtures of distributions are useful for approximating unknown conditional distributions of multivariate data. The tree structure yields a model that is simpler, and in some cases more interpretable, than previous proposals in the literature. Based on the Expectation-Maximization (EM) algorithm, a quasi-maximum likelihood estimator is derived, and its asymptotic properties are established under mild regularity conditions. In addition, a specific-to-general model-building strategy is proposed to avoid possible identification problems. Both the estimation procedure and the model-building strategy are evaluated in a Monte Carlo experiment, which gives strong support for the theory developed, even in small samples. The universal approximation capability of the model is also analyzed in a simulation experiment. Finally, two applications with real datasets are considered.
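To make the mixing structure concrete, here is a sketch of the conditional density for the simplest possible tree: a single smooth split on one covariate, with a Gaussian linear model in each leaf. The logistic transition function G(x; gamma, c) = 1 / (1 + exp(-gamma (x - c))) is a common choice in smooth transition models and is an assumption here, as are the leaf parameterization and the function names; deeper trees would multiply such gates along each root-to-leaf path.

```python
import numpy as np


def normal_pdf(y, mu, sd):
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def gate(x, gamma, c):
    """Logistic smooth split: tends to 0 left of c and to 1 right of c."""
    return 1.0 / (1.0 + np.exp(-gamma * (x - c)))


def tree_mixture_density(y, x, gamma, c, leaf_left, leaf_right):
    """f(y | x) for a one-split smooth transition tree of Gaussian leaves.

    Each leaf is a tuple (intercept, slope, sd) of a linear Gaussian model.
    """
    g = gate(x, gamma, c)
    a0, a1, s_left = leaf_left
    b0, b1, s_right = leaf_right
    return ((1.0 - g) * normal_pdf(y, a0 + a1 * x, s_left)
            + g * normal_pdf(y, b0 + b1 * x, s_right))


def tree_mixture_mean(x, gamma, c, leaf_left, leaf_right):
    """E[y | x]: the gate-weighted average of the two leaf means."""
    g = gate(x, gamma, c)
    return ((1.0 - g) * (leaf_left[0] + leaf_left[1] * x)
            + g * (leaf_right[0] + leaf_right[1] * x))
```

The slope parameter gamma controls how abrupt the split is: a large gamma approaches a hard decision-tree split, while a small gamma blends the two leaves over a wide range of x.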
|
46 |
A Sensitivity Analysis of a Nonignorable Nonresponse Model Via EM Algorithm and Bootstrap
Zong, Yujie, 15 April 2011
The Slovenian Public Opinion Survey (SPOS), which was carried out in 1990, was used by the government of Slovenia as a benchmark in preparing for an upcoming plebiscite that asked respondents whether they supported independence from Yugoslavia. The sample size was large; however, it is quite likely that respondents and nonrespondents held divergent viewpoints. We first develop an ignorable nonresponse model, which extends a bivariate binomial model. To accommodate the nonrespondents, we then develop a nonignorable nonresponse model that extends the ignorable model. Our methodology uses an EM algorithm to fit both the ignorable and nonignorable nonresponse models, and estimation is carried out using the bootstrap. We also perform a sensitivity analysis to study different degrees of departure of the nonignorable nonresponse model from the ignorable one, and find that the nonignorable model is mildly sensitive to such departures. Moreover, the conclusion based on our nonignorable model improves on an earlier conclusion drawn from another nonignorable nonresponse model fitted to these data.
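The ignorable (missing-at-random) starting point can be sketched with the classical EM algorithm for a 2x2 table in which some units answer only one of the two questions. Under ignorability, units that answer neither question contribute only a constant factor to the likelihood of the cell probabilities, so they are left out. The counts in the usage test are made up, not the SPOS data; the function names and the multinomial bootstrap over observed response patterns are illustrative assumptions, and the nonignorable extension, which adds parameters linking missingness to the unobserved answers, is not shown.

```python
import numpy as np


def em_2x2(full, row_only, col_only, max_iter=10_000, tol=1e-12):
    """MLE of 2x2 cell probabilities with partially classified counts (MAR).

    full[j, k]: both answers observed; row_only[j]: only the first answer;
    col_only[k]: only the second answer.
    """
    full = np.asarray(full, dtype=float)
    row_only = np.asarray(row_only, dtype=float)
    col_only = np.asarray(col_only, dtype=float)
    n = full.sum() + row_only.sum() + col_only.sum()
    p = np.full((2, 2), 0.25)
    for _ in range(max_iter):
        # E-step: split each partially classified count across the cells
        # in proportion to the current conditional cell probabilities.
        expected = (full
                    + row_only[:, None] * p / p.sum(axis=1, keepdims=True)
                    + col_only[None, :] * p / p.sum(axis=0, keepdims=True))
        # M-step: cell probabilities are the expected cell proportions.
        p_new = expected / n
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p


def bootstrap_se(full, row_only, col_only, n_boot=200, seed=0):
    """Bootstrap standard errors by resampling units across response patterns."""
    rng = np.random.default_rng(seed)
    counts = np.concatenate([np.ravel(full), row_only, col_only]).astype(float)
    total = int(counts.sum())
    draws = []
    for _ in range(n_boot):
        c = rng.multinomial(total, counts / counts.sum())
        draws.append(em_2x2(c[:4].reshape(2, 2), c[4:6], c[6:8]))
    return np.std(np.array(draws), axis=0, ddof=1)
```

Each bootstrap replicate reruns the full EM fit, so the resulting standard errors reflect the extra uncertainty caused by the partially classified units.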
|
47 |
Local influence in geostatistical Student-t models applied to agricultural data
Assumpção, Rosangela Aparecida Botinha, 16 December 2010
The presence of outlying observations makes it inappropriate to assume a Gaussian process. As found in the literature, such a process should be replaced by a model from the class of symmetric distributions, such as the Student-t distribution, which incorporates additional parameters that reduce the influence of outlying points. In this work the process is assumed to follow an n-variate Student-t distribution, whose additional parameter, the degrees of freedom v, is treated as fixed. The EM algorithm and the Newton-Raphson (NR) algorithm were developed to estimate the parameters of the spatial dependence structure and of the linear spatial model. After parameter estimation, two local influence diagnostic techniques are applied to assess the quality of the model fit under the stated assumptions and the robustness of the estimates when the model or the data are perturbed. The first technique, termed "usual" and already used by several authors, evaluates the likelihood displacement based on the log-likelihood function. The second, proposed here, assesses local influence through the Q-displacement of the complete-data likelihood function. Through graphical analysis, these techniques make it possible to examine influence on the likelihood displacement, the covariance matrix, the linear predictor, and the predicted values. Both the usual technique and the proposed one are illustrated through analyses of simulated data and of real data from agricultural experiments.
|
49 |
Model selection criteria in the presence of missing data based on the Kullback-Leibler discrepancy
Sparks, JonDavid, 01 December 2009
An important challenge in statistical modeling is determining an appropriate structural form for a model used to make inferences and predictions. Missing data are common in most research settings and can easily complicate the model selection problem. Many useful procedures have been developed to estimate parameters and standard errors in the presence of missing data; however, few methods exist for determining the structural form of a model when the data are incomplete.
In this dissertation, we propose model selection criteria based on the Kullback-Leibler discrepancy that can be used in the presence of missing data. The criteria are developed by accounting for missing data using principles related to the expectation-maximization (EM) algorithm and bootstrap methods. We formulate the criteria for three specific modeling frameworks: the normal multivariate linear regression model, a generalized linear model, and a normal longitudinal regression model. In each framework, a simulation study is presented to investigate the performance of the criteria relative to their traditional counterparts. We consider a setting where the missingness is confined to the outcome, and also a setting where the missingness may occur in the outcome and/or the covariates. The results from the simulation studies indicate that our criteria provide better protection against underfitting than their traditional analogues.
We outline the implementation of our methodology for a general discrepancy measure. An application is presented in which the proposed criteria are used in a study evaluating the driving performance of individuals with Parkinson's disease under low-contrast (fog) conditions in a driving simulator.
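The bootstrap side of such criteria can be illustrated in the complete-data case. The sketch below computes an EIC-type criterion for a normal linear regression: -2 times the maximized log-likelihood plus twice a bootstrap estimate of the optimism of the log-likelihood, which targets the expected Kullback-Leibler discrepancy. It does not implement the dissertation's missing-data criteria, which also draw on EM principles; the pairs bootstrap, the number of replicates, and the function names are assumptions.

```python
import numpy as np


def normal_loglik(y, X, beta, sigma2):
    """Log-likelihood of a normal linear regression."""
    r = y - X @ beta
    return -0.5 * len(y) * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum(r ** 2) / sigma2


def fit_normal_regression(y, X):
    """Maximum-likelihood estimates (beta, sigma2)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, np.mean((y - X @ beta) ** 2)


def eic(y, X, n_boot=200, seed=0):
    """-2 log L at the MLE plus a bootstrap estimate of twice the optimism."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta, sigma2 = fit_normal_regression(y, X)
    optimism = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample (x_i, y_i) pairs
        b_star, s2_star = fit_normal_regression(y[idx], X[idx])
        # how much better the bootstrap fit scores on its own sample
        # than on the original sample
        optimism += (normal_loglik(y[idx], X[idx], b_star, s2_star)
                     - normal_loglik(y, X, b_star, s2_star))
    return -2.0 * normal_loglik(y, X, beta, sigma2) + 2.0 * optimism / n_boot
```

Smaller values indicate a candidate model closer to the data-generating model in the Kullback-Leibler sense; with missing data, both the log-likelihood and the optimism term would need the adjustments the dissertation develops.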
|
50 |
Mixtures-of-Regressions with Measurement Error
Fang, Xiaoqiong, 01 January 2018
Finite mixture models have been studied for a long time; however, traditional methods assume that the variables are measured without error. Mixtures-of-regressions models with measurement error pose challenges for statisticians, since both the mixture structure and the presence of measurement error can lead to inconsistent estimates of the regression coefficients. To address this inconsistency, we propose a series of methods for estimating the mixture likelihood of the mixtures-of-regressions model when there is measurement error in both the responses and the predictors. Different estimators of the parameters are derived and compared with respect to their relative efficiencies. The simulation results show that the proposed estimation methods work well and improve the estimation process.
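Why measurement error makes naive estimates inconsistent can be seen in the simplest, single-component case. If the predictor X is observed as W = X + U with independent error of variance sigma_u^2, the least-squares slope of Y on W converges to beta * var(X) / (var(X) + sigma_u^2) rather than to beta; when sigma_u^2 is known, a method-of-moments correction divides by var(W) - sigma_u^2 instead. This sketch assumes a known error variance and a single regression line; it is not one of the proposed mixture estimators, and the function names are hypothetical.

```python
import numpy as np


def naive_slope(w, y):
    """Least-squares slope of y on the error-prone predictor w."""
    return np.cov(w, y)[0, 1] / np.var(w, ddof=1)


def corrected_slope(w, y, sigma_u2):
    """Moment-corrected slope, assuming the measurement-error variance is known."""
    return np.cov(w, y)[0, 1] / (np.var(w, ddof=1) - sigma_u2)


rng = np.random.default_rng(5)
n = 200_000
x = rng.normal(0.0, 1.0, n)               # true predictor, var(X) = 1
w = x + rng.normal(0.0, np.sqrt(0.5), n)  # observed with error, sigma_u^2 = 0.5
y = 2.0 * x + rng.normal(0.0, 0.5, n)     # true slope beta = 2
print(naive_slope(w, y))                  # should be near 2 / 1.5, i.e. attenuated
print(corrected_slope(w, y, 0.5))         # should be near 2
```

In a mixture of regressions, this attenuation acts within each component while the component memberships are themselves estimated from the error-prone data, which compounds the inconsistency described above.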
|