  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Empirical Likelihood Confidence Intervals for the Ratio and Difference of Two Hazard Functions

Zhao, Meng 21 July 2008 (has links)
In biomedical research and lifetime data analysis, the comparison of two hazard functions often plays an important role in practice. In this thesis, we consider the standard independent two-sample framework under right censoring. We construct efficient and useful confidence intervals for the ratio and difference of two hazard functions using smoothed empirical likelihood methods. The empirical log-likelihood ratio is derived and shown to be asymptotically chi-squared. Furthermore, the proposed method can be applied to medical diagnosis research. Simulation studies show that the proposed EL confidence intervals perform better than the traditional normal approximation method in terms of coverage accuracy and average length. Finally, our methods are illustrated with real clinical trial data. It is concluded that the empirical likelihood methods provide better inferential outcomes.
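As a rough illustration of the chi-squared calibration idea behind such intervals, the sketch below profiles the classical one-sample empirical likelihood for a mean (Owen, 1988). It is not the smoothed two-sample hazard-ratio method developed in the thesis; the exponential toy data and the 95% level are assumptions made only for the example.

```python
# Minimal sketch: empirical likelihood (EL) confidence interval for a mean,
# calibrated by the chi-squared(1) limit of -2 log EL ratio.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(x, mu):
    """-2 log EL ratio for the mean mu; np.inf if mu lies outside the data hull."""
    x = np.asarray(x, dtype=float)
    d = x - mu
    if d.min() >= 0 or d.max() <= 0:          # mu must lie strictly inside (min x, max x)
        return np.inf
    # Lagrange multiplier lambda solves sum d_i / (1 + lambda * d_i) = 0
    lo = -1.0 / d.max() + 1e-8                # keep all 1 + lambda * d_i > 0
    hi = -1.0 / d.min() - 1e-8
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    lam = brentq(g, lo, hi)
    return 2.0 * np.sum(np.log1p(lam * d))

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=80)       # toy survival-type data (assumption)
cutoff = chi2.ppf(0.95, df=1)                 # chi-squared(1) calibration
grid = np.linspace(x.min() + 0.01, x.max() - 0.01, 400)
inside = [mu for mu in grid if el_log_ratio(x, mu) <= cutoff]
print("95%% EL interval for the mean: (%.3f, %.3f)" % (inside[0], inside[-1]))
```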
3

Empirical likelihood with applications in time series

Li, Yuyi January 2011 (has links)
This thesis investigates the statistical properties of the Kernel Smoothed Empirical Likelihood (KSEL, e.g. Smith, 1997 and 2004) estimator and various associated inference procedures for weakly dependent data. New tests for structural stability are proposed and analysed. Asymptotic analysis and Monte Carlo experiments are used to assess these new tests, theoretically and empirically. Chapter 1 reviews and discusses estimation and inferential properties of Empirical Likelihood (EL, Owen, 1988) for independently and identically distributed data and compares it with Generalised EL (GEL), GMM and other estimators. KSEL is treated extensively, by specialising the kernel-smoothed GEL of the working paper of Smith (2004), some of whose results and proofs are extended and refined in Chapter 2. Asymptotic properties of some tests in Smith (2004) are also analysed under local alternatives. This special treatment of KSEL lays the foundation for the analyses in Chapters 3 and 4, which would not otherwise follow straightforwardly. In Chapters 3 and 4, subsample KSEL estimators are proposed to support the development of KSEL structural stability tests for a given breakpoint and for an unknown breakpoint, respectively, based on relevant work using GMM (e.g. Hall and Sen, 1999; Andrews and Fair, 1988; Andrews and Ploberger, 1994). An original feature of these two chapters is that moment functions may be kernel-smoothed either after or before the sample split, and it is rigorously proved that the two smoothing orders are asymptotically equivalent. The overall null hypothesis of structural stability is decomposed according to the identifying and overidentifying restrictions, as Hall and Sen (1999) advocate for GMM, leading to a more practical and precise structural stability diagnosis procedure. In this framework, the KSEL structural stability tests are also proved, via asymptotic analysis, to be capable of identifying different sources of instability, arising either from a change in parameter values or from a violation of the overidentifying restrictions. The analyses show that these KSEL tests have the same limit distributions as their GMM counterparts. To examine the finite-sample performance of the KSEL structural stability tests in comparison with the GMM tests, Monte Carlo simulations are conducted in Chapter 5 using a simple linear model considered by Hall and Sen (1999). This chapter details some relevant computational algorithms and permits different smoothing orders, kernel types and prewhitening options. In general, the simulation evidence suggests that the newly proposed KSEL tests often perform comparably to the GMM tests; in some cases, however, their sizes can be slightly larger and false null hypotheses are rejected with much higher frequency. These KSEL-based tests are thus valid theoretical and practical alternatives to GMM.
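A minimal sketch of the kernel smoothing of moment conditions that underlies KSEL is shown below: each moment function is replaced by a weighted average of nearby observations so that serial dependence is handled implicitly. The Bartlett-type weights, the bandwidth choice and the toy AR(1) moment function are illustrative assumptions and do not reproduce the exact normalisation of Smith (2004).

```python
# Minimal sketch: kernel-smoothed moment functions for weakly dependent data.
import numpy as np

def smooth_moments(g, S_T):
    """g: (T, m) array of moment functions; returns kernel-smoothed moments."""
    T, m = g.shape
    lags = np.arange(-S_T, S_T + 1)
    w = 1.0 - np.abs(lags) / (S_T + 1.0)       # Bartlett-type weights (assumption)
    g_s = np.zeros_like(g)
    for t in range(T):
        idx = t - lags                          # indices t - s for s in [-S_T, S_T]
        keep = (idx >= 0) & (idx < T)           # truncate at the sample edges
        g_s[t] = w[keep] @ g[idx[keep]] / w[keep].sum()
    return g_s

# toy example: moment function g_t = x_t - mu for an AR(1) series
rng = np.random.default_rng(1)
T = 300
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.6 * x[t - 1] + rng.normal()
g = (x - x.mean()).reshape(-1, 1)
g_smoothed = smooth_moments(g, S_T=int(T ** (1 / 3)))
print(g_smoothed[:5].ravel())
```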
4

Kernel smoothing dos dados de chuva no Nordeste

BARBOSA, Nyedja Fialho Morais 22 March 2013 (has links)
Northeastern Brazil has great climatic diversity and is considered a very complex region, attracting the interest of researchers from around the world. Rainfall over this region is considered seasonal, behaving most intensely over three internal zones of the region in different periods of the year, each lasting about three months, and it is strongly influenced by El Niño, La Niña and other phenomena acting over the tropical Pacific and Atlantic ocean basins. In this work, the mathematical-computational technique of kernel smoothing interpolation was applied to rainfall data over Northeastern Brazil collected between 1904 and 1998 at 2,283 conventional weather stations located in all states of the Northeast. The computations were carried out on the "Cluster Neumann" GPU cluster of the Graduate Program in Biometry and Applied Statistics of the Department of Statistics and Informatics at UFRPE, using the "Kernel" software written in C and CUDA. This tool made it possible to interpolate more than 26 million rainfall measurements over the entire Northeast, to generate maps of rainfall intensity over the whole region, to produce estimates in areas with missing data, and to compute overall and seasonal precipitation statistics for the Northeast. Based on the interpolations, it was possible to identify, within the period studied, the driest and wettest years, the spatial distribution of rainfall in each month, and the characteristics of rainfall during El Niño and La Niña episodes.
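A minimal sketch of the interpolation step described above is given below: a Nadaraya-Watson weighted average of station rainfall evaluated on a regular latitude/longitude grid. The thesis implementation runs in C and CUDA on a GPU cluster; this Python version, the Gaussian kernel, the bandwidth and the randomly placed toy stations are illustrative assumptions only.

```python
# Minimal sketch: kernel-smoothing interpolation of scattered station rainfall onto a grid.
import numpy as np

def kernel_smooth(lon_s, lat_s, rain_s, lon_grid, lat_grid, bandwidth=0.5):
    """Return smoothed rainfall on the grid (degrees used as a crude distance unit)."""
    lon_g, lat_g = np.meshgrid(lon_grid, lat_grid)
    est = np.zeros_like(lon_g)
    for i in range(lon_g.shape[0]):
        for j in range(lon_g.shape[1]):
            d2 = (lon_s - lon_g[i, j]) ** 2 + (lat_s - lat_g[i, j]) ** 2
            w = np.exp(-0.5 * d2 / bandwidth ** 2)     # Gaussian kernel weights
            est[i, j] = np.sum(w * rain_s) / np.sum(w)
    return est

# toy stations roughly inside a Northeast-Brazil bounding box (assumption)
rng = np.random.default_rng(2)
lon_s = rng.uniform(-48.0, -35.0, size=200)
lat_s = rng.uniform(-18.0, -1.0, size=200)
rain_s = rng.gamma(shape=2.0, scale=50.0, size=200)    # monthly totals in mm (assumption)
grid = kernel_smooth(lon_s, lat_s, rain_s,
                     lon_grid=np.linspace(-48, -35, 60),
                     lat_grid=np.linspace(-18, -1, 60))
print(grid.shape, grid.mean())
```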
5

Výnosové křivky / Yield Curves

Korbel, Michal January 2019 (has links)
This master thesis investigates the estimation of the yield curve using two approaches. The first searches for a parametric model able to describe the behavior of the yield curve well and estimates its parameters; the parametric models used in the thesis are derived from the class of models introduced by Nelson and Siegel. The second approach is nonparametric estimation of yield curves using spline smoothing and kernel smoothing. All methods are then compared on real observed data, and their suitability for various tasks and for the concrete available observations is assessed.
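A minimal sketch of fitting the Nelson-Siegel curve y(tau) = b0 + b1*(1 - exp(-tau/lam))/(tau/lam) + b2*((1 - exp(-tau/lam))/(tau/lam) - exp(-tau/lam)) by nonlinear least squares is shown below. The maturities and yields are made-up illustrative numbers, not the observations analysed in the thesis.

```python
# Minimal sketch: Nelson-Siegel yield curve fit by nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def nelson_siegel(tau, b0, b1, b2, lam):
    x = tau / lam
    slope = (1.0 - np.exp(-x)) / x
    return b0 + b1 * slope + b2 * (slope - np.exp(-x))

maturities = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30], dtype=float)  # years
yields = np.array([0.8, 0.9, 1.1, 1.4, 1.6, 2.0, 2.2, 2.4, 2.7, 2.8])       # percent, illustrative
params, _ = curve_fit(nelson_siegel, maturities, yields,
                      p0=[3.0, -2.0, 0.0, 1.5],
                      bounds=([-10, -10, -10, 0.05], [10, 10, 10, 10]))
print(dict(zip(["beta0", "beta1", "beta2", "lambda"], np.round(params, 3))))
```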
6

An Assessment of The Nonparametric Approach for Evaluating The Fit of Item Response Models

Liang, Tie 01 February 2010 (has links)
As item response theory (IRT) has developed and become widely applied, investigating the fit of a parametric model has become an important part of the measurement process when implementing IRT. The usefulness and success of IRT applications rely heavily on the extent to which the model reflects the data, so it is necessary to evaluate model-data fit by gathering sufficient evidence before any model application. There is a lack of promising solutions for the detection of model misfit in IRT. In addition, commonly used fit statistics are not satisfactory in that they often do not possess desirable statistical properties and lack a means of examining the magnitude of misfit (e.g., via graphical inspection). In this dissertation, a newly proposed nonparametric approach, RISE, was thoroughly and comprehensively studied. Specifically, the purposes of this study are to (a) examine the promising fit procedure, RISE, (b) compare the statistical properties of RISE with those of commonly used goodness-of-fit procedures, and (c) investigate how RISE may be used to examine the consequences of model misfit. To reach these goals, both a simulation study and an empirical study were conducted. In the simulation study, four factors that may influence the performance of a fit statistic were varied: ability distribution, sample size, test length and model. The results demonstrated that RISE outperformed G2 and S-X2 in that it controlled Type I error rates and provided adequate power under all conditions. In the empirical study, the three fit statistics were applied to an empirical data set and the misfitting items were flagged. RISE and S-X2 detected reasonable numbers of misfitting items, while G2 flagged almost all items when the sample size was large. To further demonstrate an advantage of RISE, the residual plot for each misfitting item was shown. Compared to G2 and S-X2, RISE gave a much clearer picture of the location and magnitude of misfit for each misfitting item. Beyond statistical properties and graphical displays, the score distribution and test characteristic curve (TCC) were investigated as consequences of model misfit. The results indicated that, for the given data, there was no practical consequence on classification before and after replacement of the misfitting items detected by the three fit statistics.
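A minimal sketch of the idea behind RISE (root integrated squared error) follows: compare a parametric item characteristic curve (here a 2PL) with a kernel-smoothed nonparametric estimate of the same curve and summarize the discrepancy. The 2PL item, the Gaussian kernel regression, the density weighting, the use of true abilities as ability estimates, and the simulated responses are all illustrative assumptions; the dissertation's exact computation of RISE may differ.

```python
# Minimal sketch: nonparametric (kernel-smoothed) vs. parametric ICC and a RISE-type statistic.
import numpy as np

def icc_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def kernel_icc(theta_hat, responses, grid, h=0.3):
    """Nadaraya-Watson regression of 0/1 responses on ability estimates."""
    w = np.exp(-0.5 * ((grid[:, None] - theta_hat[None, :]) / h) ** 2)
    return (w * responses).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(3)
n = 2000
theta = rng.normal(size=n)
# data generated from a 3PL-like item (guessing = 0.2), then checked against a 2PL
p_true = 0.2 + 0.8 * icc_2pl(theta, a=1.2, b=0.0)
y = rng.binomial(1, p_true)

grid = np.linspace(-3, 3, 61)
p_param = icc_2pl(grid, a=1.2, b=0.0)                      # hypothesised parametric ICC
p_nonpar = kernel_icc(theta, y, grid)
weights = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)    # weight by ability density (assumption)
rise = np.sqrt(np.sum(weights * (p_param - p_nonpar) ** 2) / np.sum(weights))
print("RISE-type discrepancy: %.4f" % rise)
```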
7

Visualizing and modeling partial incomplete ranking data

Sun, Mingxuan 23 August 2012 (has links)
Analyzing ranking data is an essential component of a wide range of important applications, including web search and recommendation systems. Rankings are difficult to visualize or model because of the computational difficulties associated with the large number of items. Partial or incomplete rankings introduce further difficulties, since approaches that work well for typical types of rankings do not apply to all types. While the analysis of ranking data has a long history in statistics, the construction of an efficient framework for analyzing incomplete ranking data (with or without ties) is currently an open problem. This thesis addresses the problem of scalability in visualizing and modeling partial incomplete rankings. In particular, we propose a distance measure for top-k rankings with the following three properties: (1) it is a metric, (2) it emphasizes top ranks, and (3) it is computationally efficient. Given the distance measure, the data can be projected into a low-dimensional continuous vector space via multidimensional scaling (MDS) for easy visualization. We further propose a non-parametric model for estimating distributions of partial incomplete rankings. For the non-parametric estimator, we use a triangular kernel that is a direct analogue of the Euclidean triangular kernel. The computational difficulties for large n are simplified using combinatorial properties and generating functions associated with symmetric groups. We show that our estimator is computationally efficient for rankings of arbitrary incompleteness and tie structure. Moreover, we propose an efficient learning algorithm to construct a preference elicitation system from partial incomplete rankings, which can be used to solve the cold-start problem in ranking recommendations. The proposed approaches are examined in experiments with real search engine and movie recommendation data.
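A minimal sketch of the visualization pipeline described above follows: compute pairwise distances between top-k rankings, then embed them in two dimensions with MDS. The footrule-style distance below (unranked items placed at position k+1) is only a stand-in for the top-k metric proposed in the thesis, and the randomly generated rankings are illustrative assumptions.

```python
# Minimal sketch: pairwise top-k distances followed by 2-D MDS embedding.
import numpy as np
from sklearn.manifold import MDS

def topk_distance(r1, r2, k):
    """Footrule-type distance between two top-k lists of item labels."""
    pos1 = {item: i + 1 for i, item in enumerate(r1[:k])}
    pos2 = {item: i + 1 for i, item in enumerate(r2[:k])}
    items = set(pos1) | set(pos2)
    return sum(abs(pos1.get(it, k + 1) - pos2.get(it, k + 1)) for it in items)

rng = np.random.default_rng(4)
k, n_items, n_rankings = 5, 20, 40
rankings = [list(rng.permutation(n_items)[:k]) for _ in range(n_rankings)]

D = np.zeros((n_rankings, n_rankings))
for i in range(n_rankings):
    for j in range(i + 1, n_rankings):
        D[i, j] = D[j, i] = topk_distance(rankings[i], rankings[j], k)

embedding = MDS(n_components=2, dissimilarity="precomputed",
                random_state=0).fit_transform(D)
print(embedding[:3])
```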
8

[en] A SUGGESTION FOR THE STRUCTURE IDENTIFICATION OF LINEAR AND NON LINEAR TIME SERIES BY THE USE OF NON PARAMETRIC REGRESSION / [pt] UMA SUGESTÃO PARA IDENTIFICAÇÃO DA ESTRUTURA DE SÉRIES TEMPORAIS, LINEARES E NÃO LINEARES, UTILIZANDO REGRESSÃO NÃO PARAMÉTRICA

ROSANE MARIA KIRCHNER 10 February 2005 (has links)
[en] This research develops a methodology for identifying the structure of linear and nonlinear time series, based on nonparametric and semi-parametric curve estimation in models of the form Yt = E(Yt|Xt) + e, where Xt = (Yt-1, Yt-2, ..., Yt-d). A traditional parametric linear regression model assumes that the function E(Yt|Xt) is linear, and the estimation process is global: if, for example, a linear function is assumed, the same line is used over the whole domain of the covariate. Such an approach may be inadequate in many cases. The nonparametric approach, in contrast, allows more flexibility in the possible form of the unknown function, which can be estimated by local kernel regression. In this way, only points in the local neighborhood of the point xt, where E(Yt|Xt = xt) is to be estimated, influence the estimate: with kernel estimators, the unknown function is estimated by a local regression in which observations closer to the point of interest receive a larger weight and those farther away a smaller weight. For the estimation of the unknown function, the smoothing parameter h (bandwidth) was chosen automatically from the sample by residual minimization using the cross-validation criterion; in addition, the fixed values 0.1, 0.5, 0.8 and 1 were used deliberately. After estimating the unknown function, the coefficient of determination was computed to assess the dependence on each lag. Under the proposed methodology, the lag dependence function (LDF) and the partial lag dependence function (PLDF) provide, in the linear case, good approximations to the autocorrelation function (ACF) and the partial autocorrelation function (PACF), respectively, which are used in the classical analysis of linear series; their graphical representation is also very similar to that used for the ACF and PACF. The PLDF requires the estimation of multivariate functions; in this case an additive model was used, estimated by the backfitting method (Hastie and Tibshirani, 1990). Confidence intervals were constructed with the bootstrap technique. The study was conducted so as to evaluate and compare the proposed methodology with existing ones. The series used in this analysis were generated from linear and nonlinear models, with 100 or more observations generated for each model. The methodology is also illustrated with the study of the structure of two electricity demand series, one from DEMEI (Departamento Municipal de Energia de Ijuí, Rio Grande do Sul, Brazil) and one from a utility in the Centro-Oeste region, and, as a third example, with an economic series of Petrobras stock prices.
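A minimal sketch of the lag dependence function (LDF) idea described above is shown below: for each lag d, estimate E(Yt | Yt-d) by Nadaraya-Watson kernel regression with a cross-validated bandwidth and report the resulting coefficient of determination. The AR(1) toy series, the Gaussian kernel and the bandwidth grid are illustrative assumptions, not the thesis's exact implementation.

```python
# Minimal sketch: Nadaraya-Watson lag regression with leave-one-out CV bandwidth and an R^2 per lag.
import numpy as np

def nw_fit(x, y, x_eval, h):
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def loo_cv_bandwidth(x, y, h_grid):
    """Pick h by leave-one-out cross-validation."""
    best_h, best_err = None, np.inf
    for h in h_grid:
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(w, 0.0)                       # leave each point out
        pred = (w * y).sum(axis=1) / w.sum(axis=1)
        err = np.mean((y - pred) ** 2)
        if err < best_err:
            best_h, best_err = h, err
    return best_h

rng = np.random.default_rng(5)
n = 400
y_series = np.zeros(n)
for t in range(1, n):
    y_series[t] = 0.7 * y_series[t - 1] + rng.normal()

h_grid = np.linspace(0.1, 1.0, 10)                     # grid includes 0.1, 0.5, 0.8 and 1
for d in range(1, 5):
    x_lag, y_resp = y_series[:-d], y_series[d:]
    h = loo_cv_bandwidth(x_lag, y_resp, h_grid)
    fitted = nw_fit(x_lag, y_resp, x_lag, h)
    r2 = 1.0 - np.sum((y_resp - fitted) ** 2) / np.sum((y_resp - y_resp.mean()) ** 2)
    print(f"lag {d}: h = {h:.2f}, LDF (R^2) = {r2:.3f}")
```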
9

Statistical Predictions Based on Accelerated Degradation Data and Spatial Count Data

Duan, Yuanyuan 04 March 2014 (has links)
This dissertation aims to develop methods for statistical prediction based on various types of data from different areas, focusing on applications in reliability and spatial epidemiology. Chapter 1 gives a general introduction to statistical prediction. Chapters 2 and 3 investigate the photodegradation of an organic coating, which is mainly caused by ultraviolet (UV) radiation but is also affected by environmental factors, including temperature and humidity. In Chapter 2, we identify a physically motivated nonlinear mixed-effects model, including the effects of environmental variables, to describe the degradation path. Unit-to-unit variability is modeled through random effects. The maximum likelihood approach is used to estimate parameters based on accelerated test data from the laboratory. The developed model is then extended to allow for time-varying covariates and is used to predict outdoor degradation, where the explanatory variables are time-varying. Chapter 3 introduces a class of models for analyzing degradation data with dynamic covariate information. We use a general path model with random effects to describe the degradation paths and a vector time series model to describe the covariate process. Shape-restricted splines are used to estimate the effects of dynamic covariates on the degradation process. The unknown parameters of these models are estimated using the maximum likelihood method. Algorithms for computing the estimated lifetime distribution are also described. The proposed methods are applied to predict the photodegradation path of an organic coating in a complicated dynamic environment. Chapter 4 investigates the emergence of Lyme disease in Virginia at the census tract level. Based on areal (census tract level) counts of Lyme disease cases in Virginia from 1998 to 2011, we analyze the spatial patterns of the disease using statistical smoothing techniques. We also use space and space-time scan statistics to reveal the presence of clusters in the spatial and spatio-temporal distribution of Lyme disease. Chapter 5 builds a predictive model for Lyme disease based on historical data and environmental/demographic information for each census tract. We propose a Divide-Recombine method to take advantage of parallel computing. We compare prediction results through simulation studies, which show that our method provides comparable fitting and prediction accuracy while achieving much greater computational efficiency. We also apply the proposed method to analyze the Virginia Lyme disease spatio-temporal data; it makes large-scale spatio-temporal predictions possible. Chapter 6 reviews the contributions of this dissertation and discusses directions for future research. / Ph. D.
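A minimal sketch of fitting a degradation path of the general kind used in Chapters 2 and 3 follows: a nonlinear curve for damage versus cumulative exposure, fitted by least squares. This ignores the random effects and dynamic covariates of the actual model; the functional form D(t) = beta1*(1 - exp(-beta2*t)) and the simulated observations are illustrative assumptions only.

```python
# Minimal sketch: nonlinear least-squares fit of a simplified degradation path.
import numpy as np
from scipy.optimize import curve_fit

def degradation_path(t, beta1, beta2):
    return beta1 * (1.0 - np.exp(-beta2 * t))

rng = np.random.default_rng(6)
t = np.linspace(0, 100, 50)                        # cumulative exposure (arbitrary units)
true = degradation_path(t, beta1=4.0, beta2=0.05)
observed = true + rng.normal(scale=0.15, size=t.size)

params, cov = curve_fit(degradation_path, t, observed, p0=[1.0, 0.01])
print("estimates:", np.round(params, 3))
print("approx. std. errors:", np.round(np.sqrt(np.diag(cov)), 3))
```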
10

EFFICIENT INFERENCE AND DOMINANT-SET BASED CLUSTERING FOR FUNCTIONAL DATA

Xiang Wang (18396603) 03 June 2024 (has links)
<p dir="ltr">This dissertation addresses three progressively fundamental problems for functional data analysis: (1) To do efficient inference for the functional mean model accounting for within-subject correlation, we propose the refined and bias-corrected empirical likelihood method. (2) To identify functional subjects potentially from different populations, we propose the dominant-set based unsupervised clustering method using the similarity matrix. (3) To learn the similarity matrix from various similarity metrics for functional data clustering, we propose the modularity guided and dominant-set based semi-supervised clustering method.</p><p dir="ltr">In the first problem, the empirical likelihood method is utilized to do inference for the mean function of functional data by constructing the refined and bias-corrected estimating equation. The proposed estimating equation not only improves efficiency but also enables practically feasible empirical likelihood inference by properly incorporating within-subject correlation, which has not been achieved by previous studies.</p><p dir="ltr">In the second problem, the dominant-set based unsupervised clustering method is proposed to maximize the within-cluster similarity and applied to functional data with a flexible choice of similarity measures between curves. The proposed unsupervised clustering method is a hierarchical bipartition procedure under the penalized optimization framework with the tuning parameter selected by maximizing the clustering criterion called modularity of the resulting two clusters, which is inspired by the concept of dominant set in graph theory and solved by replicator dynamics in game theory. The advantage offered by this approach is not only robust to imbalanced sizes of groups but also to outliers, which overcomes the limitation of many existing clustering methods.</p><p dir="ltr">In the third problem, the metric-based semi-supervised clustering method is proposed with similarity metric learned by modularity maximization and followed by the above proposed dominant-set based clustering procedure. Under semi-supervised setting where some clustering memberships are known, the goal is to determine the best linear combination of candidate similarity metrics as the final metric to enhance the clustering performance. Besides the global metric-based algorithm, another algorithm is also proposed to learn individual metrics for each cluster, which permits overlapping membership for the clustering. This is innovatively different from many existing methods. This method is superiorly applicable to functional data with various similarity metrics between functional curves, while also exhibiting robustness to imbalanced sizes of groups, which are intrinsic to the dominant-set based clustering approach.</p><p dir="ltr">In all three problems, the advantages of the proposed methods are demonstrated through extensive empirical investigations using simulations as well as real data applications.</p>
