1 |
Methods of inference for nonparametric curves and surfaces. Bock, Mitchum T. January 1999.
No description available.
|
2 |
Bayesian inference for functions. Upsdell, M. P. January 1985.
No description available.
|
3 |
Optimal design in regression and spline smoothing. Cho, Jaerin. 19 July 2007.
This thesis attempts to generalize the classical theory of optimal design to popular regression models based on rational and spline approximations. Finding optimal designs for such models reduces to solving certain minimax problems, and explicit solutions to these problems exist only in a few selected models, such as polynomial regression. Even when an optimal design can be found, it has certain drawbacks from the point of view of modern nonparametric regression. In the polynomial regression case, for example, the optimal design depends crucially on the degree m of the approximating polynomial, so it can be used only when that degree is known in advance. We present a partial but practical solution to this problem: the so-called Super Chebyshev Design, which does not depend on the degree m of the underlying polynomial regression over a large range of m, and which is at the same time asymptotically more than 90% efficient. Similar results are obtained for rational regression, even though the exact form of the optimal design in that case remains unknown. Optimal designs for spline interpolation are also currently unknown, but the problem has a simple solution in the case of cardinal spline interpolation, a model that until recently was practically unknown in modern nonparametric regression. We demonstrate the usefulness of cardinal kernel spline estimates in nonparametric regression by proving their asymptotic optimality over certain classes of smooth functions. In this way we give, for the first time, a theoretical justification of the well-known empirical observation that cubic splines suffice in most practical applications. / Thesis (Ph.D., Mathematics & Statistics), Queen's University, 2007.
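To make the degree dependence concrete, here is a minimal Python sketch. It is an illustration only, not the thesis's Super Chebyshev Design (whose construction the abstract does not reproduce): it builds the classical D-optimal design for degree-m polynomial regression on [-1, 1], whose support is the two endpoints plus the roots of the derivative of the degree-m Legendre polynomial, and compares its log-determinant criterion with an equispaced design.

import numpy as np

def log_det_information(points, m):
    # log-determinant of the information matrix M = V'V / n for
    # degree-m polynomial regression, equal weight at each design point
    V = np.vander(points, m + 1, increasing=True)
    return np.linalg.slogdet(V.T @ V / len(points))[1]

m = 3
# classical D-optimal support on [-1, 1]: the endpoints plus the roots
# of the derivative of the degree-m Legendre polynomial
interior = np.polynomial.legendre.Legendre.basis(m).deriv().roots()
d_optimal = np.concatenate(([-1.0], interior, [1.0]))
equispaced = np.linspace(-1.0, 1.0, m + 1)
print("D-optimal :", log_det_information(d_optimal, m))
print("equispaced:", log_det_information(equispaced, m))

Changing m changes the optimal support points, which is exactly the drawback the Super Chebyshev Design is meant to avoid.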
|
4 |
Asymptotic properties of Non-parametric Regression with Beta Kernels. Natarajan, Balasubramaniam. January 1900.
Doctor of Philosophy / Department of Statistics / Weixing Song / Kernel-based nonparametric regression is a popular statistical tool for identifying the relationship between response and predictor variables when standard parametric regression models are not appropriate. The efficacy of kernel-based methods depends on both the kernel choice and the smoothing parameter. With insufficient smoothing, the resulting regression estimate is too rough; with excessive smoothing, important features of the underlying relationship are lost. While the choice of kernel has been shown to have less effect on the quality of the regression estimate, it is important to choose kernels that best match the support set of the underlying predictor variables. In the past few decades, there have been multiple efforts to quantify the properties of asymmetric kernel density and regression estimators. Unlike classical symmetric kernel estimators, asymmetric kernels do not suffer from boundary problems; Beta kernel estimates, for example, are especially suitable for investigating the distribution structure of predictor variables with compact support. In this dissertation, two types of Beta kernel based nonparametric regression estimators are proposed and analyzed: first, a Nadaraya-Watson type Beta kernel estimator is introduced within the regression setup, followed by a local linear regression estimator based on Beta kernels. For both regression estimators, a comprehensive analysis of their large-sample properties is presented; in particular, the asymptotic normality and uniform almost sure convergence of the new estimators are established for the first time. General guidelines for bandwidth selection are also provided. The finite-sample performance of the proposed estimators is evaluated via both a simulation study and a real data application. The results presented and validated in this dissertation help advance the understanding and use of Beta kernel based methods in other nonparametric regression applications.
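As an illustrative sketch of the first estimator discussed above, the following Python code implements a Nadaraya-Watson regression estimate with the Beta kernels of Chen (1999) for predictors supported on [0, 1]. The bandwidth b, the simulated data, and the kernel parametrization are illustrative assumptions, not the dissertation's precise construction.

import numpy as np
from scipy.stats import beta

def beta_kernel_nw(x, X, Y, b):
    # Nadaraya-Watson estimate at x for predictors on [0, 1], using the
    # Beta(x/b + 1, (1 - x)/b + 1) kernel (Chen, 1999); the kernel adapts
    # its shape near 0 and 1, which is why there is no boundary problem
    w = beta.pdf(X, x / b + 1.0, (1.0 - x) / b + 1.0)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 300)
Y = np.sin(2.0 * np.pi * X) + 0.2 * rng.standard_normal(300)
grid = np.linspace(0.0, 1.0, 51)
fit = np.array([beta_kernel_nw(x, X, Y, b=0.05) for x in grid])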
|
5 |
Μέθοδοι μη παραμετρικής παλινδρόμησης [Methods of nonparametric regression]. Βαρελάς, Γεώργιος. 08 July 2011.
One thing that sets statisticians apart from other scientists is the public's relative ignorance of what the field of statistics actually is. People have some general idea of what chemistry or biology is, but what exactly is it that statisticians do? One answer to this question is the following: statistics is the science concerned with the collection, summarization, presentation, and interpretation of data. Data are the key, of course: they are what we gain knowledge from and base decisions on. A table of data presents a collection of valid data, but it is clearly insufficient for summarizing or interpreting them. The problem is that no assumptions have been made about the process that generated the data (put simply, the analysis is purely nonparametric, in the sense that no formal structure is imposed on the data), so no real summary or synopsis is possible. The classical approach to this difficulty is to assume a parametric model for the underlying process, specifying a concrete form for the underlying density; various statistics can then be computed and presented through a fitted density. Unfortunately, the strength of parametric modeling is also its weakness. By tying inference to a specific model we can gain a great deal, but only if the model holds, at least approximately. If the assumed model is not correct, the conclusions we draw from it can be worse than useless, leading to misleading interpretations of the data.
|
6 |
Parametric, Nonparametric and Semiparametric Approaches in Profile Monitoring of Poisson Data. Piri, Sepehr. 01 January 2017.
Profile monitoring is a relatively new approach in quality control, best used when the process data follow a profile (or curve). The majority of previous studies in profile monitoring focused on parametric modeling of either linear or nonlinear profiles, under the assumption that the model is correctly specified. Our work considers cases where the parametric model for the family of profiles is unknown or at least uncertain. Consequently, we consider monitoring Poisson profiles via three methods: a nonparametric (NP) method using penalized splines, an NP method using wavelets, and a semiparametric (SP) procedure that combines parametric and NP profile fits. Our simulation results show that the SP method is robust to the common problem of misspecification of the user's proposed parametric model. We also show that Haar wavelets are a better choice than penalized splines when the profile has a sudden or sharp jump, while penalized splines are better than wavelets when the profiles are smooth. The proposed techniques are applied to a real data set and compared with some state-of-the-art methods.
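A minimal Python sketch of the contrast drawn above, under assumed data and tuning constants (the profile, noise level, and threshold are illustrative, not from the study): a smoothing spline blurs a sudden jump in a profile, while a one-level Haar wavelet fit with hard thresholding of the detail coefficients preserves it.

import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
n, sigma = 256, 0.3
x = np.linspace(0.0, 1.0, n)
profile = np.where(x < 0.5, 1.0, 3.0)        # profile with a sudden jump
y = profile + sigma * rng.standard_normal(n)

# smoothing-spline fit: good for smooth profiles, blurs the jump
spline_fit = UnivariateSpline(x, y, s=n * sigma**2)(x)

# one-level Haar transform with hard thresholding of the details
avg = (y[0::2] + y[1::2]) / np.sqrt(2.0)     # approximation coefficients
det = (y[0::2] - y[1::2]) / np.sqrt(2.0)     # detail coefficients
det[np.abs(det) < 3.0 * sigma] = 0.0         # keep only large details
haar_fit = np.empty(n)
haar_fit[0::2] = (avg + det) / np.sqrt(2.0)  # inverse Haar transform
haar_fit[1::2] = (avg - det) / np.sqrt(2.0)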
|
7 |
On the MSE Performance and Optimization of Regularized Problems. Alrashdi, Ayed. 11 1900.
The amount of data measured, transmitted, received, and stored has increased dramatically in recent years; today we are in the world of big data. Fortunately, in many applications we can take advantage of structures and patterns in the data to overcome the curse of dimensionality. The best-known structures include sparsity, low-rankness, and block sparsity, and they arise in a wide range of applications such as machine learning, medical imaging, signal processing, social networks, and computer vision. This has led to particular interest in recovering signals from noisy compressed measurements (the Compressed Sensing (CS) problem). Such problems are generally ill-posed unless the signal is structured, and the structure can be captured by a regularizer function. This gives rise to regularized inverse problems, in which reconstruction of the structured signal is modeled as a regularized problem. This thesis focuses on finding the optimal regularization parameter for such problems, including ridge regression, LASSO, square-root LASSO, and low-rank Generalized LASSO. Our goal is to tune the regularizer optimally so as to minimize the mean-squared error (MSE) of the solution when the noise variance or structure parameters are unknown. The analysis is based on the framework of the Convex Gaussian Min-max Theorem (CGMT), which has recently been used to predict performance errors precisely.
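As a hedged illustration of the tuning problem described above (an oracle toy example in Python, not the CGMT analysis itself, which predicts the MSE curve analytically without access to the true signal): sweep the ridge regularization parameter and track the MSE of the solution against a known sparse signal.

import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 200
A = rng.standard_normal((n, p)) / np.sqrt(n)  # Gaussian measurement matrix
x0 = np.zeros(p)
x0[:10] = 1.0                                 # sparse ground-truth signal
y = A @ x0 + 0.1 * rng.standard_normal(n)

def ridge(lam):
    # ridge solution of min ||y - A x||^2 + lam * ||x||^2
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

lambdas = np.logspace(-3, 1, 30)
mse = [np.mean((ridge(lam) - x0) ** 2) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(mse))]    # oracle-tuned regularizer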
|
8 |
Méthodes de surface de réponse basées sur la décomposition de la variance fonctionnelle et application à l'analyse de sensibilité / Response surface methods based on analysis of variance expansion for sensitivity analysis. Touzani, Samir. 20 April 2011.
The purpose of this thesis is to investigate new response surface methods for the sensitivity analysis of complex and computationally demanding computer codes. To this end, our research work focuses on methods based on the ANOVA decomposition. We propose a smoothing-spline nonparametric regression method, an ANOVA-based approach performed with an iterative algorithm that combines an estimation procedure with a variable selection procedure. The variable selection step can become computationally demanding in high-dimensional problems, so we developed a new iterative shrinkage algorithm whose originality lies in its simplicity of implementation and its efficiency. Because the method is ANOVA-based, it also yields a direct method for computing sensitivity indices. Building on this response surface method, we then developed a method suited to approximating highly irregular or discontinuous models, based on a wavelet basis; its multiresolution character yields better approximations of functions with strong irregularities or discontinuities. Finally, we considered the case where the simulator outputs are time series, for which we developed a methodology combining the smoothing-spline response surface method with a wavelet decomposition. To assess the efficiency of the proposed methods, results on analytical functions and on reservoir engineering test cases are presented.
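As a self-contained Python illustration of the sensitivity indices targeted above (a plain Monte Carlo pick-freeze computation on the standard Ishigami benchmark, not the smoothing-spline ANOVA response surface the thesis builds):

import numpy as np

def ishigami(X):
    # Ishigami function: a standard sensitivity-analysis benchmark
    return (np.sin(X[:, 0]) + 7.0 * np.sin(X[:, 1]) ** 2
            + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0]))

rng = np.random.default_rng(3)
N = 100_000
A = rng.uniform(-np.pi, np.pi, (N, 3))
B = rng.uniform(-np.pi, np.pi, (N, 3))
fA, fB = ishigami(A), ishigami(B)
var = np.var(np.concatenate([fA, fB]))

for i in range(3):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                             # pick-freeze mixed sample
    S_i = np.mean(fB * (ishigami(ABi) - fA)) / var  # first-order Sobol index
    print(f"S_{i + 1} ~ {S_i:.3f}")                 # roughly 0.31, 0.44, 0.00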
|
9 |
Rare events simulation by shaking transformations: Non-intrusive resampler for dynamic programming / Simulation des événements rares par transformations de shaking : Rééchantillonneur non-intrusif pour la programmation dynamique. Liu, Gang. 23 November 2016.
This thesis contains two parts: rare events simulation, and a non-intrusive stratified resampler for dynamic programming. The first part quantifies statistics related to events that are very unlikely to happen but whose consequences are severe. We propose Markovian ("shaking") transformations on path spaces and combine them with the theory of interacting particle systems and Markov chain ergodicity to obtain methods that perform well and apply in great generality. The second part solves dynamic programming problems numerically in a context where only a small number of historical observations are available and the values of the model parameters are unknown. We propose and analyze a new scheme combining stratification and resampling techniques.
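A minimal Python sketch of the first part's ingredients, under assumed toy choices (random-walk paths, fixed levels, and the Gaussian shaking move Z' = rho*Z + sqrt(1 - rho^2)*xi, which leaves the standard normal law invariant): multilevel splitting with a shake-and-reject move applied to the surviving particles.

import numpy as np

rng = np.random.default_rng(4)
d, M, rho = 20, 2000, 0.9         # path length, particles, shaking strength

def score(Z):
    # running maximum of the random walk built from Gaussian increments Z
    return np.max(np.cumsum(Z, axis=1), axis=1)

def shake(Z):
    # reversible Gaussian shaking: leaves the N(0, I) law invariant
    return rho * Z + np.sqrt(1.0 - rho**2) * rng.standard_normal(Z.shape)

levels = [5.0, 10.0, 15.0, 20.0]  # nested events {score > L}, assumed close
Z = rng.standard_normal((M, d))   # enough that each stage keeps survivors
p_hat = 1.0
for L in levels:
    alive = score(Z) > L
    p_hat *= alive.mean()                          # conditional probability
    Z = Z[alive][rng.integers(0, alive.sum(), M)]  # resample the survivors
    for _ in range(5):            # shake-and-reject: move particles while
        Zp = shake(Z)             # staying inside the current level set
        ok = score(Zp) > L
        Z[ok] = Zp[ok]
print("estimated rare-event probability:", p_hat)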
|
10 |
[en] A SUGGESTION FOR THE STRUCTURE IDENTIFICATION OF LINEAR AND NON LINEAR TIME SERIES BY THE USE OF NON PARAMETRIC REGRESSION / [pt] UMA SUGESTÃO PARA IDENTIFICAÇÃO DA ESTRUTURA DE SÉRIES TEMPORAIS, LINEARES E NÃO LINEARES, UTILIZANDO REGRESSÃO NÃO PARAMÉTRICA. ROSANE MARIA KIRCHNER. 10 February 2005.
This research develops a methodology for identifying the structure of linear and nonlinear time series, based on nonparametric and semiparametric estimation of the unknown curves in models of the type Yt = E(Yt | Xt) + e, where Xt = (Yt-1, Yt-2, ..., Yt-d). A traditional parametric linear regression model assumes that the function E(Yt | Xt) is linear, and the estimation process is global: if a linear function is assumed, for example, the same line is used along the whole domain of the covariate. Such an approach may be inadequate in many cases. The nonparametric approach, by contrast, allows more flexibility in the possible form of the unknown function, which can be estimated by local kernel regression: only points in the local neighborhood of the point xt at which E(Yt | Xt = xt) is to be estimated influence the estimate, with the observations nearest that point receiving more weight and the farthest ones less. For the estimation of the unknown function, the smoothing parameter h (the window, or bandwidth) was chosen automatically from the sample by minimizing residuals under the cross-validation criterion; in addition, the fixed values h = 0.1, 0.5, 0.8 and 1 were used deliberately. After estimating the unknown function, the coefficient of determination is computed to verify the dependence on each lag. Under the proposed methodology, the lag dependence function (LDF) and the partial lag dependence function (PLDF) provide, in the linear case, good approximations to the autocorrelation function (ACF) and the partial autocorrelation function (PACF), respectively, which are used in the classical analysis of linear series; their graphical representations are also very similar to those used for the ACF and PACF. The PLDF requires estimating multivariate functions; for this, an additive model is used, estimated by the backfitting method (Hastie and Tibshirani, 1990). Confidence intervals are constructed with the bootstrap technique. The study was designed to evaluate and compare the proposed methodology with existing ones. The series used in the analysis were generated from linear and nonlinear models, with 100 or more observations per model. The methodology is also illustrated on the structure of two electricity demand series, one from DEMEI (Departamento Municipal de Energia de Ijuí, Rio Grande do Sul, Brazil) and one from a utility in the Centro-Oeste region, and, as a third example, on an economic series of Petrobras stock prices.
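As an illustrative Python sketch of the lag dependence function (LDF) described above, with assumed choices (a Gaussian kernel, fixed h = 0.5, a simulated AR(1) series, and in-sample R^2 without the leave-one-out correction a careful implementation would use):

import numpy as np

def nw(x, X, Y, h):
    # Nadaraya-Watson estimate at x with a Gaussian kernel and bandwidth h
    w = np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(w * Y) / np.sum(w)

def ldf(y, max_lag, h):
    # lag dependence function: R^2 of the kernel regression of y_t on y_{t-k}
    out = []
    for k in range(1, max_lag + 1):
        X, Y = y[:-k], y[k:]
        fit = np.array([nw(x, X, Y, h) for x in X])
        out.append(1.0 - np.sum((Y - fit) ** 2) / np.sum((Y - Y.mean()) ** 2))
    return np.array(out)

rng = np.random.default_rng(5)
e = rng.standard_normal(300)
y = np.empty(300)
y[0] = e[0]
for t in range(1, 300):            # a linear AR(1) series for illustration
    y[t] = 0.8 * y[t - 1] + e[t]
print(np.round(ldf(y, max_lag=5, h=0.5), 3))  # decays like the ACF pattern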
|