81

Estimation fonctionnelle non paramétrique au voisinage du bord / Functional non-parametric estimation near the edge

Jemai, Asma 16 March 2018
The aim of this thesis is to construct nonparametric estimators of a distribution function, a probability density and a regression function using stochastic approximation methods, in order to correct the edge effect created by classical continuous kernel estimators. In the first chapter, we give some asymptotic properties of continuous kernel estimators. We then present the Robbins-Monro stochastic algorithm, which makes it possible to introduce recursive estimators. Finally, we recall the methods used by Vitale, Leblanc and Kakizawa to define estimators of a distribution function and a probability density based on Bernstein polynomials. In the second chapter, we introduce a recursive estimator of a distribution function based on Vitale’s approach. We study the properties of this estimator (bias, variance and mean integrated squared error (MISE)) and establish its weak pointwise convergence. We compare the performance of our estimator with that of Vitale and show that, with the right choice of the stepsize and its corresponding order, our estimator dominates in terms of MISE. These theoretical results are confirmed by simulations. In practice, we use cross-validation to search for the optimal order. Finally, we confirm the better performance of our estimator on real data. In the third chapter, we estimate a probability density recursively, again using Bernstein polynomials. We establish the characteristics of this estimator and compare them with those of the estimators of Vitale and Leblanc and of the estimator given by Kakizawa using the multiplicative bias-correction method. We apply our estimator to real data. In the fourth chapter, we introduce a recursive and a non-recursive estimator of a regression function using Bernstein polynomials. We study the characteristics of this estimator and compare them with those of the classical kernel estimator. We then use our estimator to interpret real data.
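To make the Bernstein polynomial idea in this abstract concrete, here is a minimal sketch of a commonly used non-recursive Bernstein smoother of the empirical distribution function on [0, 1], F_m(x) = sum over k of F_n(k/m) C(m, k) x^k (1 - x)^(m - k). It is an illustration only, not the recursive estimator developed in the thesis; the function names, the sample and the order are assumptions chosen for the example.

```python
from math import comb


def empirical_cdf(sample, t):
    """Fraction of observations less than or equal to t."""
    return sum(1 for x in sample if x <= t) / len(sample)


def bernstein_cdf(sample, x, order):
    """Bernstein-polynomial smoothing of the empirical CDF on [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return sum(
        empirical_cdf(sample, k / order)
        * comb(order, k) * x**k * (1.0 - x) ** (order - k)
        for k in range(order + 1)
    )


# Illustrative data on [0, 1]; not taken from the thesis.
sample = [0.1, 0.4, 0.4, 0.7, 0.9]
grid = [i / 20 for i in range(21)]
smoothed = [bernstein_cdf(sample, x, order=10) for x in grid]
```

Because the Bernstein weights sum to one and stay inside [0, 1], the smoothed curve needs no special treatment at the boundary, which is the motivation the abstract gives for moving away from classical kernels.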
82

Asymptotischer Vergleich höherer Ordnung adaptiver linearer Schätzer in Regressionsmodellen / Higher-order asymptotic comparison of adaptive linear estimators in regression models

Ilouga, Pierre Emmanuel 20 April 2001
The main objective of this thesis is the asymptotic comparison of various nonparametric adaptive estimators of the mean vector in a regression setting with an increasing number of replications at fixed design points. Since the adaptive estimators are no longer linear in the observations, their mean squared errors are approximated by their higher-order risks, which makes a comparison possible under the assumption of normally distributed observation errors. It is shown that the plug-in estimator of the unknown mean vector is better to third order than the other adaptive estimators, and that the estimator constructed with the full cross-validation idea is better than the estimator based on cross-validation if the unknown regression function is "non-smooth". In some special situations, the full cross-validation adaptation is better than the adaptations based on the "automatic" criteria considered in this thesis. In addition, an estimator of the mean vector is constructed using a plug-in idea whose second-order risk is smaller than the second-order risks of the other adaptive estimators. If, however, one presumes that the unknown regression function is "very smooth", it is proved that the projection estimator yields the smallest mean squared error.
83

[en] A SUGGESTION FOR THE STRUCTURE IDENTIFICATION OF LINEAR AND NON LINEAR TIME SERIES BY THE USE OF NON PARAMETRIC REGRESSION / [pt] UMA SUGESTÃO PARA IDENTIFICAÇÃO DA ESTRUTURA DE SÉRIES TEMPORAIS, LINEARES E NÃO LINEARES, UTILIZANDO REGRESSÃO NÃO PARAMÉTRICA

ROSANE MARIA KIRCHNER 10 February 2005
[en] This research develops a methodology for identifying the structure of linear and non-linear time series, based on non-parametric and semi-parametric estimation of the unknown curves in models of the type Y_t = E(Y_t | X_t) + e, where X_t = (Y_{t-1}, Y_{t-2}, ..., Y_{t-d}). A traditional parametric linear regression model assumes that the function E(Y_t | X_t) is linear. The estimation process is global: under the assumption of a linear function, for instance, the same line is used over the whole domain of the covariate. Such an approach may be inadequate in many cases. Non-parametric regression, by contrast, allows more flexibility in the possible form of the unknown function, which can be estimated through local kernel functions. Only points in the local neighborhood of the point x_t at which E(Y_t | X_t = x_t) is to be estimated influence the estimate. In other words, with kernel estimators the unknown function is estimated by local regression, in which the observations nearest to the point of estimation receive more weight and the farthest ones less. For the estimation of the unknown function, the smoothing parameter h (window) was chosen automatically from the sample through minimization of residuals, using the cross-validation criterion. In addition to this criterion, fixed values of h (0.1, 0.5, 0.8 and 1) were deliberately used. After the unknown function is estimated, the coefficient of determination is calculated to assess the dependence at each lag. Under the proposed methodology, the Lag Dependence Function (LDF) and the Partial Lag Dependence Function (PLDF) provide good approximations, in the linear case, to the autocorrelation function (ACF) and the partial autocorrelation function (PACF) respectively, which are used in the classical analysis of linear time series. The graphical representation is also very similar to that used for the ACF and PACF. The PLDF requires the estimation of multivariate functions; for this an additive model was used, estimated with the backfitting method (Hastie and Tibshirani, 1990). Confidence intervals were constructed with the bootstrap technique. The study was conducted to evaluate and compare the proposed methodology with existing ones. The series used for this analysis were generated according to linear and non-linear models, with a series of 100 or more observations for each model. The approach was also illustrated with a study of the structure of two electricity demand series, one from DEMEI (Departamento Municipal de Energia de Ijuí, Rio Grande do Sul, Brazil) and the other from a utility in the Centro-Oeste region. A third example used an economic series of Petrobras shares.
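The local kernel regression, cross-validated window and per-lag coefficient of determination described in this abstract can be sketched roughly as follows. This is a hedged illustration, not the author's implementation: it assumes a Nadaraya-Watson estimator with a Gaussian kernel, leave-one-out cross-validation over a small candidate grid (the grid reuses the four fixed h values quoted in the abstract purely for convenience), and a simulated AR(1) series. All function names are hypothetical.

```python
import math
import random


def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate of E(Y | X = x0) with a Gaussian kernel."""
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:              # leave one observation out when requested
            continue
        w = math.exp(-0.5 * ((x0 - x) / h) ** 2)
        num += w * y
        den += w
    if den == 0.0:                 # all weights underflowed: fall back to the mean
        kept = [y for i, y in enumerate(ys) if i != skip]
        return sum(kept) / len(kept)
    return num / den


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for window h."""
    errors = ((y - nw_estimate(x, xs, ys, h, skip=i)) ** 2
              for i, (x, y) in enumerate(zip(xs, ys)))
    return sum(errors) / len(xs)


def select_bandwidth(xs, ys, candidates):
    """Candidate window with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))


def lag_dependence(series, lag, candidates):
    """Coefficient of determination of the kernel fit of Y_t on Y_{t-lag}."""
    xs, ys = series[:-lag], series[lag:]
    h = select_bandwidth(xs, ys, candidates)
    fitted = [nw_estimate(x, xs, ys, h) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Simulated linear AR(1) series; the coefficient 0.7 is an arbitrary choice.
rng = random.Random(0)
series = [0.0]
for _ in range(199):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

candidates = [0.1, 0.5, 0.8, 1.0]
dependence = [lag_dependence(series, lag, candidates) for lag in (1, 2, 3)]
```

Plotting `dependence` against the lag gives a picture analogous to an ACF plot, which is the comparison the abstract draws for the linear case. The partial version (PLDF) would additionally need an additive model fitted by backfitting, which this sketch does not attempt.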
84

Critérios robustos de seleção de modelos de regressão e identificação de pontos aberrantes / Robust model selection criteria in regression and outliers identification

Guirado, Alia Garrudo 08 March 2019
Robust regression arises as an alternative to least squares fitting when the errors are contaminated by outliers or there is some evidence that the model assumptions are violated. In classical regression there are well-known model selection criteria and diagnostic measures. The objective of this work is to present the main robust model selection criteria and outlier detection measures, and to analyze and compare their performance under different scenarios in order to determine which of them are better suited to particular situations. The cross-validation criteria using Monte Carlo simulations and the Bayesian Information Criterion are known to perform adequately in model identification. The dissertation confirmed this and found that their robust alternatives also stand out in this respect. Residual analysis is a strong tool for model diagnostics; this work found that classical residual analysis applied to the robust linear regression fit, as well as analysis of the observation weights, are efficient outlier detection measures. The criteria and measures analyzed were applied to the data set obtained from the Meteorological Station of the Institute of Astronomy, Geophysics and Atmospheric Sciences of the University of São Paulo to detect which meteorological variables influence the daily minimum temperature throughout the year, and a model was fitted that makes it possible to identify the days associated with the entry of frontal systems.
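As an illustration of how observation weights from a robust fit can point to outliers, the sketch below fits a straight line by iteratively reweighted least squares with Huber weights. The abstract does not say which robust estimator the dissertation uses, so the Huber weight function, the tuning constant 1.345, the median-based scale estimate and the 0.5 flagging threshold are all assumptions made for this example.

```python
from statistics import median


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line_fit(xs, ys, c=1.345, iterations=10):
    """Iteratively reweighted least squares with Huber weights."""
    ws = [1.0] * len(xs)
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(xs, ys, ws)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Robust scale from the median absolute residual (assumed choice).
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale <= 1e-9:          # residuals are essentially zero: stop
            break
        ws = [1.0 if abs(r) <= c * scale else c * scale / abs(r)
              for r in residuals]
    return intercept, slope, ws


# Exact line y = 2x + 1 with one contaminated observation at index 5.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[5] += 50.0

intercept, slope, weights = huber_line_fit(xs, ys)
flagged = [i for i, w in enumerate(weights) if w < 0.5]  # illustrative cut-off
```

Observations that keep a weight of 1 are treated exactly as in least squares, while a heavily down-weighted observation is a natural candidate for closer inspection.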
85

Iterative interpolation of daily precipitation data over Alberta

Dai, Qingfang 06 1900
This thesis develops an optimal interpolation method that takes daily precipitation values collected from weather stations and produces precipitation estimates on a grid. The method, called Hybrid 2.0, combines EOF-based linear interpolation with the nearest-station method. Gridded monthly precipitation is first obtained via EOF, then distributed among days via nearest station. Hybrid 2.0 builds on an earlier method, called Hybrid 1.0, that applies an inverse-distance weighting method to obtain gridded monthly values. Hybrid 2.0 uses these monthly Hybrid 1.0 values as inputs when constructing EOF functions. The data used in this thesis were obtained from the Meteorological Service of Canada. Few weather stations were located in the northern and mountain regions of Alberta prior to 1950. As a result, the Hybrid 1.0 gridded results underestimate precipitation in these regions for that period. The main contribution of Hybrid 2.0 is a substantial reduction in this bias, obtained by implicitly taking topographic elevation into account. Bias reduction is achieved by extracting EOFs from Hybrid 1.0 output for 1951-2002, when many more stations were present in the northern and mountain regions. Hybrid 2.0 is shown to be more accurate in interpolating both monthly and daily precipitation in Alberta, when compared with Hybrid 1.0 and other methods. The thesis also provides detailed analyses of precipitation trends and droughts using the gridded Hybrid 2.0 daily values. Optimality of the selected EOF modes and sensitivity to data error in the EOF-based linear reconstruction are also discussed in this thesis. Agricultural uses of historical climate data have become extremely important. Applications include: enabling prompt, optimal decisions on market prices and disaster aid, designing future agricultural practices such as adaptation to climate and technology changes, and managing risks for agricultural producers and governments in areas such as drought monitoring. 
Many applications require a reliable interpolation technique to accurately reconstruct daily climate estimates onto grids of various resolutions. The gridded Hybrid 2.0 daily precipitation values produced by this thesis satisfy this requirement and can be used as inputs for many agricultural applications. / Applied Mathematics
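The two-step idea attributed to Hybrid 1.0 in this abstract (grid the monthly totals by inverse-distance weighting, then split each gridded total among days using the nearest station) can be sketched as follows. This is a simplified reading of the abstract, not the thesis code: planar coordinates, a distance power of 2, the four-day "month", the even split used when the nearest station is dry, and all function names are assumptions. The EOF step that distinguishes Hybrid 2.0 is not shown.

```python
import math


def distance(a, b):
    """Euclidean distance between two points in planar coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def idw(point, stations, values, power=2.0):
    """Inverse-distance-weighted average of station values at a grid point."""
    num = den = 0.0
    for site, value in zip(stations, values):
        d = distance(point, site)
        if d == 0.0:               # grid point coincides with a station
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def distribute_daily(point, stations, daily, monthly_total):
    """Split a gridded monthly total using the nearest station's daily shares."""
    nearest = min(range(len(stations)),
                  key=lambda i: distance(point, stations[i]))
    station_total = sum(daily[nearest])
    days = len(daily[nearest])
    if station_total == 0:         # dry month at the nearest station (assumed rule)
        return [monthly_total / days] * days
    return [monthly_total * d / station_total for d in daily[nearest]]


# Hypothetical stations and a shortened four-day "month" of daily values.
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
daily = [[0.0, 5.0, 10.0, 0.0], [2.0, 2.0, 2.0, 2.0], [0.0, 0.0, 12.0, 4.0]]
monthly = [sum(d) for d in daily]

point = (2.0, 1.0)
grid_total = idw(point, stations, monthly)
grid_daily = distribute_daily(point, stations, daily, grid_total)
```

By construction the daily values at a grid point add up to its gridded monthly total, so the second step only changes the timing of precipitation, not the amount.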
86

Some Topics in Roc Curves Analysis

Huang, Xin 07 May 2011
The receiver operating characteristic (ROC) curve is a popular tool for evaluating continuous diagnostic tests. The traditional definition of ROC curves implicitly incorporates the idea of "hard" thresholding, which also results in the empirical curves being step functions. The first topic is to introduce a novel definition of soft ROC curves, which incorporates the idea of "soft" thresholding. The softness of a soft ROC curve is controlled by a regularization parameter that can be selected suitably by a cross-validation procedure. A byproduct of the soft ROC curves is that the corresponding empirical curves are smooth. The second topic concerns the combination of several diagnostic tests to achieve better diagnostic accuracy. We consider the optimal linear combination that maximizes the area under the receiver operating characteristic curve (AUC); the estimates of the combination's coefficients can be obtained via a non-parametric procedure. However, for estimating the AUC associated with the estimated coefficients, the apparent estimate obtained by re-substitution is too optimistic. To adjust for the upward bias, several methods are proposed. Among them the cross-validation approach is especially advocated, and an approximate cross-validation is developed to reduce the computational cost. Furthermore, these proposed methods can be applied to variable selection, to select important diagnostic tests. However, this best-subset variable selection method is not practical when the number of diagnostic tests is large. The third topic is to further develop a LASSO-type procedure for variable selection. To solve the non-convex maximization problem in the proposed procedure, an efficient algorithm is developed based on soft ROC curves, difference convex programming, and the coordinate descent algorithm.
87

Infrared Spectroscopy in Combination with Advanced Statistical Methods for Distinguishing Viral Infected Biological Cells

Tang, Tian 17 November 2008
Fourier Transform Infrared (FTIR) microscopy is a sensitive method for detecting differences in the morphology of biological cells. In this study FTIR spectra were obtained for uninfected cells and for cells infected with two different viruses. The spectra obtained are difficult to discriminate visually. Here we apply advanced statistical methods to the analysis of the spectra, to test whether such spectra are useful for diagnosing viral infections in cells. Logistic Regression (LR) and Partial Least Squares Regression (PLSR) were used to build models that allow us to diagnose whether spectral differences are related to the infection state of the cells. A three-fold, balanced cross-validation method was applied to estimate the shrinkage of the area under the receiver operating characteristic curve (AUC) and of the specificities at sensitivities of 95%, 90% and 80%. AUC, sensitivity and specificity were used to gauge the goodness of the discrimination methods. Our statistical results show that the spectra associated with different cellular states are very effectively discriminated. We also find that the overall performance of PLSR is better than that of LR, especially for validation on new data. Our analysis supports the idea that FTIR microscopy is a useful tool for detecting viral infections in biological cells.
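The "shrinkage" of the AUC mentioned above is the drop from the apparent (re-substitution) AUC to a cross-validated one. A minimal sketch of that calculation is given below under several assumptions: the classifier is a simple difference-of-class-means score standing in for LR or PLSR, "balanced" is read as folding each class separately so every fold keeps the class proportions, the per-fold AUCs are averaged, and the data are simulated noise-dominated features rather than spectra.

```python
import random


def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


def fit_direction(pos, neg):
    """Stand-in classifier: difference of class means (not LR or PLSR)."""
    dim = len(pos[0])
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(dim)]


def score(direction, rows):
    """Linear score of each row along the fitted direction."""
    return [sum(d * v for d, v in zip(direction, row)) for row in rows]


def balanced_folds(rows, k, rng):
    """Shuffle one class and deal it into k folds of near-equal size."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cv_auc(pos, neg, k, rng):
    """Average held-out AUC over k class-balanced folds."""
    pos_folds = balanced_folds(pos, k, rng)
    neg_folds = balanced_folds(neg, k, rng)
    fold_aucs = []
    for i in range(k):
        train_pos = [r for j, f in enumerate(pos_folds) if j != i for r in f]
        train_neg = [r for j, f in enumerate(neg_folds) if j != i for r in f]
        direction = fit_direction(train_pos, train_neg)
        fold_aucs.append(auc(score(direction, pos_folds[i]),
                             score(direction, neg_folds[i])))
    return sum(fold_aucs) / k


# Simulated data: 30 cases per class, 20 features, small mean shift.
rng = random.Random(1)
pos = [[rng.gauss(0.3, 1.0) for _ in range(20)] for _ in range(30)]
neg = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(30)]

full_direction = fit_direction(pos, neg)
apparent = auc(score(full_direction, pos), score(full_direction, neg))
cross_validated = cv_auc(pos, neg, k=3, rng=rng)
shrinkage = apparent - cross_validated
```

The same pattern applies to specificity at a fixed sensitivity: compute the measure once on the training data and once on held-out folds, and report the difference.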
88

Advanced Statistical Methodologies in Determining the Observation Time to Discriminate Viruses Using FTIR

Luo, Shan 13 July 2009
Fourier transform infrared (FTIR) spectroscopy, which uses electromagnetic radiation to detect specific cellular molecular structures, can be used to discriminate between different types of cells. The objective is to find the minimum time (among 2, 4 and 6 hours) at which FTIR readings must be recorded so that different viruses can be discriminated. A new method is adopted for the datasets. Briefly, inner differences are created as the control group, and the Wilcoxon signed-rank test is used as a first variable selection step to prepare for the discrimination stage. In the second stage we propose either the partial least squares (PLS) method or simply taking significant differences as the discriminator. Finally, k-fold cross-validation is used to estimate the shrinkage of the goodness measures, such as sensitivity, specificity and area under the ROC curve (AUC). We are confident that 6 hours is enough to discriminate mock from HSV1 and Coxsackie viruses. Adenovirus is an exception.
89

Data-driven estimation for Aalen's additive risk model

Boruvka, Audrey 02 August 2007
The proportional hazards model developed by Cox (1972) is by far the most widely used method for regression analysis of censored survival data. Application of the Cox model to more general event history data has become possible through extensions using counting process theory (e.g., Andersen and Borgan (1985), Therneau and Grambsch (2000)). With its development based entirely on counting processes, Aalen’s additive risk model offers a flexible, nonparametric alternative. Ordinary least squares, weighted least squares and ridge regression have been proposed in the literature as estimation schemes for Aalen’s model (Aalen (1989), Huffer and McKeague (1991), Aalen et al. (2004)). This thesis develops data-driven parameter selection criteria for the weighted least squares and ridge estimators. Using simulated survival data, these new methods are evaluated against existing approaches. A survey of the literature on the additive risk model and a demonstration of its application to real data sets are also provided. / Thesis (Master, Mathematics & Statistics) -- Queen's University, 2007-07-18 22:13:13.243
90

Iterative interpolation of daily precipitation data over Alberta

Dai, Qingfang Unknown Date
No description available.
