271
Estimation de paramètres et planification d’expériences adaptée aux problèmes de cinétique - Application à la dépollution des fumées en sortie des moteurs / Parameter estimation and design of experiments adapted to kinetics problems - Application for depollution of exhaust smoke from the output of engines. Canaud, Matthieu, 14 September 2011
Physico-chemical models designed to represent experimental reality may prove to be inadequate. This is the case for the nitrogen oxide trap, used as the application of this thesis, which is a catalytic system treating the pollutant emissions of Diesel engines. The outputs are pollutant concentration curves, which are functional data depending on scalar initial concentrations. The initial objective of this thesis is to propose experimental designs that are meaningful to the user. However, since experimental designs rely on models, most of the work led to proposing a statistical representation that takes expert knowledge into account and makes it possible to build such a design. Three lines of research were explored. We first considered a non-functional modelling based on kriging theory. We then took the functional dimension of the responses into account, with the application and extension of varying-coefficient models. Finally, starting again from the original model, we made the kinetic parameters depend on the (scalar) inputs through a nonparametric representation. In order to compare the methods, an experimental campaign had to be carried out, and we propose an exploratory design approach based on maximum entropy.
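To make the last point concrete, here is a minimal sketch (not taken from the thesis) of a maximum-entropy exploratory design for a kriging-type model: design points are chosen greedily to maximise the log-determinant of a Gaussian correlation matrix, which is monotone in the entropy of the corresponding Gaussian process. The candidate grid, the two normalised inputs and the correlation length are illustrative assumptions.

```python
import numpy as np

def gaussian_corr(X, length_scale=0.3):
    """Kriging-style Gaussian correlation matrix between the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def max_entropy_design(candidates, n_points, length_scale=0.3, nugget=1e-8):
    """Greedy maximum-entropy design: at each step, add the candidate that
    maximises log det of the correlation matrix of the selected points
    (the entropy of a Gaussian-process design is monotone in this determinant)."""
    selected = [0]                      # arbitrary starting candidate
    remaining = list(range(1, len(candidates)))
    while len(selected) < n_points:
        best, best_logdet = None, -np.inf
        for j in remaining:
            idx = selected + [j]
            R = gaussian_corr(candidates[idx], length_scale) + nugget * np.eye(len(idx))
            sign, logdet = np.linalg.slogdet(R)
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = j, logdet
        selected.append(best)
        remaining.remove(best)
    return candidates[selected]

# Example: pick 8 exploratory runs among 200 candidate settings of two
# (hypothetical) normalised scalar inputs, e.g. initial concentrations.
rng = np.random.default_rng(1)
candidates = rng.uniform(0.0, 1.0, size=(200, 2))
print(max_entropy_design(candidates, n_points=8))
```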
272
[en] METHODOLOGY FOR MEASURING THE IMPACT OF THE PHOTOVOLTAIC GENERATION POTENTIAL TO LONG-TERM ENERGETIC OPERATION PLANNING / [pt] METODOLOGIA PARA MENSURAÇÃO DO IMPACTO DO POTENCIAL DE GERAÇÃO FOTOVOLTAICA NO PLANEJAMENTO DA OPERAÇÃO ENERGÉTICA DE MÉDIO PRAZO. SILVIA REGINA DOS SANTOS GONÇALVES, 17 August 2017
In recent years, Brazil has faced frequent challenges in guaranteeing its electricity supply. Photovoltaic energy production has gained prominence, since the solar source is an abundant and renewable resource. However, within medium-term energy operation planning, the estimate coming from photovoltaic generating plants is deterministic, and little is known about how distributed photovoltaic generation is taken into account for the coming years. In this context, this dissertation proposes a methodology for measuring the impact of the residential photovoltaic generation potential on medium-term energy operation planning. To achieve this, the Box & Jenkins methodology was used together with scenario simulation via bootstrap, taking into account solar irradiation levels, usable roof area and the efficiency of converting the solar resource into electricity. The main results of the dissertation are: total operating cost, marginal operating cost, stored energy, deficit cost, deficit risk, hydro generation, thermal generation, energy interchange and spillage cost. Considering scenarios of the residential photovoltaic generation potential reduced the total operating cost in the Monthly Energy Operation Programs of January 2015 and January 2016, with maximum reductions of 7.8 percent and 1.5 percent, respectively. The other results were also affected. It is concluded that residential photovoltaic generation significantly impacts medium-term energy operation planning, and further studies are needed to evaluate the insertion and evolution of this generation in the Brazilian energy matrix.
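As an illustration of coupling a Box & Jenkins-type model with bootstrap scenario simulation, the sketch below fits a simple AR(1) (a stand-in for the full ARIMA modelling in the dissertation) to a synthetic series and generates future scenarios by resampling the fitted residuals. All data and parameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic monthly series (a stand-in for real irradiation / PV generation data).
n, mu, phi = 120, 10.0, 0.7
y = np.empty(n)
y[0] = mu
for t in range(1, n):
    y[t] = mu + phi * (y[t - 1] - mu) + rng.normal(0.0, 1.0)

# Fit an AR(1) by least squares -- a minimal Box & Jenkins-type model.
X = np.column_stack([np.ones(n - 1), y[:-1]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
resid = y[1:] - X @ beta

# Residual-bootstrap scenario simulation: resample the residuals with
# replacement and iterate the fitted recursion forward.
def simulate_scenarios(beta, resid, y_last, horizon=12, n_scen=500):
    scen = np.empty((n_scen, horizon))
    for s in range(n_scen):
        prev = y_last
        for h in range(horizon):
            prev = beta[0] + beta[1] * prev + rng.choice(resid)
            scen[s, h] = prev
    return scen

scenarios = simulate_scenarios(beta, resid, y[-1])
print("median scenario:", np.median(scenarios, axis=0).round(2))
```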
273
[pt] MODELAGEM HÍBRIDA WAVELET INTEGRADA COM BOOTSTRAP NA PROJEÇÃO DE SÉRIES TEMPORAIS / [en] MODELING HYBRID WAVELET INTEGRATED WITH BOOTSTRAP IN PROJECTION TEMPORAL SERIES. RICARDO VELA DE BRITTO PEREIRA, 31 March 2016
In time series forecasting, some authors assume that a single forecasting method (an ARIMA model, for instance) produces residuals (or forecast errors) resembling a white-noise (unpredictable) process. However, mainly because of auto-dependence structures not captured by a single predictive method, this assumption can easily be violated in practice. This thesis proposes a Hybrid Wavelet Predictor (HWP) that integrates the following techniques: wavelet decomposition, ARIMA models, artificial neural networks (ANNs), forecast combination, non-linear mathematical programming and bootstrap sampling. In broad terms, the proposed HWP is able to capture, at the same time, the linear auto-dependence structures of the series through a linear wavelet combination of ARIMA models (whose optimal numerical weights are obtained by non-linear mathematical programming) and the non-linear structures through an automatic wavelet neural network. Unlike other hybrid approaches in the literature, the hybrid forecasts produced by the proposed HWP implicitly take into account, through the wavelet decomposition, the spectral-frequency information present in the underlying time series. The statistical results show that this hybrid methodology achieved relevant accuracy gains in forecasting four different well-known time series when compared with other competitive forecasters.
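The sketch below illustrates the general idea of a wavelet-based hybrid predictor, not the thesis's exact HWP: the series is split into wavelet components (assuming the PyWavelets package is available), a simple AR(2) is fitted to each component as a stand-in for the ARIMA and neural-network stages, and the component forecasts are summed.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0.0, 20.0, 256)) + 0.3 * rng.standard_normal(256)

# Wavelet decomposition: one approximation band plus several detail bands.
coeffs = pywt.wavedec(y, "db4", level=3)

# Reconstruct one component series per band by zeroing the other bands.
components = []
for i in range(len(coeffs)):
    kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
    components.append(pywt.waverec(kept, "db4")[: len(y)])

# Fit a simple AR(2) to each component by least squares (a stand-in for the
# ARIMA / neural-network stages of a full hybrid predictor) and forecast one step.
def ar2_one_step(x):
    X = np.column_stack([x[1:-1], x[:-2]])
    b, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
    return b[0] * x[-1] + b[1] * x[-2]

forecast = sum(ar2_one_step(c) for c in components)
print("one-step-ahead hybrid forecast:", round(float(forecast), 4))
```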
274
[en] BANKRUPTCY PREDICTION FOR AMERICAN INDUSTRY: CALIBRATING THE ALTMAN'S Z-SCORE / [pt] PREVISÃO DE FALÊNCIA PARA INDUSTRIA AÉREA AMERICANA: CALIBRANDO O Z-SCORE DE ALTMAN. 23 September 2020
Studies of bankruptcy prediction models began almost 90 years ago, always with the intention of providing a useful management tool for analysts and company managers. Although the earliest research is old, the subject remains current: several sectors of the economy have gone through, or are going through, crises over time, and the aviation industry is no exception. In this context, the present work used three decades of historical financial-indicator data for American airlines to build four bankruptcy prediction models and compare their predictive performance with the Z-Score model. All proposed models were calibrations of the Z-Score model using simulation and statistical techniques: two used Multiple Discriminant Analysis (MDA) and two used bootstrap together with MDA. One model of each pair used the original Z-Score variables, while the other proposed a new set of variables. The results showed that the most accurate forecasting model, with 75.0 percent accuracy in-sample and 79.2 percent out-of-sample, used the original set of variables together with the bootstrap and MDA techniques.
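A hedged sketch of the bootstrap-plus-discriminant-analysis idea follows, using scikit-learn's linear discriminant analysis on synthetic ratio data (the five predictors, sample sizes and effect sizes are invented, not the Z-Score data set): the classifier is refitted on bootstrap resamples of the training set and the resulting predictions and coefficients are aggregated.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic stand-ins for five Z-Score-style ratios; label 1 marks "bankrupt".
n = 300
y = rng.integers(0, 2, n)
X = rng.normal(0.0, 1.0, (n, 5)) + y[:, None] * np.array([-0.8, -0.5, -0.9, -0.4, -0.6])
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

# Bootstrap + discriminant analysis: refit the classifier on resampled training
# sets, then aggregate the out-of-sample votes and the coefficient draws.
n_boot = 200
coefs, votes = [], np.zeros((len(y_te), 2))
for _ in range(n_boot):
    Xb, yb = resample(X_tr, y_tr, random_state=int(rng.integers(1_000_000)))
    lda = LinearDiscriminantAnalysis().fit(Xb, yb)
    coefs.append(lda.coef_.ravel())
    votes[np.arange(len(y_te)), lda.predict(X_te)] += 1

print("out-of-sample accuracy:", float((votes.argmax(axis=1) == y_te).mean()))
print("bootstrap coefficient means:", np.mean(coefs, axis=0).round(2))
```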
275
Clustering of the Stockholm County housing market / Klustring av bostadsmarknaden i Stockholms län. Madsen, Christopher, January 2019
In this thesis, a clustering of the Stockholm County housing market has been performed using different clustering methods. Data have been derived and different geographical constraints have been used. DeSO areas (Demographic Statistical Areas), developed by SCB, have been used to divide the housing market into smaller regions for which the derived variables have been calculated. Hierarchical clustering methods, SKATER and Gaussian mixture models have been applied. Methods using different kinds of geographical constraints have also been applied in an attempt to create more geographically contiguous clusters. The different methods are then compared with respect to performance and stability. The best-performing method is the Gaussian mixture model EII, also known as the K-means algorithm. The most stable method when applied to bootstrapped samples is the ClustGeo method.
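The following sketch (with synthetic area attributes, not the DeSO data) shows one way to combine K-means clustering — corresponding to the spherical "EII" Gaussian mixture mentioned above — with a bootstrap-based stability check using the adjusted Rand index.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.utils import resample

# Synthetic area attributes (e.g. price level, income, density per small region).
X, _ = make_blobs(n_samples=400, centers=5, n_features=3, random_state=0)

# Reference partition on the full data; K-means corresponds to the spherical
# equal-volume ("EII") Gaussian mixture model.
ref = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Bootstrap stability: recluster bootstrap samples and compare the labels they
# induce on the original points with the reference partition.
rng = np.random.default_rng(0)
scores = []
for _ in range(50):
    Xb = resample(X, random_state=int(rng.integers(1_000_000)))
    km_b = KMeans(n_clusters=5, n_init=10, random_state=0).fit(Xb)
    scores.append(adjusted_rand_score(ref.labels_, km_b.predict(X)))

print("mean adjusted Rand index over bootstrap refits:", round(float(np.mean(scores)), 3))
```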
276
A Comparative study of data splitting algorithms for machine learning model selection. Birba, Delwende Eliane, January 2020
Data splitting is commonly used in machine learning to split data into a training, test, or validation set. This approach allows us to select the model hyper-parameters and also to estimate the generalization performance. In this research, we conducted a comparative analysis of different data partitioning algorithms on both real and simulated data. Our main objective was to address the question of how the choice of data splitting algorithm can improve the estimation of the generalization performance. The data splitting algorithms used in this study were variants of k-fold, Kennard-Stone (KS), SPXY (sample set partitioning based on joint x-y distance), and random sampling. Each algorithm divided the data into two subsets, training and validation: the training set was used to fit the model and the validation set for evaluation. We then analysed the different data splitting algorithms based on the generalization performance estimated on the validation set and on an external test set. From the results, we noted that an important determinant of good generalization is the size of the dataset. For all splitting methods applied to small datasets, the gap between the performance estimated on the validation set and on the test set was significant; however, the gap shrank when more data were available for training or validation. Too much or too little data in the training set can also lead to poor model performance, so it is important to have a reasonable balance between the training and validation set sizes. In our study, KS and SPXY were the splitting algorithms with the poorest model performance estimation. Indeed, these methods select the most representative samples to train the model, and poorly representative samples are left for estimating model performance.
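Below is a small illustrative comparison, on invented data, of a Kennard-Stone split against a random split: both are used to fit a least-squares model, and the gap between validation error and external test error can then be inspected. The Kennard-Stone implementation follows the standard max-min distance rule; it is a sketch, not the study's code.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone selection: start from the two most distant points, then
    repeatedly add the candidate whose minimum distance to the points already
    selected is largest."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected, remaining = [int(i), int(j)], [k for k in range(len(X)) if k not in (i, j)]
    while len(selected) < n_select:
        nxt = remaining[int(np.argmax(d[np.ix_(remaining, selected)].min(axis=1)))]
        selected.append(nxt)
        remaining.remove(nxt)
    return np.array(selected)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=120)
X_pool, y_pool, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

def val_and_test_rmse(train_idx):
    val_idx = np.setdiff1d(np.arange(100), train_idx)
    b, *_ = np.linalg.lstsq(X_pool[train_idx], y_pool[train_idx], rcond=None)
    rmse = lambda A, t: float(np.sqrt(np.mean((A @ b - t) ** 2)))
    return rmse(X_pool[val_idx], y_pool[val_idx]), rmse(X_test, y_test)

print("Kennard-Stone (val, test) RMSE:", val_and_test_rmse(kennard_stone(X_pool, 70)))
print("random split  (val, test) RMSE:", val_and_test_rmse(rng.choice(100, 70, replace=False)))
```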
277
Resampling-based tuning of ordered model selection. Willrich, Niklas, 02 December 2015
In this thesis, the Smallest-Accepted method is presented as a new Lepski-type method for ordered model selection. In a first step, the method is introduced and studied for estimation problems with known noise variance. The main building blocks of the method are a comparison-based acceptance criterion, relying on Monte-Carlo calibration of a set of critical values, and the choice of the smallest (in complexity) accepted model. The method can be used on a broad range of estimation problems, such as function estimation, estimation of linear functionals and inverse problems. General oracle results are presented for the method in the case of probabilistic loss and of a polynomial loss function, and applications of the method to specific estimation problems are studied. In a next step, the method is extended to the case of an unknown, possibly heteroscedastic noise structure. The Monte-Carlo calibration step is then replaced by a bootstrap-based calibration, and a new set of critical values is introduced which depends on the (random) observations. Theoretical properties of this bootstrap-based Smallest-Accepted method are then studied. It is shown for normal errors, under typical assumptions, that replacing the Monte-Carlo step by bootstrapping in the Smallest-Accepted method is valid if the underlying signal is Hölder-continuous with index s > 1/4 and log(n) (p^2/n) is small, where n is the sample size and p the maximal model dimension.
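The sketch below is a simplified, illustrative version of a Lepski-type smallest-accepted rule for an ordered family of polynomial models with known noise level: critical values are calibrated by Monte Carlo under pure noise, a model is accepted if no larger model contradicts it beyond its critical value, and the smallest accepted model is selected. It shows only the general mechanism, not the thesis's construction; the model family and thresholds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, M = 100, 0.5, 6                 # M ordered models: polynomial degrees 0..5
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 0.8 * x - 1.5 * x ** 2 + sigma * rng.standard_normal(n)

def fits(obs):
    """Fitted values of the nested polynomial models of degree 0..M-1."""
    return [np.polyval(np.polyfit(x, obs, m), x) for m in range(M)]

# Monte-Carlo calibration of critical values under pure noise (known sigma):
# for each ordered pair m < m', take a high quantile of the sup-norm difference
# between the two fitted curves.
n_mc = 500
diffs = np.zeros((n_mc, M, M))
for b in range(n_mc):
    fb = fits(sigma * rng.standard_normal(n))
    for m in range(M):
        for mp in range(m + 1, M):
            diffs[b, m, mp] = np.max(np.abs(fb[m] - fb[mp]))
crit = np.quantile(diffs, 0.95, axis=0)

# Smallest-accepted rule: accept model m if no larger model contradicts it
# beyond its calibrated critical value, and select the smallest accepted model.
fhat = fits(y)
accepted = [m for m in range(M)
            if all(np.max(np.abs(fhat[m] - fhat[mp])) <= crit[m, mp]
                   for mp in range(m + 1, M))]
print("selected polynomial degree:", min(accepted) if accepted else M - 1)
```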
278
Re-sampling in instrumental variables regression. Koziuk, Andzhey, 13 July 2020
This thesis considers instrumental variables regression in the context of re-sampling. A framework is built to identify the target of inference; the thesis attempts to motivate instrumental variables regression from a new perspective, close in spirit to non-parametric regression, in which the target of estimation is assumed to be formed by two factors: an environment and an internal, model-specific structure.
Alongside this framework, the work develops a re-sampling method suited to testing a linear hypothesis on the target. The relevant technical environment and the procedure are laid out in the introduction and in the body of the work. Specifically, following Spokoiny and Zhilova (2015), the thesis justifies and applies numerically a multiplier-bootstrap procedure to construct non-asymptotic confidence sets for the testing problem. The procedure and the underlying statistical toolbox were chosen to account for an issue that appears in the model and is overlooked by asymptotic analysis, namely weak instrumental variables; this issue is addressed by the design of the finite-sample approach of Spokoiny (2014).
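As a hedged illustration of a multiplier bootstrap in an instrumental-variables setting (a simplified just-identified model with invented data, not the likelihood-based construction of the thesis), the sketch below perturbs the estimating-equation contributions of the IV estimator with i.i.d. Gaussian multipliers and reads off a confidence interval.

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta_true, pi = 300, 1.0, 0.4          # pi controls the instrument's strength

# Just-identified IV model with an endogenous regressor (errors u and v correlate).
z = rng.standard_normal(n)
v = rng.standard_normal(n)
u = 0.6 * v + 0.8 * rng.standard_normal(n)
x = pi * z + v
y = beta_true * x + u

szx = z @ x
beta_hat = (z @ y) / szx                  # IV / 2SLS estimate in this simple case

# Multiplier bootstrap: perturb the score contributions psi_i = z_i * residual_i
# with i.i.d. standard normal multipliers and collect the re-centred estimates.
psi = z * (y - x * beta_hat)
w = rng.standard_normal((2000, n))
beta_boot = beta_hat + (w @ psi) / szx

lo, hi = np.quantile(beta_boot, [0.025, 0.975])
# Basic bootstrap interval: reflect the bootstrap quantiles around beta_hat.
print("beta_hat = %.3f, 95%% CI = (%.3f, %.3f)" % (beta_hat, 2 * beta_hat - hi, 2 * beta_hat - lo))
```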
279
Functional data analysis with applications in finance. Benko, Michal, 26 January 2007
In many different fields of applied statistics, the object of interest depends on some continuous parameter. Typical examples in finance are implied volatility functions, yield curves or risk-neutral densities. Due to market conventions and further technical reasons, these objects are observable only on a discrete grid, e.g. on a grid of strikes and maturities for which trades have been settled at a given time point. By collecting these functions for several time points (e.g. days) or for different underlyings, a sample of functions is obtained: a functional data set. The first topic considered in this thesis concerns strategies for recovering the functional objects (e.g. the implied volatility function) from the observed data using nonparametric smoothing methods. Besides the standard smoothing methods, a procedure based on a combination of nonparametric smoothing and no-arbitrage theory results is proposed for implied volatility smoothing.
The second part of the thesis is devoted to functional data analysis (FDA) and its connection to problems arising in the empirical analysis of financial markets. The theoretical part focuses on functional principal component analysis, the functional counterpart of the well-known multivariate dimension-reduction technique. A comprehensive overview of existing methods is given, and an estimation method based on the dual problem as well as two-sample inference based on functional principal component analysis are discussed. The FDA techniques are applied to the analysis of implied volatility and yield curve dynamics. In addition, the implementation of the FDA techniques together with an FDA library for the statistical environment XploRe is presented.
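A minimal sketch of functional principal component analysis on curves discretised on a common grid follows; the synthetic "daily curves" are invented stand-ins for smoothed implied-volatility or yield curves, and the decomposition is obtained from the SVD of the centred data matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic daily curves on a common grid (stand-ins for smoothed implied
# volatility or yield curves): a mean shape plus two random modes plus noise.
grid = np.linspace(0.0, 1.0, 50)
n_days = 200
mean_curve = 0.2 + 0.05 * (grid - 0.5) ** 2
modes = np.vstack([np.sin(np.pi * grid), np.cos(2 * np.pi * grid)])
scores_true = rng.normal(0.0, [0.03, 0.01], size=(n_days, 2))
curves = mean_curve + scores_true @ modes + 0.002 * rng.standard_normal((n_days, len(grid)))

# Functional PCA on the discretised curves: centre, then SVD of the data matrix.
centred = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
eigenfunctions = Vt                # rows approximate the functional principal components
pc_scores = centred @ Vt.T         # day-by-day scores on each component

print("variance explained by the first three components:", explained[:3].round(3))
```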
280
Multiple imputation in the presence of a detection limit, with applications: an empirical approach. Liebenberg, Shawn Carl, January 2014
Scientists often encounter unobserved or missing measurements that are typically reported as less than a fixed detection limit. This occurs especially in the environmental sciences, when detection of low exposures is not possible due to limitations of the measuring instrument, and the resulting data are often referred to as type I and type II left-censored data. Observations lying below the detection limit are therefore often ignored or 'guessed', because they cannot be measured accurately; however, reliable estimates of the population parameters are nevertheless required to perform statistical analysis. The problem of dealing with values below a detection limit becomes increasingly complex when a large number of observations lie below this limit. Researchers therefore have an interest in developing statistically robust estimation procedures for dealing with left- or right-censored data sets (Singh and Nocerino, 2002). This study focuses on several main components regarding the problems mentioned above. The imputation of censored data below a fixed detection limit is studied, particularly using the maximum likelihood procedure of Cohen (1959), and several variants thereof, in combination with four new variations of the multiple imputation concept found in the literature. Furthermore, the focus also falls strongly on estimating the density of the resulting imputed, 'complete' data set by applying various kernel density estimators; bandwidth selection issues are not of importance in this study and are left for further research. The maximum likelihood estimation method of Cohen (1959) is compared with several variant methods to establish which of these procedures for censored data estimates the population parameters of three chosen lognormal distributions most reliably, in terms of well-known discrepancy measures. These methods are implemented in combination with four new multiple imputation procedures, respectively, to assess which of these nonparametric methods is most effective at imputing the censored values below the detection limit with regard to the global discrepancy measures mentioned above. Several variations of the Parzen-Rosenblatt kernel density estimate are fitted to the completed, filled-in data sets obtained from the previous methods, to establish which data-driven method is preferred for estimating these densities. The primary focus of the current study is therefore the performance of the four chosen multiple imputation methods, as well as the recommendation of methods and procedural combinations for dealing with data in the presence of a detection limit. An extensive Monte Carlo simulation study was performed to compare the various methods and procedural combinations, and conclusions and recommendations regarding the best of these methods and combinations are made based on the study's results. / MSc (Statistics), North-West University, Potchefstroom Campus, 2014
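To fix ideas, the sketch below combines the three ingredients discussed above on simulated data: a censored maximum-likelihood fit of a lognormal model in the spirit of Cohen (1959), multiple imputation of the values below the detection limit by drawing from the fitted distribution truncated at the limit, and a Parzen-Rosenblatt (Gaussian) kernel density estimate on a completed data set. It is an illustrative, assumption-laden stand-in, not the study's exact procedures.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(11)

# Simulate lognormal data and impose a fixed detection limit (left censoring).
mu_true, sigma_true, n = 0.0, 0.75, 200
x = rng.lognormal(mu_true, sigma_true, n)
dl = np.quantile(x, 0.25)                 # detection limit leaving about 25% censored
observed = x[x >= dl]
n_cens = n - len(observed)

# Censored maximum likelihood on the log scale: observed points contribute the
# normal log-density, censored ones contribute log Phi((log dl - mu) / sigma).
def neg_loglik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    ll_obs = stats.norm.logpdf(np.log(observed), mu, sigma).sum()
    ll_cens = n_cens * stats.norm.logcdf((np.log(dl) - mu) / sigma)
    return -(ll_obs + ll_cens)

res = optimize.minimize(neg_loglik, x0=[np.log(observed).mean(), 0.0])
mu_hat, sigma_hat = res.x[0], float(np.exp(res.x[1]))

# Multiple imputation: draw the censored values from the fitted lognormal
# truncated to (0, dl) via inverse-CDF sampling, building several completed sets.
def impute_once():
    u = rng.uniform(0.0, 1.0, n_cens)
    p = u * stats.norm.cdf((np.log(dl) - mu_hat) / sigma_hat)
    return np.concatenate([observed, np.exp(mu_hat + sigma_hat * stats.norm.ppf(p))])

completed_sets = [impute_once() for _ in range(5)]

# Parzen-Rosenblatt (Gaussian) kernel density estimate on one completed set.
kde = stats.gaussian_kde(completed_sets[0])
print("censored MLE (mu, sigma):", round(float(mu_hat), 3), round(sigma_hat, 3),
      "| KDE at the detection limit:", round(float(kde(dl)[0]), 3))
```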