Global ETD Search

41	Model Uncertainty and Aggregated Default Probabilities: New Evidence from Austria Hofmarcher, Paul, Kerbl, Stefan, Grün, Bettina, Sigmund, Michael, Hornik, Kurt 01 1900 (has links) (PDF) Understanding the determinants of aggregated default probabilities (PDs) has attracted substantial research over the past decades. This study addresses two major difficulties in understanding the determinants of aggregate PDs: model uncertainty and multicollinearity among the regressors. We present Bayesian Model Averaging (BMA) as a powerful tool that overcomes model uncertainty. Furthermore, we supplement BMA with ridge regression to mitigate multicollinearity. We apply our approach to an Austrian dataset. Our findings suggest that factor prices such as short-term interest rates and energy prices constitute major drivers of default rates, while firms' profits reduce the expected number of failures. Finally, we show that the results of our baseline model are fairly robust to the choice of the prior model size. / Series: Research Report Series / Department of Statistics and Mathematics JEL E44, C52, E37
42	Equações monoespecíficas de incremento em área basal de Handroanthus serratifolius (Vahl) S.O.Grose (ipê amarelo) e Handroanthus impetiginosus (Mart. ex DC.) Mattos (ipê roxo) da floresta tropical pluvial do Acre / Monospecific equations of basal area increment of Handroanthus serratifolius (Vahl) S.O.Grose (ipê amarelo) and Handroanthus impetiginosus (Mart. ex DC.) Mattos (ipê roxo) of tropical pluvial forest of Acre Gama, Lorenna Eleamen da Silva 22 February 2017 (has links) In natural forests, forest structure and species growth rates are rarely considered as management criteria; when they are, management is usually based on groups of species with distinct characteristics. Against this background, this work sought to advance knowledge of the growth rates of timber species harvested in the east of the state of Acre, in order to contribute to the sustainable exploitation of these forests. To this end, the growth of Handroanthus impetiginosus (Mart. ex DC.) Mattos (ipê roxo) and Handroanthus serratifolius (Vahl) S.O.Grose (ipê amarelo) was modeled from data obtained by growth ring analysis. Covariates associated with crown size and morphometry, competitive status, tree health and liana load in the crown were investigated in the regression model as descriptors of the basal area increment rate. The study was carried out on Amazon rainforest trees measured in the municipality of Porto Acre, Acre state, on private land under sustainable forest management approved by the Instituto do Meio Ambiente do Acre (IMAC, the Acre Institute of the Environment). Using a Pressler increment borer, four increment cores 0.5 mm in diameter and approximately 10 cm long were collected from each tree at dbh height, following the cardinal points. The cores were extracted from sample trees of H. impetiginosus (n = 30) and H. serratifolius (n = 35) spanning a diameter range of 13.5 cm to 88.1 cm dbh, for a total of 260 increment cores. Growth ring widths were measured on each core along the radii in the bark-to-pith direction, using a digitizing table with an attached magnifying glass and the TSAP-Win™ Scientific software. From the ring widths, the dbh dimensions and the increment rates for the period 2011 to 2014 were reconstructed. The regression model was fitted by Ordinary Least Squares, assuming a normal distribution, and by Generalized Linear Models, assuming a Gamma distribution and a logarithmic link function. Covariates were selected according to their correlation with periodic annual basal area growth. The selected model included the logarithm of diameter (lnd), height (h), the height/diameter ratio (h/d) and the Hegyi competition index (IC). Multicollinearity among the covariates was corrected by ridge regression. Based on statistical criteria and residual evaluation, the growth model fitted with the constant K = 0.024 added to the model coefficients proved suitable for describing the variation in periodic annual increment in basal area (IPAg). Ridge regression Dendrochronology Periodic increment
43	Robustifikace statistických a ekonometrických metod regrese / Robustification of statistical and econometrical regression methods Jurczyk, Tomáš January 2016 (has links) Title: Robustification of statistical and econometrical regression methods Author: Mgr. Tomáš Jurczyk Department: Department of probability and mathematical statistics Supervisor: prof. RNDr. Jan Ámos Víšek CSc., IES FSV UK Praha Abstract: Multicollinearity and the presence of outliers are two data problems that can arise in regression analysis. In this thesis we are mainly interested in situations where the combined outlier-multicollinearity problem is present. We first show the behavior of classical methods developed to overcome one of these problems. We also investigate how well methods proposed as robust multicollinearity detectors work. We prove that the proposed two-step procedures (with one step typically based on robust regression methods) fail at outlier detection, and therefore also at multicollinearity detection, if strong multicollinearity is present in the majority of the data. We propose a new one-step method as a candidate for a robust detector of multicollinearity as well as a robust ridge regression estimate. We derive its properties and behavior, and propose diagnostic tools derived from the method. Keywords: multicollinearity, outliers, robust detector of multicollinearity, robust ridge regression
44	Modelos e metodologias para estimação dos efeitos genéticos fixos em uma população multirracial Angus x Nelore / Models and methodologies to estimate fixed genetic effects estimation in a crossbred population Angus x Nelore Bertoli, Claudia Damo January 2015 (has links) The objectives of this study were to estimate the fixed genetic effects acting on a synthetic population and to test different models and methodologies for this estimation. The fixed genetic effects tested were the direct and maternal additive breed effects and the direct and maternal non-additive effects of heterosis, epistatic loss and complementarity. The models tested include these effects both separately and jointly. Ridge regression and least squares regression were compared, as were two different methods for determining the ridge parameter. A synthetic beef cattle population involving Angus and Nellore in several breed combinations was used, with 294,045 weaning records and 148,443 yearling records. The traits analyzed were weight gain from birth to weaning (WG) and phenotypic scores for conformation (WC), precocity (WP) and muscling (WM) collected at weaning; and weight gain from weaning to yearling (PG), phenotypic scores for conformation (PC), precocity (PP) and muscling (PM), and scrotal circumference (SC) collected at yearling. In most analyses, the estimated fixed genetic effects were statistically significant. The complete model, including all fixed genetic effects, was the most suitable under both methodologies. Under least squares regression, the most parsimonious model included only the additive breed effects and the non-additive heterosis (dominance) effects; under ridge regression, the most parsimonious model also included the non-additive epistatic loss effects. The two methodologies were equivalent for models that included only additive breed and non-additive heterosis effects. Once non-additive epistatic loss and/or complementarity effects were included, ridge regression was preferable until the data reached a certain volume and structure, with most breed-composition classes represented in the sample; from that point on the two methodologies were equivalent. Of the methods for determining the ridge parameter, the best was the one that identifies the smallest value that brings the variance inflation factors below 10 for all estimated regressors. Beef cattle Animal genetics Animal genetic improvement Animal crossbreeding Complementarity Crossbred beef cattle Epistatic loss Genetic effects Heterosis Ridge regression
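The ridge-parameter rule described in this abstract (take the smallest value that brings every variance inflation factor below 10) can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the search grid and the simulated data are assumptions made for illustration, and the VIFs are computed with the usual correlation-form expression, the diagonal of (R + kI)^-1 R (R + kI)^-1, where R is the correlation matrix of the regressors.

```python
import numpy as np

def ridge_vifs(X, k):
    """VIFs of the ridge estimator in correlation form for ridge parameter k."""
    R = np.corrcoef(X, rowvar=False)               # p x p correlation matrix
    A = np.linalg.inv(R + k * np.eye(R.shape[0]))  # (R + kI)^-1
    return np.diag(A @ R @ A)

def smallest_k_with_vifs_below(X, threshold=10.0, grid=None):
    """Return the first k on an ascending grid with all VIFs below threshold."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 1001)         # hypothetical search grid
    for k in grid:
        if ridge_vifs(X, k).max() < threshold:
            return float(k)
    return None                                    # no grid value qualifies

# Simulated regressors (illustrative only): x2 is nearly collinear with x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

k = smallest_k_with_vifs_below(X)
print("chosen k:", k)
print("max VIF at k = 0:", ridge_vifs(X, 0.0).max())
print("max VIF at chosen k:", ridge_vifs(X, k).max())
```

Because each VIF decreases as k grows, the first grid value that passes the threshold is also the smallest one on the grid; a finer grid or a bisection search would sharpen the result.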
47	Robustní optimalizace v klasifikačních a regresních úlohách / Robust optimization in classification and regression problems Semela, Ondřej January 2016 (has links) In this thesis, we present selected methods of regression and classification analysis in terms of robust optimization, which aim to compensate for data imprecision and measurement errors. In the first part, the ordinary least squares method and its generalizations derived within the context of robust optimization, ridge regression and the Lasso method, are introduced. The connection between robust least squares and these generalizations is also shown. The theoretical results are accompanied by a simulation study that investigates the robustness of these methods from a different perspective. In the second part, we define a modern classification method, Support Vector Machines (SVM). Using the knowledge obtained, we formulate a robust SVM method that can be applied in robust classification. The final part is devoted to the biometric identification of a typing style and of an individual based on keystroke dynamics, using the formulated theory.
48	An Optical Flow Implementation Comparison Study Bodily, John M. 12 March 2009 (has links) (PDF) Optical flow is the apparent motion of brightness patterns within an image scene. Algorithms used to calculate the optical flow for a sequence of images are useful in a variety of applications, including motion detection and obstacle avoidance. Typical optical flow algorithms are computationally intense and run slowly when implemented in software, which is problematic since many potential applications of the algorithm require real-time calculation in order to be useful. To increase performance of the calculation, optical flow has recently been implemented on FPGA and GPU platforms. These devices are able to process optical flow in real-time, but are generally less accurate than software solutions. For this thesis, two different optical flow algorithms have been implemented to run on a GPU using NVIDIA's CUDA SDK. Previous FPGA implementations of the algorithms exist and are used to make a comparison between the FPGA and GPU devices for the optical flow calculation. The first algorithm calculates optical flow using 3D gradient tensors and is able to process 640x480 images at about 238 frames per second with an average angular error of 12.1 degrees when run on a GeForce 8800 GTX GPU. The second algorithm uses increased smoothing and a ridge regression calculation to produce a more accurate result. It reduces the average angular error by about 2.3x, but the additional computational complexity of the algorithm also reduces the frame rate by about 1.5x. Overall, the GPU outperforms the FPGA in frame rate and accuracy, but requires much more power and is not as flexible. The most significant advantage of the GPU is the reduced design time and effort needed to implement the algorithms, with the FPGA designs requiring 10x to 12x the effort.
optical flow GPU FPGA motion detection CUDA computer vision algorithm comparison 3D tensors ridge regression Electrical and Computer Engineering
49	Predicting Reactor Instability Using Neural Networks Hubert, Hilborn January 2022 (has links) The study of instabilities in boiling water reactors is of significant importance to the safety with which they can be operated, as instabilities can damage the reactor and pose risks to both equipment and personnel. The instabilities that concern this paper are progressive growths in the oscillating power of boiling water reactors. As thermal power is oscillatory, it is important to be able to identify whether or not the power amplitude is stable. The main focus of this paper has been the development of a neural network estimator of these instabilities, fitting a non-linear model function to data by estimating its parameters. In doing this, the ambition was to optimize the networks to the point that they can deliver near "best-guess" estimates of the parameters which define these instabilities, and to evaluate the usefulness of these networks when applied to problems like this. The goal was to design both MLP (Multi-Layer Perceptron) and SVR/KRR (Support Vector Regression/Kernel Ridge Regression) networks and improve them to the point that they provide reliable and useful information about the waves in question. This goal was accomplished only in part, as the SVR/KRR networks had some difficulty in ascertaining the phase shift of the waves. Overall, however, these networks prove very useful in this kind of task, succeeding with a reasonable degree of confidence in calculating the different parameters of the waves studied. Boiling Water Reactors Density Wave Instability Multi-Layer Perceptron Support Vector Regression Kernel-Ridge Regression Physical Sciences Physics
50	The implementation of noise addition partial least squares Moller, Jurgen Johann 03 1900 (has links) Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2009. / When determining the chemical composition of a specimen, traditional laboratory techniques are often both expensive and time consuming. It is therefore preferable to employ more cost-effective spectroscopic techniques such as near infrared (NIR). Traditionally, the calibration problem has been solved by means of multiple linear regression to specify the model between X and Y. Traditional regression techniques, however, quickly fail when using spectroscopic data, as the number of wavelengths can easily be several hundred, often exceeding the number of chemical samples. This scenario, together with the high level of collinearity between wavelengths, will necessarily lead to singularity problems when calculating the regression coefficients. Ways of dealing with the collinearity problem include principal component regression (PCR), ridge regression (RR) and PLS regression. Both PCR and RR require a significant amount of computation when the number of variables is large. PLS overcomes the collinearity problem in a similar way to PCR, by modelling both the chemical and spectral data as functions of common latent variables. The quality of the employed reference method greatly impacts the coefficients of the regression model and, therefore, the quality of its predictions. With both X and Y subject to random error, the quality of the predictions of Y will be reduced with an increase in the level of noise. Previously conducted research focussed mainly on the effects of noise in X. This paper focuses on a method proposed by Dardenne and Fernández Pierna, called Noise Addition Partial Least Squares (NAPLS), that attempts to deal with the problem of poor reference values. Some aspects of the theory behind PCR, PLS and model selection are discussed.
This is then followed by a discussion of the NAPLS algorithm. Both PLS and NAPLS are implemented on various datasets that arise in practice, in order to determine cases where NAPLS will be beneficial over conventional PLS. For each dataset, specific attention is given to the analysis of outliers, influential values and the linearity between X and Y, using graphical techniques. Lastly, the performance of the NAPLS algorithm is evaluated for various Principal components analysis Regression analysis Ridge regression (Statistics)
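The singularity problem this abstract mentions, with more wavelengths than samples, can be seen in a small numerical sketch. The data are random stand-ins for spectra, and the sizes, variable names and penalty value `lam` are illustrative assumptions, not values from the thesis. With more columns than rows, X'X is rank deficient and the ordinary least squares normal equations have no unique solution; adding a ridge penalty to the diagonal makes the system solvable.

```python
import numpy as np

# Random stand-in for spectroscopic data: fewer samples than wavelengths.
rng = np.random.default_rng(1)
n_samples, n_wavelengths = 20, 100
X = rng.normal(size=(n_samples, n_wavelengths))
y = rng.normal(size=n_samples)

gram = X.T @ X                           # 100 x 100 matrix of the normal equations
rank = np.linalg.matrix_rank(gram)       # cannot exceed n_samples, so gram is singular

lam = 1.0                                # hypothetical ridge penalty
beta_ridge = np.linalg.solve(gram + lam * np.eye(n_wavelengths), X.T @ y)

print("rank of X'X:", rank, "out of", n_wavelengths)
print("ridge coefficient vector shape:", beta_ridge.shape)
```

PCR and PLS address the same rank deficiency differently, by regressing on a small number of latent components instead of penalizing the coefficients.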

Search results