151 |
Essays on nonlinear time series analysis and health economics
Ovanfors, Anna, January 2006 (has links)
Doctoral dissertation, Handelshögskolan (Stockholm School of Economics), 2006. Pp. 1-125: four essays.
|
152 |
Explorative Multivariate Data Analysis of the Klinthagen Limestone Quarry Data / Utforskande multivariat analys av Klinthagentäktens projekteringsdata
Bergfors, Linus, January 2010 (has links)
Quarry planning at Klinthagen is currently coarse, which provides an opportunity to introduce new methods to improve quarry yield and efficiency. Nordkalk AB, which operates Klinthagen, wishes to open a new quarry at a nearby location. Multivariate statistics may help gather the information needed to exploit future quarries efficiently and to ensure production quality. In this thesis, the multivariate statistical approaches of Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression were evaluated on the Klinthagen bore data. The PCA results were spatially interpolated by kriging, which was also evaluated and compared to inverse distance weighting (IDW) interpolation. Principal component analysis supplied an overview of the relations among the variables, but also visualised the problems involved in linking geophysical data to geochemical data and the inaccuracy introduced by poor data quality. The PLS regression further emphasised the geochemical-geophysical problems, but also showed good precision when applied to strictly geochemical data. Spatial interpolation by kriging did not give significantly better approximations than the less complex control interpolation by IDW. A more discrete sampling method would be advisable to improve the information content of the data when modelled by PCA. Data quality may cause trouble, though with today's sampling technique this was considered to be of minor consequence. When a single geophysical component is to be predicted from chemical variables, further geophysical data are needed to complement the existing data before satisfactory PLS models can be achieved. The stratified rock composition caused trouble when spatially interpolated; further investigations should be performed to develop more suitable interpolation techniques.
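As a rough illustration of the interpolation comparison described in this abstract, here is a minimal sketch contrasting IDW with a Gaussian-process surrogate for ordinary kriging. The coordinates and the response variable are synthetic stand-ins for the bore data, and the RBF kernel plays the role of a fitted variogram:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(200, 2))   # hypothetical bore-hole coordinates (m)
y = np.sin(X[:, 0] / 150) + 0.1 * rng.standard_normal(200)  # hypothetical assay value

def idw(X_train, y_train, X_query, power=2.0):
    """Inverse-distance-weighted interpolation."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)              # avoid division by zero at sample points
    w = d ** -power
    return (w @ y_train) / w.sum(axis=1)

# Gaussian-process regression with an RBF kernel is closely related to
# ordinary kriging; the length scale plays the role of the variogram range.
gp = GaussianProcessRegressor(kernel=RBF(200.0) + WhiteKernel(0.01),
                              normalize_y=True).fit(X, y)

X_new = rng.uniform(0, 1000, size=(50, 2))
print("IDW     :", idw(X, y, X_new)[:3])
print("Kriging :", gp.predict(X_new)[:3])
```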
|
153 |
Identifying Factors Influencing The Acceptance Of Processes: An Empirical Investigation Using The Structural Equation Modeling Approach
Degerli, Mustafa, 01 July 2012 (has links) (PDF)
This research mainly aimed to develop an acceptance model for processes, the process acceptance model (PAM). For this purpose, a questionnaire comprising three parts and 81 questions was developed to collect quantitative and qualitative data from people with experience of certain process-focused models and/or standards (CMMI, ISO 15504, ISO 9001, ISO 27001, AQAP-160, AQAP-2110, and/or AS 9100). The questionnaire was revised and refined through expert reviews and a pilot study with 60 usable responses. After the reviews, refinements, and piloting, the questionnaire was deployed, and 368 usable responses were collected in total. The collected data were screened for incorrectly entered data, missing data, outliers, and normality, and the reliability and validity of the questionnaire were ensured. Partial least squares structural equation modeling (PLS-SEM) was applied to develop the PAM. In this context, exploratory and confirmatory factor analyses were applied, and the initial model was estimated and evaluated. The initial model was modified as required by PLS-SEM, the confirmatory factor analysis was repeated, and the modified final model was estimated and evaluated. Consequently, the PAM, with 18 factors and their statistically significant relationships, was developed. Furthermore, descriptive statistics and t-tests were applied to uncover interesting, meaningful, and important points to be taken into account regarding the acceptance of processes. Moreover, the collected qualitative data were analyzed, and three additional factors were discovered regarding the acceptance of processes. Finally, a checklist to test and/or promote the acceptance of processes was established.
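For readers unfamiliar with the technique, a minimal sketch of the PLS-SEM workflow follows. The construct names, item counts, and data are illustrative stand-ins, not the PAM itself, and a plain PLS regression replaces the full path-estimation algorithm; only the bootstrap idea for assessing significance mirrors standard PLS-SEM practice:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
n = 368                                    # sample size reported in the study

# Hypothetical 5-point Likert items for two exogenous constructs and one
# endogenous construct ("acceptance"); the real PAM constructs differ.
usefulness = rng.integers(1, 6, size=(n, 4)).astype(float)
ease       = rng.integers(1, 6, size=(n, 4)).astype(float)
acceptance = rng.integers(1, 6, size=(n, 3)).astype(float)

X = np.hstack([usefulness, ease])
Y = acceptance

# One PLS regression stands in for the structural part of PLS-SEM:
# latent X-scores predict the endogenous indicators.
pls = PLSRegression(n_components=2).fit(X, Y)
print("R^2 on indicators:", pls.score(X, Y))

# Bootstrap a crude confidence interval for one coefficient, mimicking how
# PLS-SEM assesses path significance by resampling respondents.
boot = []
for _ in range(500):
    idx = rng.choice(n, n, replace=True)
    boot.append(PLSRegression(2).fit(X[idx], Y[idx]).coef_[0, 0])
print("bootstrap 95% CI:", np.percentile(boot, [2.5, 97.5]))
```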
|
154 |
Multivariate data analysis using spectroscopic data of fluorocarbon alcohol mixtures / Nothnagel, C.
Nothnagel, Carien, January 2012 (links)
Pelchem, a commercial subsidiary of Necsa (South African Nuclear Energy Corporation), produces a range of commercial fluorocarbon products while driving research and development initiatives to support the fluorine product portfolio. One such initiative is to develop improved analytical techniques to analyse product composition during development and to quality-assure the final products.
The C-F type products produced by Necsa are generally in a solution of anhydrous HF and cannot be analyzed directly with traditional techniques without derivatisation. A technique such as vibrational spectroscopy, which can analyze these products directly without further preparation, has a distinct advantage. However, spectra of mixtures of similar compounds are complex and not suitable for traditional quantitative regression analysis. Multivariate data analysis (MVA) can be used in such instances to exploit the complex nature of the spectra and extract quantitative information on the composition of mixtures.
A selection of fluorocarbon alcohols was made to act as representatives for fluorocarbon compounds. Experimental design theory was used to create a calibration range of mixtures of these compounds. Raman and infrared (NIR and ATR-IR) spectroscopy were used to generate spectral data of the mixtures, and these data were analyzed with MVA techniques through the construction of regression and prediction models. Selected samples from the mixture range were chosen to test the predictive ability of the models.
Analysis and regression models (PCR, PLS2 and PLS1) gave good model fits (R² values larger than 0.9). Raman spectroscopy was the most efficient technique and gave high prediction accuracy (at 10% accepted standard deviation), provided the minimum mass of a component exceeded 16% of the total sample.
The infrared techniques also performed well in terms of fit and prediction. The NIR spectra suffered from signal saturation as a result of the long path length of the sample cells, which was shown to be the main reason for the loss in efficiency of this technique compared to Raman and ATR-IR spectroscopy.
It was shown that multivariate data analysis of spectroscopic data of the selected fluorocarbon compounds can be used to quantitatively analyse mixtures, with the possibility of further optimization of the method. The study serves as a representative demonstration that the combination of MVA and spectroscopy can be used successfully in the quantitative analysis of other fluorocarbon compound mixtures. / Thesis (M.Sc. (Chemistry))--North-West University, Potchefstroom Campus, 2012.
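A minimal sketch of the PCR/PLS2/PLS1 comparison on synthetic mixture spectra follows. The pure-component "spectra", noise level, and cross-validation scheme are invented for illustration and do not reproduce the thesis' calibration design:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n_samples, n_wavelengths = 60, 500
C = rng.dirichlet(np.ones(3), size=n_samples)        # mixture compositions
S = np.abs(rng.standard_normal((3, n_wavelengths)))  # pure-component "spectra"
spectra = C @ S + 0.01 * rng.standard_normal((n_samples, n_wavelengths))

pcr  = make_pipeline(PCA(n_components=5), LinearRegression())  # PCR
pls2 = PLSRegression(n_components=5)   # PLS2: all components modelled at once
pls1 = PLSRegression(n_components=5)   # PLS1: one component at a time

print("PCR  R2:", cross_val_score(pcr,  spectra, C, cv=5, scoring="r2").mean())
print("PLS2 R2:", cross_val_score(pls2, spectra, C, cv=5, scoring="r2").mean())
print("PLS1 R2:", cross_val_score(pls1, spectra, C[:, 0], cv=5, scoring="r2").mean())
```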
|
156 |
政府績效管理資訊化的交易成本分析:以「政府計畫管理資訊網」為例 / Information and communication technologies (ICTs) and government performance management: A case study of GPMnet in Taiwan
謝叔芳 (Hsieh, Hsu Fang), Unknown Date (has links)
Since the wave of government reinvention in the 1980s, performance management and information and communication technologies have become important tools for improving government performance. Against this background, Taiwan completed the integration of the Government Program Management network (GPMnet) in 2005 to support the execution of performance management operations. However, because information technology covers a very broad range of aspects and its effects are far-reaching, it has also provoked debate among optimistic, pessimistic, and pragmatic positions, and the effectiveness of its use requires further evaluation. Building on the relevant literature, this study adopts a transaction cost approach: it first uses a questionnaire survey to understand the attitudes and behavioral preferences of GPMnet users, and then draws on interview data to analyze in more depth how information and communication technologies increase and decrease the costs of government performance management.
The research design follows a mixed methodology, collecting and analyzing both quantitative and qualitative data. For the quantitative part, a questionnaire survey was conducted with the GPMnet user as the unit of analysis, yielding 148 valid responses. For the qualitative part, eight GPMnet users were selected for interviews across the four permission roles (program organizer, supervisor, joint review, and research and evaluation) to capture the experiences and views of respondents with different permissions.
For the data analysis, the questionnaire data were analyzed by partial least squares. The survey results show that perceived transaction costs of using the GPMnet system have significant negative relationships with attitude and with subjective system performance, while the hypotheses linking uncertainty, asset specificity, and usage frequency to transaction costs were not supported by the empirical data. The interview data further reveal that, under the current institutional environment, the administrative cost burden of using GPMnet for performance management is increased by the facts that different agencies run different information systems, that GPMnet comprises multiple subsystems, and that paper-based processes still persist. In actual use, on the other hand, functions such as preserving historical data, providing clearly defined fields, web-based transmission, progress control, and proactive disclosure of information reduce the transaction costs of administrative work. Conversely, learning time that is not worth its cost, time-consuming communication, proofreading, information overload, an unfriendly interface, and system instability are negative effects that increase the transaction costs of performance management.
Finally, this study offers two sets of recommendations. For academic research, the observed variables of structural models should be designed more carefully, and information-system evaluation theory should give more weight to the cost perspective. In practice, electronic performance management should be implemented comprehensively, and data backups should be provided in the GPMnet system environment to reduce information overload. / Governments invest much more attention, time, and money in performance management and evaluation in the public sector today than ever before. To better utilize agency program management systems under the Executive Yuan, the Research, Development and Evaluation Commission (RDEC) completed the planning of the "Policy Program Management Information System" (Government Program network, GPMnet), a common service platform created to integrate various policy implementation management information systems and enhance the performance of different agencies in program management. The performance of GPMnet itself, however, needs to be evaluated. To this end, this study presents empirical research based on the transaction cost approach, which has often been used to support claims about the positive impact of information and communication technology on the economic system.
The data were collected with a mixed methodology, combining quantitative questionnaire data from 148 users with eight semi-structured interviews. Partial least squares (PLS) was used to analyze the quantitative data. According to the research findings, information-related problems account for only some of the transaction costs; these costs also emerge from institutional factors that contribute to their growth. Studies of the consequences of ICT design and implementation based on transaction cost theory should therefore take the costs of ICTs into account.
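To illustrate the kind of quantitative analysis reported here, a minimal sketch of testing a hypothetical negative PLS path between perceived transaction costs and attitude. Each construct is reduced to a single synthetic score, and a permutation test stands in for the study's actual significance procedure:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
n = 148                                    # valid questionnaires in the study

# Hypothetical standardized construct scores; the real survey used
# multi-item Likert scales for each construct.
transaction_cost = rng.standard_normal(n)
attitude = -0.4 * transaction_cost + 0.9 * rng.standard_normal(n)

X = transaction_cost.reshape(-1, 1)
coef = PLSRegression(n_components=1).fit(X, attitude).coef_.ravel()[0]

# Permutation test: is the observed (negative) path stronger than chance?
null = [PLSRegression(1).fit(X, rng.permutation(attitude)).coef_.ravel()[0]
        for _ in range(1000)]
p = np.mean(np.abs(null) >= abs(coef))
print(f"path = {coef:.3f}, permutation p = {p:.3f}")
```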
|
157 |
In silico tools in risk assessment: of industrial chemicals in general and non-dioxin-like PCBs in particular
Stenberg, Mia, January 2012 (has links)
Industrial chemicals produced in or imported into the European Union in volumes above 1 tonne annually must be registered under REACH. A common problem with these chemicals is deficient information and a lack of data for assessing the hazards they pose to human health and the environment. Animal studies of the kind needed for this toxicological information are both expensive and time consuming, and carry an ethical cost as well, so alternatives to animal testing are requested. REACH has called for an increased use of in silico tools for non-testing data, such as structure-activity relationships (SARs), quantitative structure-activity relationships (QSARs), and read-across. The main objective of the studies underlying this thesis is to explore and refine the use of in silico tools in a risk assessment context for industrial chemicals; in particular, to relate properties of the molecular structure to the toxic effect of a chemical substance using the principles and methods of computational chemistry. The initial study was a survey of all industrial chemicals, from which an industrial chemical map was created, and a part of this map containing chemicals of potential concern was identified. Secondly, the environmental pollutants polychlorinated biphenyls (PCBs) were examined, in particular the non-dioxin-like PCBs (NDL-PCBs). A set of 20 NDL-PCBs was selected to represent the 178 PCB congeners with three to seven chlorine substituents. The selection procedure combined statistical molecular design, for a representative selection, with expert judgement, so that congeners of specific interest could be included. The 20 selected congeners were tested in vitro in as many as 17 different assays. The data from the screening process were turned into interpretable toxicity profiles with multivariate methods and used to investigate potential classes of NDL-PCBs. It was shown that NDL-PCBs cannot be treated as one group of substances with similar mechanisms of action. Two groups of congeners were identified: one, generally comprising lower-chlorinated congeners with a higher degree of ortho substitution, showed higher potency in more assays (including all neurotoxicity assays); a second comprised abundant congeners with a similar toxic profile that might contribute to a common toxic burden. To investigate the structure-activity pattern of the PCBs' effect on the dopamine transporter (DAT) in rat striatal synaptosomes, ten additional congeners were selected and tested in vitro. NDL-PCBs were shown to be potent inhibitors of DAT binding. The congeners with the highest DAT-inhibiting potency were tetra- and penta-chlorinated congeners with two to three chlorine atoms in the ortho position. The model was not able to distinguish among the congeners with activities in the lower μM range, which could be explained by a relatively unspecific response for the less ortho-chlorinated PCBs. / The European chemicals legislation REACH stipulates that chemicals produced or imported in quantities above 1 tonne per year must be registered and risk assessed; an estimated 30,000 chemicals are affected. The problem is that data and information are often insufficient for a risk assessment. Effect data have largely been obtained from animal studies, but animal testing is both costly and time consuming, and the ethical aspect also comes into play. REACH has therefore called for an investigation of the possibility of using in silico tools to contribute the requested data and information. In silico roughly means "in the computer" and refers to computational models and methods used to obtain information about the properties and toxicity of chemicals. The aim of this thesis is to explore the possibilities and refine the use of in silico tools to generate information for the risk assessment of industrial chemicals. The thesis describes quantitative models, developed with chemometric methods, for predicting the toxic effects of specific chemicals. In the first study (I), 56,072 organic industrial chemicals were examined. Multivariate methods were used to create a map of the industrial chemicals describing their chemical and physical properties. The map was used for comparisons with known and potential environmentally hazardous chemicals. The best-known environmental pollutants proved to have similar principal properties and grouped together in the map; by studying that part of the map in detail, more potentially hazardous chemical substances could be identified. Studies two to four (II-IV) focused on the environmental pollutant PCB. Twenty PCBs were selected so that they structurally and physicochemically represented the 178 PCB congeners with three to seven chlorine substituents. The toxicological effects of these 20 PCBs were examined in 17 different in vitro assays. The toxicological profiles of the 20 tested congeners were established, i.e. which congeners have similar harmful effects and which differ. The toxicological profiles were used for classification of the PCBs. Quantitative models were developed to predict the effects of congeners not yet tested and to gain further knowledge of the structural properties that cause undesirable effects in humans and the environment, information that can be used in a future risk assessment of non-dioxin-like PCBs. The final study (IV) is a structure-activity study of the non-dioxin-like PCBs' inhibition of the dopamine transporter in the brain.
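The representative selection of 20 congeners out of 178 can be mimicked with a distance-based design algorithm. Below is a sketch using the Kennard-Stone algorithm on hypothetical molecular descriptors; the thesis combined statistical molecular design with expert judgement, so this is only an analogue of that procedure:

```python
import numpy as np

def kennard_stone(X, k):
    """Select k rows of X that span descriptor space (Kennard-Stone)."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances
    picked = list(np.unravel_index(d.argmax(), d.shape))  # two most distant points
    while len(picked) < k:
        rest = [i for i in range(len(X)) if i not in picked]
        # next point: the candidate farthest from its nearest picked neighbour
        nxt = rest[int(d[np.ix_(rest, picked)].min(axis=1).argmax())]
        picked.append(nxt)
    return picked

rng = np.random.default_rng(4)
descriptors = rng.standard_normal((178, 6))  # hypothetical congener descriptors
subset = kennard_stone(descriptors, 20)
print(sorted(subset))
```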
|
158 |
The development of FT-Raman techniques to quantify the hydrolysis of Cobalt(III) nitrophenylphosphate complexes using multivariate data analysis
Tshabalala, Oupa Samuel, 03 1900 (has links)
FT-Raman techniques were developed to quantify the reactions that follow on mixing aqueous solutions of the bis-(1,3-diaminopropane)diaquacobalt(III) ion ([Co(tn)2(OH)(H2O)]2+) and p-nitrophenylphosphate (PNPP).
For the development and validation of the kinetic modelling technique, the well-studied inversion of sucrose was utilized. Rate constants and concentrations could be estimated using calibration solutions and modelling methods, and the results obtained were found to be comparable to literature values. Hence the technique could be applied to the [Co(tn)2(OH)(H2O)]2+-assisted hydrolysis of PNPP.
It was found that rate constants obtained when the pH is maintained at 7.30 differ from those obtained when the pH is started at 7.30 and allowed to change during the reaction. The average rate constant for the 2:1 [Co(tn)2(OH)(H2O)]2+:PNPP reactions was found to be approximately 3 × 10⁴ times the unassisted PNPP hydrolysis rate. / Chemistry / M. Sc. (Chemistry)
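A minimal sketch of the kinetic-modelling step: fitting a first-order rate constant to concentration estimates of the kind a calibrated Raman model would supply. All numbers here are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical concentration profile, e.g. PLS-predicted sucrose
# concentrations from a series of FT-Raman spectra.
t = np.linspace(0, 120, 13)                  # reaction time, minutes
k_true, c0_true = 0.025, 1.0
rng = np.random.default_rng(5)
c = c0_true * np.exp(-k_true * t) + 0.02 * rng.standard_normal(t.size)

def first_order(t, c0, k):
    """Integrated first-order rate law: c(t) = c0 * exp(-k t)."""
    return c0 * np.exp(-k * t)

popt, pcov = curve_fit(first_order, t, c, p0=[1.0, 0.01])
k_hat, k_err = popt[1], np.sqrt(np.diag(pcov))[1]
print(f"k = {k_hat:.4f} +/- {k_err:.4f} per minute")
```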
|
159 |
Inférence statistique en grande dimension pour des modèles structurels. Modèles linéaires généralisés parcimonieux, méthode PLS et polynômes orthogonaux et détection de communautés dans des graphes / Statistical inference for structural models in high dimension. Sparse generalized linear models, PLS through orthogonal polynomials and community detection in graphs
Blazere, Melanie, 01 July 2015 (has links)
This thesis falls within the context of high-dimensional data analysis. Nowadays we have access to ever-increasing amounts of information, and the major challenge lies in our ability to explore huge quantities of data and to infer their dependency structures. The purpose of this thesis is to study, and provide theoretical guarantees for, specific methods that estimate dependency structures in high-dimensional data. The first part of the thesis is devoted to the study of sparse models through Lasso-type methods. In Chapter 1 we present the main results on this topic, and then generalize the Gaussian case to any distribution from the exponential family. The major contribution to this field is presented in Chapter 2 and consists of oracle inequalities for a Group Lasso procedure applied to generalized linear models. These results show that this estimator achieves good performance under specific conditions on the model. We illustrate this part by considering the case of the Poisson model.
The second part concerns linear regression in high dimension, but the sparsity assumption is replaced by a low-dimensional structure underlying the data. We focus in particular on the PLS method, which attempts to find an optimal decomposition of the predictors given a response. We recall the main idea in Chapter 3. The major contribution to this part is a new explicit analytical expression for the dependency structure that links the predictors to the response. The next two chapters illustrate the power of this formula through new theoretical results for PLS. The third and last part is dedicated to graph modelling and especially to community detection. After presenting the main trends on this topic, we turn our attention to spectral clustering, which partitions the nodes of a graph on the basis of a similarity matrix. In this thesis we propose an adaptation of this method based on an l1-type penalty, and we illustrate it through simulations.
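A minimal sketch of the community-detection baseline discussed in the last part: spectral clustering applied to a simulated stochastic block model. The l1-penalized variant proposed in the thesis is not implemented here; this only shows the unpenalized starting point:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(6)
n, k = 90, 3
labels_true = np.repeat(np.arange(k), n // k)

# Stochastic block model: dense edges within communities, sparse between.
P = np.where(labels_true[:, None] == labels_true[None, :], 0.25, 0.03)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                 # symmetric adjacency, no self-loops

labels_hat = SpectralClustering(n_clusters=k, affinity="precomputed",
                                random_state=0).fit_predict(A)
print(labels_hat[:15])                      # recovered community labels
```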
|
160 |
Novas estratégias para seleção de variáveis por intervalos em problemas de classificação [New strategies for interval-based variable selection in classification problems]
Fernandes, David Douglas de Sousa, 26 August 2016 (links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / In analytical chemistry, the use of analytical signals recorded by multiple sensors, combined with subsequent chemometric modeling, has become a recurring theme in the literature for the development of new analytical methodologies. For this purpose, multivariate instrumental techniques such as ultraviolet-visible or near-infrared spectrometry and voltammetry are generally used. In this scenario, the analyst is faced with the option of selecting individual variables or variable intervals in order to avoid or reduce multicollinearity problems. A well-known strategy for interval selection is to divide the set of instrumental responses into intervals of equal width and select the best interval based on the prediction performance of a Partial Least Squares regression built on a single interval (iPLS). The use of interval selection for classification purposes, on the other hand, has received relatively little attention. A common practice is to use the iPLS regression method with coded class indices as the response variables to be predicted, which is the basic idea behind the Partial Least Squares Discriminant Analysis (PLS-DA) version used for classification. In other words, no native algorithms have been developed for interval selection for classification purposes. This work therefore proposes two new strategies for classification problems using interval selection by the Successive Projections Algorithm. The first strategy is named the Successive Projections Algorithm for interval selection in Partial Least Squares Discriminant Analysis (iSPA-PLS-DA), while the second is called the Successive Projections Algorithm for interval selection in Soft Independent Modeling of Class Analogy (iSPA-SIMCA). The performance of the proposed algorithms was evaluated in three case studies: classification of vegetable oils according to the type of raw material and the expiration date, using data obtained by square-wave voltammetry; classification of unadulterated biodiesel/diesel blends (B5) and blends adulterated with soybean oil (OB5), using spectral data obtained in the ultraviolet-visible region; and classification of vegetable oils with respect to the expiration date, using spectral data obtained in the near-infrared region. The proposed iSPA-PLS-DA and iSPA-SIMCA algorithms provided good results in all three case studies, with correct classification rates always greater than or equal to those obtained by PLS-DA and SIMCA models using all variables, by iPLS-DA and iSIMCA with a single selected interval, and by SPA-LDA and GA-LDA with selection of individual variables. The proposed algorithms can therefore be considered promising approaches for classification problems employing interval selection. From a more general point of view, the possibility of using interval selection without loss of classification accuracy is a very useful tool for the construction of dedicated instruments (e.g. LED-based photometers) for use in routine and in situ analysis.
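A minimal sketch of the interval-selection idea that iSPA-PLS-DA builds on: score equal-width spectral intervals with a cross-validated PLS-DA and keep the best one. The data are synthetic, and the successive-projections combination of multiple intervals that distinguishes the proposed method is not shown:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n, p, n_int = 80, 400, 10                       # samples, wavelengths, intervals
y = np.repeat([0, 1], n // 2)                   # two classes (e.g. B5 vs OB5)
X = rng.standard_normal((n, p))
X[y == 1, 120:160] += 0.8                       # class difference in one region

def pls_da_accuracy(Xi, y, ncomp=2):
    """CV accuracy of PLS-DA: threshold the predicted class index at 0.5."""
    y_hat = cross_val_predict(PLSRegression(ncomp), Xi, y.astype(float), cv=5)
    return np.mean((y_hat.ravel() > 0.5) == y)

edges = np.linspace(0, p, n_int + 1).astype(int)
scores = [pls_da_accuracy(X[:, a:b], y) for a, b in zip(edges[:-1], edges[1:])]
best = int(np.argmax(scores))
print(f"best interval: variables {edges[best]}-{edges[best + 1]}, "
      f"accuracy = {scores[best]:.2f}")
```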
|