Global ETD Search

1	Metodologia baseada em NIRS e Quimiometria para a determinação de parâmetros de qualidade da quitosana para fins biomédicos [NIRS- and chemometrics-based methodology for determining quality parameters of chitosan for biomedical purposes]
Guimarães, Pedro Queiroz, 25 September 2017
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Chitosan is a biomaterial whose main quality characteristics are molar mass (MM) and degree of deacetylation (DD), which influence almost all of its functional properties. It is therefore essential to determine both in order to supply a quality raw material. The standard methodologies used to determine these parameters are viscometry and mid-infrared spectroscopy, which, although precise and accurate, present some operational difficulties. A feasible alternative that overcomes these problems, proposed in this work, is the development of methodologies based on near-infrared (NIR) spectroscopy and chemometrics. To develop the multivariate model, it was necessary to widen the variation of the parameters of interest. Thus, ten batches of chitosan were produced with deacetylation times of 3, 4, 5, 6 and 7 hours, giving 5 samples per batch and 50 samples in total. All samples were characterized in terms of DD and MM according to the reference methodologies. The same samples were also analyzed by NIR spectroscopy.
The NIR spectra of the samples were recorded in triplicate in the spectral range of 9,000 to 4,000 cm⁻¹, using 32 scans and 8 cm⁻¹ resolution, totaling 150 spectra. To build the models, several spectral pre-processing methods were evaluated for their predictive capacity. The calibration (100 samples) and prediction (50 samples) sets were selected with the SPXY algorithm. The predictive capacity of models built on the full working spectral range was also evaluated and compared with that of models built on variables chosen by variable selection algorithms. The predictive capacity of the models was assessed through figures of merit. Based on these parameters, the best spectral pre-processing for the DD was the first derivative with a window of 5 and a first-order polynomial. For the MM, the best predictive performance was obtained with EMSC. In general, the models built using the regression coefficients generated by Martens' Uncertainty Test (Jack-knife coefficients) showed better predictive performance than the models built with all the spectral variables or with the spectral variables selected by the Successive Projections Algorithm (SPA). The prediction errors obtained for DD and MM were 1.85% and 29.08 kDa, respectively. The error obtained for DD is smaller than that allowed for the reference method. For the molar mass, however, the model did not show satisfactory performance. The results therefore demonstrate the viability of the applied methodologies based on NIR spectroscopy and chemometrics for determining the DD of chitosan for biomedical purposes produced by CERTBIO.
SPXY; Chitosan; Chemometrics; Near Infrared Spectroscopy; CIENCIAS EXATAS E DA TERRA::QUIMICA [Exact and Earth Sciences::Chemistry]
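The pre-processing reported as best for the DD, a first derivative with a window of 5 and a first-order polynomial, can be illustrated with a minimal sketch. It assumes this refers to a Savitzky-Golay-style derivative on equally spaced points, where the derivative at each point is the least-squares slope of a line fitted through the 5-point window. The function name, the `spacing` parameter, and the absorbance values are hypothetical and not taken from the thesis.

```python
def first_derivative_sg(spectrum, spacing=1.0):
    """First derivative from a 5-point, first-order least-squares fit.

    For 5 equally spaced points, the slope of the fitted line is
    sum(k * y[i + k] for k in -2..2) / (10 * spacing).
    The two points at each edge are dropped, so the output is 4 shorter.
    """
    weights = (-2, -1, 0, 1, 2)
    half = 2
    derivative = []
    for i in range(half, len(spectrum) - half):
        weighted = sum(w * spectrum[i + k]
                       for w, k in zip(weights, range(-half, half + 1)))
        derivative.append(weighted / (10.0 * spacing))
    return derivative


# Hypothetical absorbance values, for illustration only.
absorbance = [0.10, 0.12, 0.15, 0.21, 0.30, 0.33, 0.31, 0.27]
smoothed_slope = first_derivative_sg(absorbance)
```

Differentiating removes constant baseline offsets between spectra, which is one common reason for using this kind of pre-processing before regression.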
2	A Comparative study of data splitting algorithms for machine learning model selection
Birba, Delwende Eliane, January 2020
Data splitting is commonly used in machine learning to divide data into training, validation, and test sets. This approach allows us to tune the model hyperparameters and to estimate the generalization performance. In this research, we conducted a comparative analysis of different data partitioning algorithms on both real and simulated data. Our main objective was to address the question of how the choice of data splitting algorithm can improve the estimation of the generalization performance. The data splitting algorithms used in this study were variants of k-fold cross-validation, Kennard-Stone (KS), SPXY (sample set partitioning based on joint x-y distance), and random sampling. Each algorithm divided the data into two subsets, training and validation. The training set was used to fit the model and the validation set to evaluate it. We then analyzed the different data splitting algorithms based on the generalization performance estimated from the validation set and from the external test set. From the results, we noted that the most important determinant of good generalization is the size of the dataset. For all the data splitting methods applied to small datasets, the gap between the performance estimated on the validation set and on the test set was significant. However, the gap narrowed when there was more data in training or validation. Too much or too little data in the training set can also lead to poor model performance, so it is important to have a reasonable balance between the training and validation set sizes. In our study, KS and SPXY were the splitting algorithms that gave poor estimates of model performance. Indeed, these methods select the most representative samples to train the model, leaving poorly representative samples for model performance estimation.
K-fold cross-validation; Kennard-Stone algorithm; data splitting; bootstrap; overfitting; SPXY; Computer and Information Sciences
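To make the Kennard-Stone and SPXY selection described in the abstract concrete, here is a minimal pure-Python sketch. It assumes the commonly described formulation: Kennard-Stone starts from the two most distant samples and repeatedly adds the sample whose nearest selected neighbour is farthest away, and SPXY runs the same procedure on the sum of the x-distance and the y-distance, each divided by its maximum. All function names and the toy data are hypothetical, and the sketch assumes at least two samples and `n_select >= 2`.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def normalise(d):
    """Divide every distance by the largest one (no-op if all are zero)."""
    top = max(max(row) for row in d)
    return d if top == 0 else [[v / top for v in row] for row in d]


def kennard_stone(d, n_select):
    """Max-min selection on a distance matrix; returns (selected, remaining)."""
    n = len(d)
    # Seed with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: d[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select and remaining:
        # Add the sample whose closest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


def spxy_split(X, y, n_select):
    """Kennard-Stone on the joint, normalised x-y distance."""
    dx = normalise(pairwise_distances(X))
    dy = normalise(pairwise_distances([[v] for v in y]))
    dxy = [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(dx, dy)]
    return kennard_stone(dxy, n_select)


# Toy data, for illustration only.
X = [[0.0], [1.0], [2.0], [10.0]]
y = [0.0, 1.0, 2.0, 10.0]
train_idx, valid_idx = spxy_split(X, y, n_select=3)
```

Because the selected set covers the extremes of the data, the samples left over for validation tend to sit in the interior, which is consistent with the abstract's remark that these methods leave poorly representative samples for performance estimation.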
