Global ETD Search

151	Using machine learning techniques to simplify mobile interfaces Sigman, Matthew Stephen 19 April 2013 (has links) This paper explores how known machine learning techniques can be applied in unique ways to simplify software and therefore dramatically increase its usability. As software has increased in popularity, its complexity has increased in lockstep, to a point where it has become burdensome. By shifting the focus from the software to the user, great advances can be achieved by way of simplification. The example problem used in this report is well known: suggest local dining choices tailored to a specific person based on known habits and those of similar people. By analyzing past choices and applying likely probabilities, assumptions can be made to reduce user interaction, allowing the user to realize the benefits of the software faster and more frequently. This is accomplished with Java Servlets, Apache Mahout machine learning libraries, and various third party resources to gather dimensions on each recommendation. / text Linear interpolation Machine learning Mahout Software design Mobile apps Prediction Item-based recommendations k-Nearest Neighbor Collaborative filtering Quadratic programming
152	Multivariate fault detection and visualization in the semiconductor industry Chamness, Kevin Andrew 28 August 2008 (has links) Not available / text Nearest neighbor analysis (Statistics) Multivariate analysis Process control--Statistical methods
153	Classification of Genotype and Age by Spatial Aspects of RPE Cell Morphology Boring, Michael 12 August 2014 (has links) Age related macular degeneration (AMD) is a public health concern in an aging society. The retinal pigment epithelium (RPE) layer of the eye is a principal site of pathogenesis for AMD. Morphological characteristics of the cells in the RPE layer can be used to discriminate age and disease status of individuals. In this thesis three genotypes of mice of various ages are used to study the predictive abilities of these characteristics. The disease state is represented by two mutant genotypes and the healthy state by the wild-type. Classification analysis is applied to the RPE morphology from the different spatial regions of the RPE layer. Variable reduction is accomplished by principal component analysis (PCA) and classification analysis by the k-nearest neighbor (k-NN) algorithm. In this way the differential ability of the spatial regions to predict age and disease status by cellular variables is explored. Age related macular degeneration (AMD) Retinal pigment epithelium (RPE) K-nearest neighbor algorithm (k-NN) Classification Principal component analysis (PCA)
154	An Extended Study On The Alu Insertion Polymorphisms In Anatolian Human Population Sekeryapan, Ceran 01 September 2005 (has links) (PDF) In the present study, to estimate the Central Asian contribution to Anatolia, nine Alu insertion polymorphisms (ACE, PV92, FXIIIB, APO, A25, B65, TPA25, D1, HS4.32) were examined in 100 individuals from Anatolia. Alu insertion frequencies for these loci were calculated as 0.410, 0.220, 0.579, 0.963, 0.067, 0.667, 0.390, 0.427, and 0.637, respectively, and the loci were found to be in Hardy-Weinberg equilibrium (p < 0.05). Observed insertion frequencies of each locus were compared with those of previous observations (Dinç, 2003; Comas et al., 2004), and the results of the present study were found not to differ from those obtained by Comas et al. (2004). Thus, these two data sets were pooled (N = 143) and used to examine genetic relationships between populations from Eurasia and Africa. Pairwise Fst statistics indicated higher genetic similarity between Anatolia and all of the Balkan populations and some of the Caucasian populations. A Neighbor Joining (NJ) tree based on Reynolds' genetic distances and Principal Component Analysis (PCA) both grouped the Anatolian populations with the Balkan and some of the Caucasian populations, and showed clear differentiation of Asian populations from the Anatolian population. The relative genetic contribution of Central Asian genes to the current Anatolian gene pool was quantified using Admix analysis, taking as comparison populations those of the Balkans (Greece, Romania, Albania and Hungary) and Central Asia (Uighur, Uzbeks, Tajiks, Kazaks, Kyrgyzes, Dungans). Estimates suggest a contribution of roughly 28% from Asia to Anatolia, in concordance with the previous estimate (Benedetto et al., 2001). QH Genetics 426-470
155	Algorithmes pour la dynamique moléculaire restreinte de manière adaptative / Algorithms for adaptively restrained molecular dynamics Singh, Krishna Kant 08 November 2017 (has links) Les méthodes de dynamique moléculaire (MD pour Molecular Dynamics en anglais) sont utilisées pour simuler des systèmes volumineux et complexes. Cependant, la simulation de ce type de systèmes sur de longues échelles temporelles demeure un problème coûteux en temps de calcul. L'étape la plus coûteuse des méthodes de MD étant la mise à jour des forces entre les particules. La simulation de particules restreintes de façon adaptative (ARMD pour Adaptively Restrained Molecular Dynamics en anglais) est une nouvelle approche permettant d'accélérer le processus de simulation en réduisant le nombre de calculs de forces effectués à chaque pas de temps. La méthode ARMD fait varier l'état des degrés de liberté en position en les activants ou en les désactivants de façon adaptative au cours de la simulation. Du fait, que le calcul des forces dépend majoritairement de la distance entre les atomes, ce calcul peut être évité entre deux particules dont les degrés de liberté en position sont désactivés. En revanche, le calcul des forces pour les particules actives (i.e. celles dont les degrés de liberté en position sont actifs) est effectué. Afin d'exploiter au mieux l'adaptabilité de la méthode ARMD, nous avons conçu de nouveaux algorithmes permettant de calculer et de mettre à jour les forces de façon plus efficace. Nous avons développé des algorithmes permettant de construire et de mettre à jour des listes de voisinage de manière incrémentale. En particulier, nous avons travaillé sur un algorithme de mise à jour incrémentale des forces en un seul passage deux fois plus rapide que l'ancien algorithme également incrémental mais qui nécessitait deux passages. 
Les méthodes proposées ont été implémentées et validées dans le simulateur de MD appelé LAMMPS, mais elles peuvent s'appliquer à n'importe quel autre simulateur de MD. Nous avons validé nos algorithmes pour différents exemples sur les ensembles NVE et NVT. Dans l'ensemble NVE, la méthode ARMD permet à l'utilisateur de jouer sur la précision pour accélérer la vitesse de la simulation. Dans l'ensemble NVT, elle permet de mesurer des grandeurs statistiques plus rapidement. Finalement, nous présentons des algorithmes parallèles pour la mise à jour incrémentale en un seul passage permettant d'utiliser la méthode ARMD avec le standard Message Passing Interface (MPI). / Molecular Dynamics (MD) is often used to simulate large and complex systems. However, simulating such systems over experimental time scales is still computationally challenging. The most computationally expensive step in MD is the computation of forces between particles. Adaptively Restrained Molecular Dynamics (ARMD) is a recently introduced particle simulation method that switches positional degrees of freedom on and off during simulation. Since force computations mainly depend on inter-atomic distances, the force computation between particles whose positional degrees of freedom are off (restrained particles) can be avoided. Forces involving active particles (particles whose positional degrees of freedom are on) are still computed. To take advantage of the adaptivity of ARMD, we designed novel algorithms to compute and update forces efficiently. We designed algorithms not only to construct neighbor lists, but also to update them incrementally. Additionally, we designed a single-pass incremental force update algorithm that is almost two times faster than the previously designed two-pass incremental algorithm. The proposed algorithms are implemented and validated in the LAMMPS MD simulator, but they can be applied to other MD simulators. 
We assessed our algorithms on diverse benchmarks in both the microcanonical (NVE) and canonical (NVT) ensembles. In the NVE ensemble, ARMD allows users to trade precision for speed, while in the NVT ensemble it makes it possible to compute statistical averages faster. Lastly, we introduce parallel algorithms for single-pass incremental force computations to take advantage of adaptive restraints using the Message Passing Interface (MPI) standard. Simulation adaptative Dynamique moléculaire Parallélisation Adaptive Simulation Molecular Dynamics Parallelization Active Neighbor List Single-Pass algorithm Incremental algorithms 004
156	Método de mineração de dados para diagnóstico de câncer de mama baseado na seleção de variáveis / A data mining method for breast cancer diagnosis based on selected features Holsbach, Nicole January 2012 (has links) A presente dissertação propõe métodos para mineração de dados para diagnóstico de câncer de mama (CM) baseado na seleção de variáveis. Partindo-se de uma revisão sistemática, sugere-se um método para a seleção de variáveis para classificação das observações (pacientes) em duas classes de resultado, benigno ou maligno, baseado na análise citopatológica de amostras de célula da mama de pacientes. O método de seleção de variáveis para categorização das observações baseia-se em 4 passos operacionais: (i) dividir o banco de dados original em porções de treino e de teste, e aplicar a ACP (Análise de Componentes Principais) na porção de treino; (ii) gerar índices de importância das variáveis baseados nos pesos da ACP e na percentagem da variância explicada pelos componentes retidos; (iii) classificar a porção de treino utilizando as técnicas KVP (k-vizinhos mais próximos) ou AD (Análise Discriminante). Em seguida eliminar a variável com o menor índice de importância, classificar o banco de dados novamente e calcular a acurácia de classificação; continuar tal processo iterativo até restar uma variável; e (iv) selecionar o subgrupo de variáveis responsável pela máxima acurácia de classificação e classificar a porção de teste utilizando tais variáveis. Quando aplicado ao WBCD (Wisconsin Breast Cancer Database), o método proposto apresentou acurácia média de 97,77%, retendo uma média de 5,8 variáveis. Uma variação do método é proposta, utilizando quatro diferentes tipos de kernels polinomiais para remapear o banco de dados original; os passos (i) a (iv) acima descritos são então aplicados aos kernels propostos. 
Ao aplicar-se a variação do método ao WBCD, obteve-se acurácia média de 98,09%, retendo uma média de 17,24 variáveis de um total de 54 variáveis geradas pelo kernel polinomial recomendado. O método proposto pode auxiliar o médico na elaboração do diagnóstico, selecionando um menor número de variáveis (envolvidas na tomada de decisão) com a maior acurácia, obtendo assim o maior acerto possível. / This dissertation presents a data mining method for breast cancer (BC) diagnosis based on selected features. We first carried out a systematic literature review, and then suggested a method for feature selection and classification of observations, i.e., patients, into benign or malignant classes based on patients’ breast tissue measures. The proposed method relies on four operational steps: (i) split the original dataset into training and testing sets and apply PCA (Principal Component Analysis) to the training set; (ii) generate attribute importance indices based on the PCA weights and the percentage of variance explained by the retained components; (iii) classify the training set using KNN (k-Nearest Neighbor) or DA (Discriminant Analysis) techniques, eliminate irrelevant features and compute the classification accuracy. Next, eliminate the feature with the lowest importance index, classify the dataset again, and re-compute the accuracy. Continue this iterative process until one feature is left; and (iv) choose the subset of features yielding the maximum classification accuracy, and classify the testing set based on those features. When applied to the WBCD (Wisconsin Breast Cancer Database), the proposed method achieved an average classification accuracy of 97.77% while retaining an average of 5.8 features. A variation of the proposed method is also presented, based on four different types of polynomial kernels used to remap the original database; steps (i) to (iv) are then applied to such kernels. 
When applied to the WBCD, the proposed modification increased average accuracy to 98.09% while retaining an average of 17.24 features from the 54 variables generated by the recommended kernel. The proposed method can assist the physician in making the diagnosis by selecting a smaller number of variables (involved in the decision-making) with greater accuracy, thereby obtaining the highest possible accuracy. Análise multivariada Mineração de dados Neoplasias mamárias : Diagnóstico Feature selection Breast cancer diagnosis K-nearest neighbor Discriminant Kernel
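The backward-elimination loop in steps (ii) to (iv) of record 156 can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the PCA loadings and explained-variance ratios are assumed to be computed elsewhere, the importance index is assumed to be the sum of absolute loadings weighted by explained variance, and leave-one-out k-NN accuracy on the training set stands in for the unspecified training-set evaluation. All function names are hypothetical.

```python
import math
from collections import Counter


def importance_indices(loadings, explained):
    """Assumed index: sum over retained components of |weight| * explained variance.

    loadings[j][i] is the weight of variable i in retained component j.
    """
    n_vars = len(loadings[0])
    return [
        sum(abs(comp[i]) * ev for comp, ev in zip(loadings, explained))
        for i in range(n_vars)
    ]


def knn_predict(train_X, train_y, x, k, cols):
    """Majority vote among the k nearest training rows, using only columns `cols`."""
    dists = sorted(
        (math.dist([row[c] for c in cols], [x[c] for c in cols]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, cols):
    """Leave-one-out accuracy on the training set (an assumption of this sketch)."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k, cols) == y[i]
    return hits / len(X)


def select_features(X, y, importance, k=3):
    """Drop the least important feature one at a time; keep the best-scoring subset."""
    cols = sorted(range(len(importance)), key=lambda i: importance[i])
    best_cols, best_acc = None, -1.0
    while cols:
        acc = loo_accuracy(X, y, k, cols)
        if acc >= best_acc:  # on ties, prefer the smaller subset
            best_cols, best_acc = list(cols), acc
        cols = cols[1:]  # eliminate the feature with the lowest importance index
    return sorted(best_cols), best_acc
```

The returned column subset would then be used to classify the held-out testing set, as in step (iv).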
159	Metric space indexing for nearest neighbor search in multimedia context / Indexação de espaços métricos para busca de vizinho mais próximo em contexto multimídia Silva, Eliezer de Souza da, 1988- 26 August 2018 (has links) Orientador: Eduardo Alves do Valle Junior / Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Previous issue date: 2014 / Resumo: A crescente disponibilidade de conteúdo multimídia é um desafio para a pesquisa em Recuperação de Informação. Usuários querem não apenas ter acesso aos documentos multimídia, mas também obter semântica destes documentos, de modo que a capacidade de encontrar um conteúdo específico em grandes coleções de documentos textuais e não textuais é fundamental. Nessas grandes escalas, sistemas de informação multimídia de recuperação devem contar com a capacidade de executar a busca por semelhança de forma eficiente. No entanto, documentos multimídia são muitas vezes representados por descritores multimídia representados por vetores de alta dimensionalidade, ou por outras representações complexas em espaços métricos. Fornecer a possibilidade de uma busca por similaridade eficiente para esse tipo de dados é extremamente desafiador. Neste projeto, vamos explorar uma das famílias mais citado de soluções para a busca de similaridade, o Hashing Sensível à Localidade (LSH - Locality-sensitive Hashing em inglês), que se baseia na criação de funções de hash que atribuem, com maior probabilidade, a mesma chave para os dados que são semelhantes. 
O LSH está disponível apenas para um punhado de funções de distância, mas, quando disponíveis, verificou-se ser extremamente eficiente para arquiteturas com custo de acesso uniforme aos dados. A maioria das funções LSH existentes são restritas a espaços vetoriais. Propomos dois métodos novos para o LSH, generalizando-o para espaços métricos quaisquer utilizando particionamento métrico (centróides aleatórios e k-medoids). Apresentamos uma comparação com os métodos LSH bem estabelecidos em espaços vetoriais e com os últimos concorrentes novos métodos para espaços métricos. Desenvolvemos uma modelagem teórica do comportamento probabilístico dos algoritmos propostos e demonstramos algumas relações e limitantes para a probabilidade de colisão de hash. Dentre os algoritmos propostos para generalizar LSH para espaços métricos, esse desenvolvimento teórico é novo. Embora o problema seja muito desafiador, nossos resultados demonstram que ele pode ser atacado com sucesso. Esta dissertação apresentará os desenvolvimentos do método, a formulação teórica e a discussão experimental dos métodos propostos / Abstract: The increasing availability of multimedia content poses a challenge for information retrieval researchers. Users want not only to have access to multimedia documents, but also to make sense of them: the ability to find specific content in extremely large collections of textual and non-textual documents is paramount. At such large scales, Multimedia Information Retrieval systems must rely on the ability to perform similarity search efficiently. However, multimedia documents are often represented by high-dimensional feature vectors, or by other complex representations in metric spaces. Providing efficient similarity search for that kind of data is extremely challenging. 
In this project, we explore one of the most cited families of solutions for similarity search, Locality-Sensitive Hashing (LSH), which is based on the creation of hashing functions that assign, with higher probability, the same key to data that are similar. LSH is available only for a handful of distance functions, but, where available, it has been found to be extremely efficient for architectures with uniform access cost to the data. Most existing LSH functions are restricted to vector spaces. We propose two novel LSH methods (VoronoiLSH and VoronoiPlex LSH) for generic metric spaces based on metric hyperplane partitioning (random centroids and K-medoids). We present a comparison with well-established LSH methods in vector spaces and with recent competing methods for metric spaces. We develop a theoretical probabilistic model of the behavior of the proposed algorithms and show some relations and bounds for the probability of hash collision. Among the algorithms proposed for generalizing LSH to metric spaces, this theoretical development is new. Although the problem is very challenging, our results demonstrate that it can be successfully tackled. This dissertation presents the development of the methods, together with a theoretical and experimental discussion of the methods' performance / Mestrado / Engenharia de Computação / Mestre em Engenharia Elétrica Método K-vizinho mais próximo Hashing (Computação) Estruturas de dados (Computação) k-nearest neighbor Hashing (Computer science) Data structures (Computer)
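The idea of hashing by nearest random centroid in an arbitrary metric space, as described in record 159, can be illustrated with a short sketch. It is not the dissertation's VoronoiLSH implementation: the function names, the number of tables and centroids, and the choice of edit distance over strings as the example metric are all assumptions made here for illustration.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance, used here only as an example of a non-vector metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def nearest_centroid(item, centroids, dist):
    """Hash key: index of the closest centroid (the item's Voronoi cell)."""
    return min(range(len(centroids)), key=lambda c: dist(item, centroids[c]))


def build_tables(data, dist, n_tables=4, n_centroids=3, seed=0):
    """Build several hash tables, each with its own randomly sampled centroids."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        centroids = rng.sample(data, n_centroids)
        buckets = {}
        for idx, item in enumerate(data):
            key = nearest_centroid(item, centroids, dist)
            buckets.setdefault(key, []).append(idx)
        tables.append((centroids, buckets))
    return tables


def query(tables, data, dist, q, k=1):
    """Collect candidates that share a bucket with q, then rank them by true distance."""
    candidates = set()
    for centroids, buckets in tables:
        key = nearest_centroid(q, centroids, dist)
        candidates.update(buckets.get(key, []))
    return sorted(candidates, key=lambda i: dist(q, data[i]))[:k]
```

Only the distance function is needed, so the same code would work for any metric; using several tables raises the chance that a true near neighbor shares at least one bucket with the query.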
160	Exploração de dados multivariados de fontes e extratos de antocianinas ultilizando análise de componentes princiaipais e método do vizinho mais proximo / Exploring multivariate data of sources and extracts of anthocyanins using principal components analysis and method of nearest neighbor Favaro, Martha Maria Andreotti, 1981- 20 August 2018 (has links) Orientador: Adriana Vitorino Rossi / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Química / Previous issue date: 2012 / Resumo: Antocianinas (ACYS) são corantes naturais responsáveis pela coloração de frutas, hortaliças, flores e grãos. Novas perspectivas de usos de antocianinas em diversos segmentos industriais estimulam estudos analíticos para sistematizar a identificação e a classificação de fontes e extratos desses corantes. Neste trabalho foram utilizadas fontes de ACYS como frutas típicas brasileiras: AMORA (Morus nigra), amora preta (Rubus sp.), jabuticaba (Myrciaria cauliflora), jambolão (Syzygium cumini), jussara (Euterpe edulis Mart.), morango (Fragaria x ananassa Duch) e uva (Vitis vinífera e Vitis vinífera L. Brasil); hortaliças: alface roxa (Lactuca sativa), berinjela (Solanum melongena), cebola roxa (Allium cepa), rabanete (Raphanus sativus), repolho roxo (Brassica oleraceae) e flores: beijo-turco (Impatiens walleriana), gerânio (Pelargonium hortorum e Pelargonium peltatum L.), hibisco (Hibiscus sinensis e Hibiscus syriacus) e hortênsia (Hydrangea macrophylla). 
A literatura descreve diversas técnicas para análise de ACYS em vegetais e seus extratos, com destaque para cromatografia líquida de alta eficiência (HPLC), espectrometria de massas (MS) e espectrofotometria (UV-Vis), sendo que todas elas foram aplicadas neste trabalho, incluindo-se espectrofotometria de reflectância e a técnica de eletromigração em capilares cromatografia eletrocinética micelar (MEKC). As ferramentas quimiométricas utilizadas no tratamento dos dados foram análise de componentes principais (PCA) e método do vizinho mais próximo (KNN). Os modelos quimiométricos de classificação obtidos apresentaram-se robustos com erros de previsão de menos de 30 % sendo possível identificar as fontes de ACYS, o solvente extrator, a idade dos extratos e dados sobre sua estabilidade e condições de armazenamento. Os resultados apontaram que dados obtidos de técnicas analíticas simples como espectrofotometria de absorção e sem necessidade de preparo de amostra como reflectância difusa na região do visível são comparáveis a resultados de técnicas mais sofisticadas e caras como HPLC e MEKC e até superam o potencial de algumas informações obtidas por MS / Abstract: Anthocyanins (ACYS) are natural dyes responsible for color in fruits, vegetables, flowers and grains. New perspectives for use of anthocyanins in various industries stimulate analytical studies to systematize the identification and classification of sources and extracts of these dyes. In this work, typical Brazilian fruits: mulberry (Morus nigra), blackberry (Rubus sp), jaboticaba (Myrciaria cauliflora), jambolan (Syzygium cumini), jussara fruit (Euterpe edulis Mart.), strawberry (Fragaria x ananassa Duch) and grapes (Vitis vinifera and Vitis vinifera L. 
'Brazil'); vegetables: red lettuce (Lactuca sativa), eggplant (Solanum melongena), purple onion (Allium cepa), radish (Raphanus sativus), red cabbage (Brassica oleracea) and flowers: Busy Lizzie (Impatiens walleriana), geranium (Pelargonium hortorum and Pelargonium peltatum L.), hibiscus (Hibiscus sinensis and Hibiscus syriacus) and hydrangea (Hydrangea macrophylla) were used as sources of ACYS. The literature describes several techniques for analyzing ACYS in vegetables and their extracts, with emphasis on high performance liquid chromatography (HPLC), mass spectrometry (MS) and spectrophotometry (UV-Vis). All of these techniques were applied in this work, along with reflectance spectrophotometry and micellar electrokinetic chromatography (MEKC), which is one of the capillary electromigration techniques. The chemometric tools used in data handling were principal component analysis (PCA) and the K-nearest neighbor method (KNN). The chemometric classification models obtained are robust, with prediction errors of less than 30%. It is possible to identify the sources of ACYS, the extractor solvent, the age of the extracts, and their stability and storage conditions. The results show that data obtained from simple analytical techniques, such as absorption spectrophotometry and diffuse reflectance in the visible region (which needs no sample preparation), are comparable to results obtained from more sophisticated and expensive techniques such as HPLC and MEKC, and even surpass some of the information obtained by MS / Doutorado / Quimica Analitica / Doutor em Ciências Antocianinas Quimiometria Análise de componentes principais Método K-vizinho mais próximo Anthocyanins Chemometrics Principal component analysis K-nearest neighbor
