101 |
Autorregulando e autodeterminando: duas formas de alunos de pós-graduação aprenderem a aprender contabilidade / Self-regulation and self-determined strategies - two ways graduate students learn to learn accounting. Raimundo Nonato Lima Filho. 01 April 2016 (has links)
O uso assertivo e eficiente das estratégias de aprendizagem depende, muitas vezes, da compreensão e consideração de aspectos psicológicos e motivacionais. O adequado emprego de estratégias de aprendizagem se reflete no desempenho acadêmico, no domínio de construtos e modelos e no amadurecimento crítico e científico. A presente tese defende que há uma relação entre as estratégias de aprendizagem autorregulada e as estratégias de aprendizagem autodeterminada predominantes em alunos de mestrado e doutorado em Contabilidade. O estudo se justifica porque, além de inaugurar uma linha de pesquisa ainda inédita no contexto da Contabilidade Humana, seus resultados destacam um original entendimento da relação da aprendizagem com a regulação e a motivação pessoal. Tem como objetivo principal apresentar diagnóstico, dimensões e correlações das estratégias de aprendizagem autorregulada e aprendizagem autodeterminada de alunos de programas de pós-graduação stricto sensu em Contabilidade no Brasil. Participaram do survey 516 respondentes, sendo 383 mestrandos e 133 doutorandos. Foram aplicados dois instrumentos psicométricos: Self-Regulated Learning Strategies (SRLS) e Motivated Strategies for Learning Questionnaire (MSLQ). O modelo operacional de pesquisa delineou a formulação de oito hipóteses, sendo que a primeira delas sustenta a defesa da tese, enquanto as demais defendem a influência das variáveis idade, gênero, tipo de curso, estágio no curso, tipo de instituição de graduação, nota do curso atribuída pela Capes e graus de instrução dos pais nos níveis de Self-Regulated Learning (SRL) e Self-Determination Theory (SDT). A partir da análise multivariada dos dados, os resultados corroboraram a tese e a influência do gênero no nível de SRL. A metaconclusão desta tese ratifica os estudos referenciados, confirmando que a aprendizagem pode ser dominada e controlada pelo indivíduo, ao se adotar estratégias individuais de regulação e motivação.
Uma importante contribuição desta pesquisa consiste em oferecer conclusões empíricas que podem ajudar docentes, discentes, pesquisadores, instituições de ensino e programas de pós-graduação a compreender mais sistematicamente os aspectos da aprendizagem autorregulada e da aprendizagem autodeterminada que caracterizam o aluno de Contabilidade. Limitações importantes deste estudo podem ser vistas como oportunidades para pesquisas futuras: a amostra envolve um público específico, a pesquisa survey pode apresentar vieses de método comum e houve baixa participação de alunos de mestrado profissional. Estudos futuros poderão adotar outras estratégias metodológicas e/ou envolver amostras mais diversificadas ou em maior lastro temporal / Assertive and efficient use of learning strategies often depends on the understanding and consideration of psychological and motivational aspects. Appropriate use of learning strategies is reflected in academic performance, in the appropriation of constructs and models, and in critical and scientific maturity. This dissertation argues that there is a relationship between predominating self-regulated learning strategies and self-determined learning strategies in accounting master's and doctorate students. The study is justified because, apart from inaugurating a research line within the context of Human Accounting, its results highlight a unique understanding of the relationship of learning with regulation and personal motivation. Its main goal is to present a diagnosis, the dimensions and the correlations of self-regulated learning and self-determined learning strategies of graduate Accounting students in Brazil. Five hundred and sixteen respondents participated in the survey, comprising 383 master's and 133 doctoral students. Two psychometric instruments were applied: the Self-Regulated Learning Strategies (SRLS) and the Motivated Strategies for Learning Questionnaire (MSLQ).
The operational research model outlined the formulation of eight hypotheses, the first of which supports the thesis, while the others investigate the influence on the levels of Self-Regulated Learning (SRL) and Self-Determination Theory (SDT) of age, gender, type of course, stage in the course, type of undergraduate institution (public or private), grade attributed by Capes to the course and parental formal education degrees. The multivariate data analysis supported the thesis and showed that gender influences the SRL level. The metaconclusion of this thesis confirms the referenced studies, stating that learning can be dominated and controlled by individuals through the adoption of individual strategies of regulation and motivation. An important contribution of this study is to offer empirical conclusions that might help teachers, students themselves, researchers, educational institutions and graduate programs to understand more systematically the aspects of self-regulated learning and self-determined learning that characterize Accounting graduate students. The major limitations of the present study can be seen as opportunities for future research: the sample involves a particular audience, the survey method can introduce common-method bias, and participation of professional master's degree students in the sample was low. Future studies can adopt other methodological strategies and/or involve more diversified samples or consider longitudinal approaches.
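As an illustration of how one such hypothesis could be checked, the sketch below runs a nonparametric two-group comparison on synthetic SRL-like scores; the groups, scale and effect size are invented placeholders, not the thesis' SRLS data.

```python
# Hypothetical sketch of one hypothesis: does gender influence the SRL
# level? The scores below are synthetic, not the SRLS questionnaire data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
srl_group_a = rng.normal(3.8, 0.5, 250)  # e.g. one gender group (synthetic)
srl_group_b = rng.normal(3.6, 0.5, 250)  # e.g. the other group (synthetic)

# nonparametric comparison of the two groups' SRL score distributions
u_stat, p_value = stats.mannwhitneyu(srl_group_a, srl_group_b)
print(f"U = {u_stat:.0f}, p = {p_value:.2g}")  # a small p suggests a group effect
```

In practice the thesis applies multivariate analysis across all eight hypotheses at once; this is only the single-variable intuition.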
|
102 |
La visualisation d’information pour les données massives : une approche par l’abstraction de données / Information visualization for big data : a data abstraction approach. Sansen, Joris. 04 July 2017 (has links)
L’évolution et la démocratisation des technologies ont engendré une véritable explosion de l’information et notre capacité à générer des données et le besoin de les analyser n’a jamais été aussi important. Pourtant, les problématiques soulevées par l’accumulation de données (stockage, temps de traitement, hétérogénéité, vitesse de captation/génération, etc.) sont d’autant plus fortes que les données sont massives, complexes et variées. La représentation de l’information, de par sa capacité à synthétiser et à condenser des données, se constitue naturellement comme une approche pour les analyser mais ne résout pas pour autant ces problèmes. En effet, les techniques classiques de visualisation sont rarement adaptées pour gérer et traiter cette masse d’informations. De plus, les problèmes que soulèvent le stockage et le temps de traitement se répercutent sur le système d’analyse avec, par exemple, la distanciation de plus en plus forte entre la donnée et l’utilisateur : le lieu où elle sera stockée et traitée et l’interface utilisateur servant à l’analyse. Dans cette thèse nous nous intéressons à ces problématiques et plus particulièrement à l’adaptation des techniques de visualisation d’informations pour les données massives. Pour cela, nous nous intéressons tout d’abord à l’information de relation entre éléments : comment est-elle véhiculée et comment améliorer cette transmission dans le contexte de données hiérarchisées. Ensuite, nous nous intéressons à des données multivariées, dont la complexité a un impact sur les calculs possibles. Enfin, nous présentons les approches mises en oeuvre pour rendre nos méthodes compatibles avec les données massives. / The evolution and spread of technologies have led to a real explosion of information; our capacity to generate data and our need to analyze them have never been this strong. Still, the problems raised by such accumulation (storage, computation delays, diversity, speed of gathering/generation, etc.) grow as the data become big, complex and varied. Information visualization, by its ability to summarize and abridge data, was naturally established as an appropriate approach. However, it does not solve the problems raised by Big Data. Indeed, classical visualization techniques are rarely designed to handle such a mass of information. Moreover, the problems raised by data storage and computation time have repercussions on the analysis system: for example, the increasing distance between the data and the analyst, since the place where the data are stored and processed and the user interface used for the analysis are rarely close. In this thesis, we focus on these issues and more particularly on adapting information visualization techniques to Big Data. First of all, we focus on relational data: how the existence of a relation between entities is conveyed and how to improve this transmission for hierarchical data. Then, we focus on multivariate data and how to handle their complexity for the required computations. Finally, we present the methods we designed to make our techniques compatible with Big Data.
|
103 |
New tools for sample preparation and instrumental analysis of dioxins in environmental samples. Do, Lan. January 2013 (has links)
Polychlorinated dibenzo-p-dioxins (PCDDs) and dibenzofurans (PCDFs), two groups of structurally related chlorinated aromatic hydrocarbons, are of high concern due to their global distribution and extreme toxicity. Since they occur at very low levels, their analysis is complex and challenging; hence, there is a need for efficient, reliable and rapid alternative analytical methods. Developing such methods was the aim of the project this thesis is based upon. During the first years of the project, the focus was on the first parts of the analytical chain (extraction and clean-up). A selective pressurized liquid extraction (SPLE) procedure was developed, involving in-cell clean-up to remove bulk co-extracted matrix components from sample extracts. It was further streamlined by employing a modular pressurized liquid extraction (M-PLE) system, which simultaneously extracts, cleans up and isolates planar PCDD/Fs in a single step. Both methods were validated using a wide range of soil, sediment and sludge reference materials. Using dichloromethane/n-heptane (DCM/Hp; 1/1, v/v) as a solvent, results statistically equivalent to or higher than the reference values were obtained, while an alternative, less harmful non-chlorinated solvent mixture, diethyl ether/n-heptane (DEE/Hp; 1/2, v/v), yielded data equivalent to those values. Later, the focus of the work shifted to the final instrumental analysis. Six gas chromatography (GC) phases were evaluated with respect to their chromatographic separation of not just the 17 most toxic congeners (2,3,7,8-substituted PCDD/Fs), but all 136 tetra- to octaCDD/Fs. Three novel ionic liquid columns performed much better than previously tested commercially available columns. Supelco SLB-IL61 offered the best overall performance, successfully resolving 106 out of the 136 compounds, and 16 out of the 17 2,3,7,8-substituted PCDD/Fs. Another ionic liquid (SLB-IL111) column provided complementary separation.
Together, the two columns separated 128 congeners. The work also included characterization of 22 GC columns’ selectivity and solute-stationary phase interactions. The selectivities were mapped using Principal Component Analysis (PCA) of all 136 PCDD/F’s retention times on the columns, while the interactions were probed by analyzing both the retention times and the substances’ physicochemical properties.
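The selectivity mapping described above, PCA of retention times across columns, can be sketched as follows; the retention values here are random placeholders standing in for the 136 PCDD/F retention times on the 22 GC columns.

```python
# Minimal sketch of mapping GC column selectivities via PCA of retention
# times, in the spirit of the study above. The data are synthetic
# placeholders, not the thesis' measured retention times.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# rows = 22 GC columns, columns = retention times of 136 congeners
retention = rng.normal(30.0, 5.0, size=(22, 136))

pca = PCA(n_components=2)
scores = pca.fit_transform(retention)  # each column's position in the selectivity map
print(scores.shape)                    # (22, 2): one point per GC column
```

Columns that cluster together in the score plot would exhibit similar selectivity; complementary columns (such as the SLB-IL61/SLB-IL111 pair) would lie far apart.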
|
104 |
Multivariate non-invasive measurements of skin disorders. Nyström, Josefina. January 2006 (has links)
The present thesis proposes new methods for obtaining objective and accurate diagnoses in modern healthcare. Non-invasive techniques have been used to examine or diagnose three different medical conditions, namely neuropathy among diabetics, radiotherapy induced erythema (skin redness) among breast cancer patients and diagnoses of cutaneous malignant melanoma. The techniques used were Near-InfraRed spectroscopy (NIR), Multi Frequency Bio Impedance Analysis of whole body (MFBIA-body), Laser Doppler Imaging (LDI) and Digital Colour Photography (DCP). Neuropathy in diabetics was studied in papers I and II. The first study was performed on diabetics and control subjects of both genders. A separation was seen between males and females, and therefore the data had to be divided in order to obtain good models. NIR spectroscopy was shown to be a viable technique for measuring neuropathy once the division according to gender was made. The second study on diabetics, where MFBIA-body was added to the analysis, was performed exclusively on males. Principal component analysis showed that healthy reference subjects tend to separate from diabetics. Also, diabetics with severe neuropathy separate from persons less affected. The preliminary study presented in paper III was performed on breast cancer patients in order to investigate if NIR, LDI and DCP were able to detect radiotherapy induced erythema. The promising results in the preliminary study motivated a new and larger study. This study, presented in papers IV and V, intended to investigate the measurement techniques further, but also to examine the effect that two different skin lotions, Essex and Aloe vera, have on the development of erythema. The Wilcoxon signed rank sum test showed that DCP and NIR could detect erythema, which is developed during one week of radiation treatment. LDI was able to detect erythema developed during two weeks of treatment.
None of the techniques could detect any differences between the two lotions regarding the development of erythema. The use of NIR to diagnose cutaneous malignant melanoma is presented as unpublished results in this thesis. This study gave promising but inconclusive results. NIR could be of interest for future development of instrumentation for diagnosis of skin cancer.
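As a rough sketch of the paired analysis described above (assuming synthetic redness readings rather than the study's DCP/NIR data), a Wilcoxon signed-rank test on before/after measurements looks like this:

```python
# Hedged sketch: paired Wilcoxon signed-rank test on before/after skin
# measurements, mirroring the analysis above. Values are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
baseline = rng.normal(100.0, 10.0, 30)        # e.g. redness index before treatment
week1 = baseline + rng.normal(5.0, 4.0, 30)   # after one week of radiotherapy

res = stats.wilcoxon(week1, baseline)
print(res.pvalue)  # a small p-value means the shift is detectable
```

The test is paired (each patient serves as their own control) and makes no normality assumption, which suits skewed imaging-derived indices.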
|
105 |
Netzorientierte Fuzzy-Pattern-Klassifikation nichtkonvexer Objektmengenmorphologien. Hempel, Arne-Jens. 06 September 2011 (has links)
Die Arbeit ordnet sich in das Gebiet der unscharfen Klassifikation ein und stellt im Detail eine Weiterführung der Forschung zur Fuzzy-Pattern-Klassifikation dar. Es handelt sich dabei um eine leistungsfähige systemtheoretische Methodik zur klassifikatorischen Modellierung komplexer, hochdimensionaler, technischer oder nichttechnischer Systeme auf der Basis von metrischen Messgrößen und/oder nichtmetrischen Experten-Bewertungen. Die Beschreibung der Unschärfe von Daten, Zuständen und Strukturen wird hierbei durch einen einheitlichen Typ einer Zugehörigkeitsfunktion des Potentialtyps realisiert. Ziel der Betrachtungen ist die weiterführende Nutzung des bestehenden Klassenmodells zur unscharfen Beschreibung nichtkonvexer Objektmengenmorphologien. Ausgehend vom automatischen datengetriebenen Aufbau der konvexen Klassenbeschreibung, deren vorteilhaften Eigenschaften sowie Defiziten wird im Rahmen der Arbeit eine Methodik vorgestellt, die eine Modellierung beliebiger Objektmengenmorphologien erlaubt, ohne das bestehende Klassifikationskonzept zu verlassen.
Kerngedanken des Vorgehens sind:
1.) Die Aggregation von Fuzzy-Pattern-Klassen auf der Basis so genannter komplementärer Objekte.
2.) Die sequentielle Verknüpfung von Fuzzy-Pattern-Klassen und komplementären Klassen im Sinne einer unscharfen Mengendifferenz.
3.) Die Strukturierung des Verknüpfungsprozesses durch die Clusteranalyse von Komplementärobjektmengen und damit der Verwendung von Konfigurationen aus komplementären Fuzzy-Pattern-Klassen.
Das dabei gewonnene nichtkonvexe Fuzzy-Klassifikationsmodell impliziert eine Vernetzung von Fuzzy-Klassifikatoren in Form von Klassifikatorbäumen. Im Ergebnis entstehen Klassifikatorstrukturen mit hoher Transparenz, die - neben der üblichen zustandsorientierten klassifikatorischen Beschreibung in den Einzelklassifikatoren - zusätzliche Informationen über den Ablauf der Klassifikationsentscheidungen erfassen. Der rechnergestützte Entwurf und die Eigenschaften der entstehenden Klassifikatorstruktur werden an akademischen Teststrukturen und realen Daten demonstriert. Die im Rahmen der Arbeit dargestellte Methodik wird in Zusammenhang mit dem Fuzzy-Pattern-Klassifikationskonzept realisiert, ist jedoch aufgrund ihrer Allgemeingültigkeit auf eine beliebige datenbasierte konvexe Klassenbeschreibung übertragbar. / This work contributes to the field of fuzzy classification. It dedicates itself to the subject of "Fuzzy-Pattern-Classification", a versatile method applied for classificatory modeling of complex, high dimensional systems based on metric and nonmetric data, i.e. sensor readings or expert statements. Uncertainties of data, their associated morphology and therewith classificatory states are incorporated in terms of fuzziness using a uniform and convex type of membership function.
Based on the properties of the already existing convex Fuzzy-Pattern-Class models and their automatic, data-driven setup, a method for modeling nonconvex relations without leaving the present classification concept is introduced.
Key points of the elaborated approach are:
1.) The aggregation of Fuzzy-Pattern-Classes with the help of so called complementary objects.
2.) The sequential combination of Fuzzy-Pattern-Classes and complementary Fuzzy-Pattern-Classes in terms of a fuzzy set difference.
3.) A clustering based structuring of complementary Fuzzy-Pattern-Classes and therewith a structuring of the combination process.
A result of this structuring process is the representation of the resulting nonconvex fuzzy classification model in terms of a classifier tree. Such a nonconvex Fuzzy-Classifier features high transparency, which allows a structured understanding of the classificatory decision in working mode.
Both the automatic data-based design and the properties of such tree-like fuzzy classifiers will be illustrated with the help of academic and real-world data.
Even though the proposed method is introduced for a specific type of membership function, the underlying idea may be applied to any convex membership function.
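As a toy illustration of the fuzzy set difference idea (using simple Gaussian membership functions rather than the potential-type functions of the thesis), a nonconvex class can be obtained by subtracting a complementary class from a convex one:

```python
# Toy example (not the thesis' potential-type functions): a nonconvex
# class modelled as the fuzzy set difference between a convex class and
# a complementary class, mu_nonconvex = min(mu_class, 1 - mu_complement).
import numpy as np

def gaussian_mu(x, centre, width):
    """A simple convex membership function."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

x = np.linspace(-4.0, 4.0, 801)
mu_class = gaussian_mu(x, 0.0, 2.0)   # broad convex class
mu_comp = gaussian_mu(x, 0.0, 0.5)    # complementary object set (the "hole")
mu_nonconvex = np.minimum(mu_class, 1.0 - mu_comp)

# the membership now dips at the centre: a nonconvex morphology
print(mu_nonconvex[400] < mu_nonconvex[200])  # centre below the flank
```

Chaining several such complementary classes, as the clustering step in point 3 suggests, yields the tree-like classifier structure described above.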
|
106 |
Monitoring Kraft Recovery Boiler Fouling by Multivariate Data Analysis. Edberg, Alexandra. January 2018 (has links)
This work deals with fouling in the recovery boiler at Montes del Plata, Uruguay. Multivariate data analysis has been used to analyze the large amount of data that was available, in order to investigate how different parameters affect the fouling problems. Principal Component Analysis (PCA) and Partial Least Squares Projection (PLS) have been used in this work. PCA has been used to compare average values between time periods with high and low fouling problems, while PLS has been used to study the correlation structures between the variables and consequently give an indication of which parameters might be changed to improve the availability of the boiler. The results show that this recovery boiler tends to have problems with fouling that might depend on the distribution of air, the black liquor pressure or the dry solids content of the black liquor. The results also show that multivariate data analysis is a powerful tool for analyzing these types of fouling problems. / Detta arbete handlar om inkruster i sodapannan på Montes del Plata, Uruguay. Multivariat dataanalys har använts för att analysera den stora datamängd som fanns tillgänglig, för att undersöka hur olika parametrar påverkar inkrusterproblemen. Principal Component Analysis (PCA) och Partial Least Squares Projection (PLS) har använts i detta arbete. PCA har använts för att jämföra medelvärden mellan tidsperioder med höga och låga inkrusterproblem, medan PLS har använts för att studera korrelationen mellan variablerna och därmed ge en indikation på vilka parametrar som kan tänkas ändras för att förbättra tillgängligheten på sodapannan. Resultaten visar att sodapannan tenderar att ha problem med inkruster som kan bero på fördelningen av luft, på svartlutens tryck eller på torrhalten i svartluten. Resultaten visar också att multivariat dataanalys är ett användbart verktyg för att analysera dessa typer av inkrusterproblem.
|
107 |
Enterprise Business Alignment Using Quality Function Deployment, Multivariate Data Analysis And Business Modeling Tools. Gammoh, Diala. 01 January 2010 (has links)
This dissertation proposes two novel ideas to enhance the alignment of business strategy with customer needs. The proposed business alignment clock is a new illustration of the relationships between customer requirements, business strategies, capabilities and processes. To line up the clock and reach the alignment needed by the enterprise, a clock mechanism is proposed. The mechanism integrates the Enterprise Business Architecture (EBA) with the House of Quality (HoQ). The relationship matrix inside the body of the house is defined using multivariate data analysis techniques to accurately measure the strength of the relationships rather than defining them subjectively. A statistical tool, multivariate data analysis can be used to overcome the ambiguity in quantifying the relationships in the house of quality matrix. The framework is proposed in the basic conceptual model context of the EBA, showing different levels of the enterprise architecture: the goals, the capabilities and the value stream architecture components. In the proposed framework, the goals and the capabilities are inputs to two houses of quality, in which the alignment between customer needs and business goals and the alignment between business goals and capabilities are checked in the first and second house, respectively. The alignment between the business capabilities and the architecture components (workflows, events and environment) is checked in a third HoQ using the performance indicators of the value stream architecture components, which may result in infrastructure expansion, software development or process improvement to reach the alignment needed by the enterprise. The value of the model was demonstrated using the Accreditation Board for Engineering and Technology (ABET) process at the Industrial Engineering and Management Systems department at the University of Central Florida.
The assessment of ABET criteria involves an evaluation of the extent to which the program outcomes are being achieved and results in decisions and actions to improve the Industrial Engineering program at the University of Central Florida. The proposed framework increases the accuracy of measuring the extent to which the program learning outcomes have been achieved at the department. The process of continuously aligning educational objectives with customer needs becomes all the more vital given the rapid change of customer requirements, which are obtained from both internal and external constituents (primarily students, faculty, alumni, and employers).
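One hedged way to picture the data-driven relationship matrix is to fill each HoQ cell with the absolute correlation between a customer-need rating and a goal metric; the data below are synthetic, and the exact multivariate statistic used in the dissertation may differ.

```python
# Hypothetical sketch: replacing subjective House-of-Quality relationship
# scores with data-driven ones. Each cell becomes the absolute correlation
# between a customer-need rating and a business-goal metric (synthetic data).
import numpy as np

rng = np.random.default_rng(4)
needs = rng.normal(size=(80, 3))  # 80 respondents rating 3 customer needs
# goal metrics loosely driven by the needs, plus noise (invented relationship)
goals = needs @ rng.normal(size=(3, 4)) + rng.normal(scale=0.8, size=(80, 4))

full = np.corrcoef(np.column_stack([needs, goals]), rowvar=False)  # (7, 7)
relationship = np.abs(full[:3, 3:])  # (3, 4): one strength per HoQ cell
print(relationship.round(2))
```

The same construction applies to the second and third houses, with capabilities and value-stream performance indicators taking the place of the goal metrics.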
|
108 |
Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation. Vitale, Raffaele. 03 November 2017 (has links)
The present Ph.D. thesis, primarily conceived to support and reinforce the relation between academic and industrial worlds, was developed in collaboration with Shell Global Solutions (Amsterdam, The Netherlands) in the endeavour of applying and possibly extending well-established latent variable-based approaches (i.e. Principal Component Analysis - PCA - Partial Least Squares regression - PLS - or Partial Least Squares Discriminant Analysis - PLSDA) for complex problem solving not only in the fields of manufacturing troubleshooting and optimisation, but also in the wider environment of multivariate data analysis. To this end, novel efficient algorithmic solutions are proposed throughout all chapters to address very disparate tasks, from calibration transfer in spectroscopy to real-time modelling of streaming flows of data. The manuscript is divided into the following six parts, focused on various topics of interest:
Part I - Preface, where an overview of this research work, its main aims and justification is given together with a brief introduction on PCA, PLS and PLSDA;
Part II - On kernel-based extensions of PCA, PLS and PLSDA, where the potential of kernel techniques, possibly coupled to specific variants of the recently rediscovered pseudo-sample projection, formulated by the English statistician John C. Gower, is explored and their performance compared to that of more classical methodologies in four different application scenarios: segmentation of Red-Green-Blue (RGB) images, discrimination of on-/off-specification batch runs, monitoring of batch processes and analysis of mixture designs of experiments;
Part III - On the selection of the number of factors in PCA by permutation testing, where an extensive guideline on how to accomplish the selection of PCA components by permutation testing is provided through the comprehensive illustration of an original algorithmic procedure implemented for such a purpose;
Part IV - On modelling common and distinctive sources of variability in multi-set data analysis, where several practical aspects of two-block common and distinctive component analysis (carried out by methods like Simultaneous Component Analysis - SCA - DIStinctive and COmmon Simultaneous Component Analysis - DISCO-SCA - Adapted Generalised Singular Value Decomposition - Adapted GSVD - ECO-POWER, Canonical Correlation Analysis - CCA - and 2-block Orthogonal Projections to Latent Structures - O2PLS) are discussed, a new computational strategy for determining the number of common factors underlying two data matrices sharing the same row- or column-dimension is described, and two innovative approaches for calibration transfer between near-infrared spectrometers are presented;
Part V - On the on-the-fly processing and modelling of continuous high-dimensional data streams, where a novel software system for rational handling of multi-channel measurements recorded in real time, the On-The-Fly Processing (OTFP) tool, is designed;
Part VI - Epilogue, where final conclusions are drawn, future perspectives are delineated, and annexes are included. / La presente tesis doctoral, concebida principalmente para apoyar y reforzar la relación entre la academia y la industria, se desarrolló en colaboración con Shell Global Solutions (Amsterdam, Países Bajos) en el esfuerzo de aplicar y posiblemente extender los enfoques ya consolidados basados en variables latentes (es decir, Análisis de Componentes Principales - PCA - Regresión en Mínimos Cuadrados Parciales - PLS - o PLS discriminante - PLSDA) para la resolución de problemas complejos no sólo en los campos de mejora y optimización de procesos, sino también en el entorno más amplio del análisis de datos multivariados. Con este fin, en todos los capítulos proponemos nuevas soluciones algorítmicas eficientes para abordar tareas dispares, desde la transferencia de calibración en espectroscopia hasta el modelado en tiempo real de flujos de datos.
El manuscrito se divide en las seis partes siguientes, centradas en diversos temas de interés:
Parte I - Prefacio, donde presentamos un resumen de este trabajo de investigación, damos sus principales objetivos y justificaciones junto con una breve introducción sobre PCA, PLS y PLSDA;
Parte II - Sobre las extensiones basadas en kernels de PCA, PLS y PLSDA, donde presentamos el potencial de las técnicas de kernel, eventualmente acopladas a variantes específicas de la recién redescubierta proyección de pseudo-muestras, formulada por el estadista inglés John C. Gower, y comparamos su rendimiento respecto a metodologías más clásicas en cuatro aplicaciones a escenarios diferentes: segmentación de imágenes Rojo-Verde-Azul (RGB), discriminación y monitorización de procesos por lotes y análisis de diseños de experimentos de mezclas;
Parte III - Sobre la selección del número de factores en el PCA por pruebas de permutación, donde aportamos una guía extensa sobre cómo conseguir la selección de componentes de PCA mediante pruebas de permutación y una ilustración completa de un procedimiento algorítmico original implementado para tal fin;
Parte IV - Sobre la modelización de fuentes de variabilidad común y distintiva en el análisis de datos multi-conjunto, donde discutimos varios aspectos prácticos del análisis de componentes comunes y distintivos de dos bloques de datos (realizado por métodos como el Análisis Simultáneo de Componentes - SCA - Análisis Simultáneo de Componentes Distintivos y Comunes - DISCO-SCA - Descomposición Adaptada Generalizada de Valores Singulares - Adapted GSVD - ECO-POWER, Análisis de Correlaciones Canónicas - CCA - y Proyecciones Ortogonales de 2 conjuntos a Estructuras Latentes - O2PLS). Presentamos a su vez una nueva estrategia computacional para determinar el número de factores comunes subyacentes a dos matrices de datos que comparten la misma dimensión de fila o columna y dos planteamientos novedosos para la transferencia de calibración entre espectrómetros de infrarrojo cercano;
Parte V - Sobre el procesamiento y la modelización en tiempo real de flujos de datos de alta dimensión, donde diseñamos la herramienta de Procesamiento en Tiempo Real (OTFP), un nuevo sistema de manejo racional de mediciones multi-canal registradas en tiempo real;
Parte VI - Epílogo, donde presentamos las conclusiones finales, delimitamos las perspectivas futuras, e incluimos los anexos. / La present tesi doctoral, concebuda principalment per a recolzar i reforçar la relació entre l'acadèmia i la indústria, es va desenvolupar en col·laboració amb Shell Global Solutions (Amsterdam, Països Baixos) amb l'esforç d'aplicar i possiblement estendre els enfocaments ja consolidats basats en variables latents (és a dir, Anàlisi de Components Principals - PCA - Regressió en Mínims Quadrats Parcials - PLS - o PLS discriminant - PLSDA) per a la resolució de problemes complexos no solament en els camps de la millora i optimització de processos, sinó també en l'entorn més ampli de l'anàlisi de dades multivariades. A aquest efecte, en tots els capítols proposem noves solucions algorítmiques eficients per a abordar tasques dispars, des de la transferència de calibratge en espectroscopia fins al modelatge en temps real de fluxos de dades.
El manuscrit es divideix en les sis parts següents, centrades en diversos temes d'interès:
Part I - Prefaci, on presentem un resum d'aquest treball de recerca, es donen els seus principals objectius i justificacions juntament amb una breu introducció sobre PCA, PLS i PLSDA;
Part II - Sobre les extensions basades en kernels de PCA, PLS i PLSDA, on presentem el potencial de les tècniques de kernel, eventualment acoblades a variants específiques de la recentment redescoberta projecció de pseudo-mostres, formulada per l'estadista anglés John C. Gower, i comparem el seu rendiment respecte a metodologies més clàssiques en quatre aplicacions a escenaris diferents: segmentació d'imatges Roig-Verd-Blau (RGB), discriminació i monitorització de processos per lots i anàlisi de dissenys d'experiments de mescles;
Part III - Sobre la selecció del nombre de factors en el PCA per proves de permutació, on aportem una guia extensa sobre com aconseguir la selecció de components de PCA a través de proves de permutació i una il·lustració completa d'un procediment algorítmic original implementat per a la finalitat esmentada;
Part IV - Sobre la modelització de fonts de variabilitat comuna i distintiva en l'anàlisi de dades multi-conjunt, on discutim diversos aspectes pràctics de l'anàlisis de components comuns i distintius de dos blocs de dades (realitzat per mètodes com l'Anàlisi Simultània de Components - SCA - Anàlisi Simultània de Components Distintius i Comuns - DISCO-SCA - Descomposició Adaptada Generalitzada en Valors Singulars - Adapted GSVD - ECO-POWER, Anàlisi de Correlacions Canòniques - CCA - i Projeccions Ortogonals de 2 blocs a Estructures Latents - O2PLS). Presentem al mateix temps una nova estratègia computacional per a determinar el nombre de factors comuns subjacents a dues matrius de dades que comparteixen la mateixa dimensió de fila o columna, i dos plantejaments nous per a la transferència de calibratge entre espectròmetres d'infraroig proper;
Part V - On the on-the-fly processing and modelling of high-dimensional data streams, where we design the On-The-Fly Processing (OTFP) tool, a new system for the rational handling of multi-channel measurements recorded in real time;
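The flavour of modelling a data stream on the fly (though not the OTFP tool itself, whose compression and reconstruction machinery is not reproduced here) can be sketched with scikit-learn's IncrementalPCA on an assumed synthetic stream:

```python
# Sketch: a latent-variable model updated batch by batch, without ever
# holding the full measurement history in memory. The stream (batches
# of 50 observations over 40 channels, spanned by 3 latent sources)
# is a simulation assumed for illustration.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
loadings = rng.normal(size=(3, 40))      # fixed 3-dimensional subspace
model = IncrementalPCA(n_components=3)

for _ in range(20):                      # batches arriving in real time
    batch = rng.normal(size=(50, 3)) @ loadings
    batch += 0.1 * rng.normal(size=(50, 40))
    model.partial_fit(batch)             # update the model on the fly

print(model.explained_variance_ratio_.sum())  # close to 1: 3 PCs suffice
```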
Part VI - Epilogue, where we present the final conclusions, outline the future perspectives, and include the annexes. / Vitale, R. (2017). Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90442
|
109 |
Quality by Design through multivariate latent structures
Palací López, Daniel Gonzalo, 14 January 2019 (has links)
The present Ph.D. thesis is motivated by the growing need of most companies, especially (but not solely) those in the pharmaceutical, chemical, food and bioprocess sectors, to increase the flexibility of their operating conditions in order to reduce production costs while maintaining or even improving the quality of their products. To this end, this thesis focuses on the application of Quality by Design concepts for the exploitation and extension of already existing methodologies, and on the development of new algorithms aimed at the proper implementation of tools for the design of experiments, multivariate data analysis and process optimization, especially (but not only) in the context of mixture design.
Part I - Preface, where a summary of the research work, the main goals it aimed at and their justification are presented. Some of the most relevant concepts related to the work developed in subsequent chapters are also introduced, such as those regarding design of experiments or latent-variable-based multivariate data analysis techniques.
Part II - Mixture design optimization, in which a review is provided of the existing tools for the design of experiments and the analysis of mixture data via traditional approaches, as well as of some latent-variable-based techniques, such as Partial Least Squares (PLS) regression. A kernel-based extension of PLS for the analysis of mixture design data is also proposed, and the different available methods are compared with one another. Finally, the software MiDAs is briefly presented; MiDAs was developed to offer users a tool with which to easily approach mixture problems, construct designs of experiments, analyse the resulting data with different methods and compare them.
Part III - Design space and optimization through the latent space, where one of the fundamental issues within the Quality by Design philosophy is addressed: the definition of the so-called 'design space', i.e. the subspace comprising all possible combinations of process operating conditions, raw materials, etc. that guarantee obtaining a product meeting the required quality standard. The proper formulation of the optimization problem is also tackled, not only as a tool for quality improvement but also for process exploration and flexibilisation purposes, in order to establish an efficient and robust optimization method in accordance with the nature of the different problems that may require it.
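Under stated assumptions, the optimization step can be sketched as a constrained maximization over the mixture simplex; this is plain SciPy with an assumed Scheffé-type response surface, not the latent-space procedure developed in the thesis:

```python
# Sketch: find the mixture composition maximizing an assumed quality
# function, subject to bounds and the simplex constraint sum(x) = 1.
# The quadratic blending model below is an arbitrary assumption.
import numpy as np
from scipy.optimize import minimize

def quality(x):
    # Scheffé-type model: linear blending plus one binary synergy term.
    return 4 * x[0] + 2 * x[1] + 7 * x[2] + 6 * x[0] * x[2]

res = minimize(
    lambda x: -quality(x),                 # maximize by minimizing -f
    x0=np.full(3, 1 / 3),                  # start at the centroid
    bounds=[(0.0, 1.0)] * 3,
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
)
print(np.round(res.x, 2))
```

With an equality constraint present, `minimize` uses SLSQP; in a latent-space formulation the same kind of solver would instead search over the scores of a latent-variable model, with the design space acting as an additional feasibility region.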
Part IV - Epilogue, where final conclusions are drawn, future perspectives are suggested, and annexes are included. / Palací López, DG. (2018). Quality by Design through multivariate latent structures [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/115489
|
110 |
Classification de données multivariées multitypes basée sur des modèles de mélange : application à l'étude d'assemblages d'espèces en écologie / Model-based clustering for multivariate and mixed-mode data: application to multi-species spatial ecological data
Georgescu, Vera, 17 December 2010 (has links)
In population ecology, species spatial patterns are studied in order to infer the existence of underlying processes, such as interactions within and between species, and species response to environmental heterogeneity. We propose to analyze spatial multi-species data in terms of species assemblages, considered here as absolute abundances rather than species diversity. Species assemblages are one of the signatures of the local spatial interactions between species and with their environment, and their study can help detect several types of spatialised equilibria and associate them with the effect of environmental variables. Species assemblages are defined here by a non-spatial classification of the multivariate observations of species abundances. Model-based clustering procedures using mixture models were chosen in order to obtain a measure of the classification uncertainty and to model an assemblage by a multivariate probability distribution. Within this framework, we propose: 1. An exploratory tool for the study of spatial multivariate observations of species abundances, which detects species assemblages by model-based clustering, then maps them and analyzes their spatial structure. Common distributions, such as the multivariate Gaussian, are used to model the assemblages. 2. A hierarchical model for abundance assemblages which cannot be modeled with common distributions. This model can easily be adapted to mixed-mode data, which are frequent in ecology. 3. A clustering procedure for mixed-mode data based on mixtures of the hierarchical models defined in 2.
Two ecological case studies guided and illustrated this work: the small-scale study of the assemblages of two aphid species on leaves of Citrus trees, and the large-scale study of the assemblages of a host plant, Plantago lanceolata, and its pathogen, the powdery mildew, on the Åland Islands in south-west Finland.
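The model-based clustering idea underlying point 1 can be sketched with a plain Gaussian mixture in scikit-learn (simulated abundances; the hierarchical mixtures of points 2 and 3 are not reproduced here):

```python
# Sketch: assemblages as components of a multivariate Gaussian mixture,
# with a per-observation measure of classification uncertainty, which is
# the advantage of model-based clustering over hard partitioning.
# The two-species abundance data are simulated assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two simulated assemblages of two species (e.g. log aphid counts).
a = rng.normal(loc=[1.0, 4.0], scale=0.5, size=(120, 2))
b = rng.normal(loc=[4.0, 1.0], scale=0.5, size=(80, 2))
X = np.vstack([a, b])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)                             # assemblage labels
uncertainty = 1 - gmm.predict_proba(X).max(axis=1)  # classification doubt
print(labels.shape, round(uncertainty.mean(), 3))
```

Each fitted component is a multivariate distribution describing one assemblage, and `uncertainty` quantifies how ambiguously each observation is classified, which can then be mapped spatially.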
|