1

Méthodologie pour la détection de défaillance des procédés de fabrication par ACP : application à la production de dispositifs semi-conducteurs / PCA Methodology for Production Process Fault Detection : Application to Semiconductor Manufacturing Processes

Thieullen, Alexis 09 July 2014
The objective of this thesis is the development of a fault detection methodology for semiconductor manufacturing equipment. The proposed approach relies on Principal Component Analysis (PCA) to build a model representative of the nominal operation of a piece of equipment, exploiting all available measurements collected by internal and external sensors during the manufacturing operations for each processed wafer. The industrial context raises additional difficulties: signals collected from the sensors have different lengths and must be synchronized and aligned, which is a limitation for PCA, and the equipment is strongly dynamic, with strong temporal correlations between sensor measurements throughout a process. To address the first point, we developed a preprocessing module that transforms the raw sensor data into a dataset suitable for PCA while filtering out undesirable information, such as outlying measurements and products, that would disturb the construction of the model; this step is based on expert knowledge, statistical analysis, and Dynamic Time Warping, a well-known signal-processing algorithm. To address the second point, we combine extensions of linear PCA, notably multiway PCA, filtered PCA, and recursive PCA, so as to adapt the modeling to the characteristics of the system: an exponentially weighted moving average (EWMA) filter accounts for the dynamics of the system during an operation, and recursive PCA adapts the model to changes in system behavior after certain events (maintenance, restart, etc.). All the steps of the methodology are illustrated with real data collected from a chemical vapor deposition tool currently operated in the STMicroelectronics Rousset fab. Finally, we apply the method to several equipment types over a longer period, demonstrating the industrial interest and performance of the approach.
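As an illustration of the kind of PCA-based monitoring described above (not the thesis's actual pipeline), the following sketch fits a PCA model on nominal data and computes the two statistics commonly used for fault detection: Hotelling's T² (variation inside the model subspace) and SPE/Q (residual variation outside it). All data shapes and names here are hypothetical:

```python
import numpy as np

def fit_pca_monitor(X_nominal, n_components):
    """Fit a PCA model of nominal operation; return monitoring parameters."""
    mu = X_nominal.mean(axis=0)
    sigma = X_nominal.std(axis=0, ddof=1)
    Xc = (X_nominal - mu) / sigma                    # autoscale the training data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                          # loadings, J x A
    lam = s[:n_components] ** 2 / (Xc.shape[0] - 1)  # variances of the scores
    return mu, sigma, P, lam

def t2_spe(x, mu, sigma, P, lam):
    """Hotelling's T2 and SPE (Q) statistics for one new observation."""
    xc = (x - mu) / sigma
    t = P.T @ xc                  # scores in the model subspace
    resid = xc - P @ t            # part the model cannot explain
    return float(np.sum(t ** 2 / lam)), float(resid @ resid)
```

In deployment, control limits for both statistics would be derived from the nominal data, and each new wafer trace (after the preprocessing and EWMA filtering steps described in the abstract) would be scored against them.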
2

Det multifunktionella mötesrummet : Hur formgivningen av ett mötesrum kan underlätta för flervägskommunikation / The Multifunctional Meeting Room : How the Design of a Meeting Room Can Facilitate Multiway Communication

Landelius, Emma January 2016
This thesis studies how a conference room can be designed to facilitate multiway communication. The room should accommodate meetings in different forms. The study uses a meeting room in Västerås for the leading team of Habiliteringscentrum Västmanland as an example. In its current state the room does not meet the leading team's need for multiway communication, and it is perceived as uninspiring. To support multiway communication, it is important that the conference room reflects the team's newly introduced ways of working. The current design and use of the room were studied through an architectural analysis, an interview, an observation, and a questionnaire. With support from the literature and these results, I created a design proposal. Through colors, work materials, and furniture adapted to the requirements of the leading team, the proposed design creates space for multiway communication and can stimulate the team toward creative thinking, which may lead to more effective meetings with better results for their continued work.
3

New methods in mixture analysis

Botana Alcalde, Adolfo January 2011
The quest for a complete understanding of mixtures is a challenge which has stimulated the development of several techniques. One of the most powerful NMR-based techniques is known as Diffusion-Ordered SpectroscopY (DOSY), in which it is possible to distinguish the NMR spectra of chemical species with different hydrodynamic radii, i.e. with different self-diffusion coefficients. It allows the study of intact mixtures, providing information on the interactions within the mixture and saving time and money compared to other techniques. Unfortunately, DOSY is not very effective when signals overlap and/or the diffusion coefficients are very similar. This drawback has led to the development of new methods to overcome this problem. The present investigation is focused on developing some of these. Most DOSY datasets show multiplet phase distortions caused by J-modulation. These distortions not only hinder the interpretation of spectra, but also increase the overlap between signals. The addition of a 45° purging pulse immediately before the onset of acquisition is proposed as a way to remove the unwanted distortions. Most DOSY experiments use 1H detection, because of the higher sensitivity which is generally achieved. However, acquiring spectra with other nuclei such as 13C can reduce overlap problems. Two new sequences have been developed to maximize the sensitivity of heteronuclear DOSY experiments. In order to increase resolving power, it is also possible to incorporate another variable into diffusion experiments as a further dimension. If this results in an approximately trilinear dataset (each dimension varying independently), it is possible to extract physically meaningful information for each component using multivariate statistical methods.
This is explored for the cases where the new variable is either the relaxation behaviour or the concentration variation (which can be measured during a reaction or in a set of samples with different concentrations for each component). PARAllel FACtor (PARAFAC) analysis can obtain the spectra, diffusional decay and relaxation evolution or kinetics for each of the components. In a completely different approach, the separation power of liquid chromatography has been combined in a novel way with the NMR potential for elucidating structures. NMR has been used previously as a precise way to measure average flow velocities, even in porous media. Using this capability to detect the different average velocities of solutes that occur in chromatographic columns ought to provide a new way of analysing mixtures with the same potential as LC-NMR, but faster and simpler. In such a flow system, a chromatographic column is introduced into the NMR probe and a 2D dataset is acquired and Fourier transformed to obtain the velocity distribution for each of the detected NMR signals.
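The diffusional decay at the heart of DOSY follows the Stejskal-Tanner form S(b) = S0·exp(−D·b), where b is the gradient-dependent factor, so for a noiseless single component D is simply the negative slope of log S against b. A minimal numeric sketch (hypothetical gradient table and diffusion coefficient, scaled units; this is only the textbook relation, not any pulse sequence from the thesis):

```python
import numpy as np

# Hypothetical gradient table and diffusion coefficient, in scaled units
# (b in 1e9 s/m^2, D in 1e-9 m^2/s, so D*b is dimensionless).
b = np.linspace(0.0, 5.0, 12)
d_true = 0.40
signal = 100.0 * np.exp(-d_true * b)      # S(b) = S0 * exp(-D * b)

# Noiseless data: D is the negative slope of log S(b) versus b.
slope, log_s0 = np.polyfit(b, np.log(signal), 1)
d_fit, s0_fit = -slope, np.exp(log_s0)
```

With real, noisy decays one would instead fit the exponential directly (or, for overlapped multicomponent decays, use the multivariate methods the abstract describes).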
4

Use of multivariate statistical methods for the analysis of metabolomic data

Hervás Marín, David 12 November 2019
In the last decades, advances in technology have enabled the gathering of an increasing amount of data in the fields of biology and biomedicine. The so-called "omics" technologies, such as genomics, epigenomics, transcriptomics or metabolomics, among others, produce data sets with hundreds, thousands or even millions of variables. The analysis of omics data presents methodological and computational complexities that have driven a revolution in the development of new statistical methods specifically designed for this type of data. To these methodological complexities one must add the logistic and economic restrictions usually present in research projects, which lead to small sample sizes paired with these wide data sets. This makes the analyses even harder, since there are many more variables than observations. Among the methods developed to deal with this type of data are some based on coefficient penalization, such as lasso or elastic net; others based on projection onto latent structures, such as PCA or PLS; and others based on trees or tree ensembles, such as random forest. All these techniques work well on omics data in matrix form (IxJ), but sometimes such data sets are expanded, for example by taking repeated measurements over time on the same individuals, yielding data structures that are no longer matrices but three-way arrays (IxJxK). In these cases, most of the cited techniques lose all or a good part of their applicability, leaving very few viable options for analyzing such data structures. One useful tool for analyzing three-way data, when a Y structure is to be predicted, is N-PLS, which fits reasonably accurate predictive models, reduces the inclusion of noise, obtains more robust parameters than PLS, and allows interpretation through easy-to-understand plots. Related to the problem of small sample sizes and exorbitant variable counts comes the issue of variable selection. In biology and biomedicine the aim is often not only to predict the outcome but also to understand why it happens and which variables are involved, and to be able to make new predictions without collecting hundreds of thousands of variables again, instead using the few most important ones to design cost-effective predictive kits of real utility. The main goal of this thesis is therefore to improve existing methods for omics data analysis, specifically those aimed at three-way data, by incorporating variable selection and improving predictive capacity and interpretability of the results. All of this is implemented in a fully documented R package that includes all the functions necessary to carry out complete analyses of three-way data. The work in this thesis thus consists of a first theoretical-conceptual part developing the idea of the algorithm, together with its tuning, validation and performance assessment; a second empirical-practical part comparing the algorithm with other existing variable-selection methodologies; and an additional programming and software-development part presenting the R package, its functionality and its analysis capabilities. The development and validation of the technique, as well as the publication of the R package, has broadened the current options for such analyses. / Hervás Marín, D. (2019). Use of multivariate statistical methods for the analysis of metabolomic data [unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/130847
5

Contributions a l’analyse de données multivoie : algorithmes et applications / Contributions to multiway analysis : algorithms and applications

Lechuga lopez, Olga 03 July 2017
In this thesis we develop a framework for extending commonly used linear statistical methods (Fisher discriminant analysis, logistic regression, Cox regression, and regularized generalized canonical correlation analysis) to the multiway context, in which each individual is described by several instances of the same variable, so that the data naturally have a tensor structure. In contrast to the standard formulations, the multiway generalizations rely on a structural constraint imposed on the weight vectors that integrates the original tensor structure of the data into the optimization process. This constraint has a double interest: it allows a separate study of the influence of the variables and of the modalities, easing the interpretation of the models, and it restricts the number of coefficients to estimate, limiting both computational complexity and overfitting. Different strategies to deal with the high dimensionality of the data are also discussed. The algorithms are illustrated on two real datasets: (i) spectroscopy data, on which all methods were tested, and (ii) multimodal magnetic resonance imaging data used to predict the long-term recovery of patients after traumatic brain injury. In both cases the proposed methods yield valuable results compared with the standard approaches.
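The structural constraint described above can be pictured as forcing the J·K-dimensional weight vector to be the vectorized outer product of a J-vector (one weight per variable) and a K-vector (one weight per modality), so only J+K coefficients are estimated instead of J·K. A small numeric sketch (sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
J, K = 20, 15                 # numbers of variables and of modalities (hypothetical)
w_J = rng.normal(size=J)      # one weight per variable
w_K = rng.normal(size=K)      # one weight per modality

# Structured multiway weight: outer product, then vectorized.
W = np.outer(w_J, w_K)        # constrained J x K weight matrix, rank 1 by construction
w = W.ravel()                 # acts on a vectorized (unfolded) observation of length J*K

# Unconstrained model: J*K = 300 free coefficients; constrained: J + K = 35.
```

Reading off `w_J` and `w_K` separately is what permits the separate interpretation of variable effects and modality effects mentioned above.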
6

Génération automatique d'implémentation distribuée à partir de modèles formels de processus concurrents asynchrones / Automatic Distributed Code Generation from Formal Models of Asynchronous Concurrent Processes

Evrard, Hugues 10 July 2015
LNT is a recent formal specification language, based on process algebras, in which several concurrent asynchronous processes can interact by multiway rendezvous (i.e., involving two or more processes) with data exchange. The CADP (Construction and Analysis of Distributed Processes) toolbox offers several techniques related to state-space exploration, such as model checking, to formally verify an LNT specification. This thesis introduces a method for generating a distributed implementation from an LNT formal model of a parallel composition of processes. Building on CADP, we developed the new DLC (Distributed LNT Compiler) tool, which generates, from an LNT specification, a distributed implementation in C that can be deployed on several distinct machines linked by a network. To handle multiway rendezvous with data exchange between distant processes correctly and efficiently, we designed a synchronization protocol that gathers different approaches suggested in the past. We devised a verification method for this kind of protocol which, using LNT and CADP, can detect livelocks or deadlocks due to the protocol and check that the protocol realizes valid interactions with respect to a given specification. This method allowed us to identify possible deadlocks in a protocol from the literature and to verify the good behavior of our own protocol. We also designed a mechanism that lets the user embed freely defined C procedures into a generated implementation, in order to set up interactions between that implementation and other systems in its environment. Finally, we used the new consensus algorithm Raft as a case study, in particular to measure the performance of an implementation generated by DLC.
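A multiway rendezvous means all N parties block until every one has arrived, and only then do they exchange data. Within a single process this can be sketched with a thread barrier; this is only a shared-memory analogy for the synchronization that the DLC protocol implements over a network, not the protocol itself:

```python
import threading
import queue

N = 3
barrier = threading.Barrier(N)       # rendezvous completes only when all N arrive
offers = [None] * N                  # one data slot per party
results = queue.Queue()

def party(i, value):
    offers[i] = value                # publish this party's offered value
    barrier.wait()                   # block until every party has published
    results.put(sum(offers))         # after the rendezvous, all offers are visible

threads = [threading.Thread(target=party, args=(i, (i + 1) * 10))
           for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The distributed case is harder precisely because no shared barrier exists: the protocol must decide, via messages, which competing rendezvous commit and which abort, which is why its own deadlock-freedom needs verification.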
7

Algoritmo das projeções sucessivas para seleção de variáveis em calibração de segunda ordem / Successive projections algorithm for variable selection in second-order calibration

Gomes, Adriano de Araújo 29 June 2015
Previous issue date: 2015-06-29 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / In this work, a new strategy for interval selection was developed using the successive projections algorithm (SPA) coupled to N-PLS and U-PLS models, both with residual bilinearization (RBL) as a post-calibration step. The new algorithm coupled to N-PLS/RBL models was evaluated in two case studies. The first used simulated data for the quantitation of two analytes (A and B) in the presence of a single interferent. In the second, quantitation of ofloxacin in water in the presence of interferents (ciprofloxacin and danofloxacin) was carried out by modeling liquid chromatography with diode array detection (LC-DAD) data. The results were compared to the N-PLS/RBL model and to variable selection with a genetic algorithm (GA-N-PLS/RBL). In the first case study (simulated data), RMSEP values (× 10⁻³, in arbitrary units) for analytes A and B were 6.7 and 47.6; 10.6 and 11.4; and 6.0 and 14.0 for N-PLS/RBL, GA-N-PLS/RBL, and the proposed method, respectively. In the second case study (HPLC-DAD data), RMSEP values (in mg/L) of 0.72 (N-PLS/RBL), 0.70 (GA-N-PLS/RBL), and 0.64 (iSPA-N-PLS/RBL) were obtained. When combined with U-PLS/RBL, the new algorithm was evaluated in the modeling of EEM data in the presence of the inner filter effect. Simulated data and the quantitation of phenylephrine in the presence of acetaminophen in water samples with interferents (ibuprofen and acetylsalicylic acid) were used as case studies. The results were compared to the U-PLS/RBL model and to the well-established PARAFAC method. For the simulated data, RMSEP values (in arbitrary units) of 1.584, 0.077, and 0.066 were observed for PARAFAC, U-PLS/RBL, and the proposed method, respectively. In the quantitation of phenylephrine, the RMSEP values found (in μg/L) were 0.164 (PARAFAC), 0.089 (U-PLS/RBL), and 0.069 (iSPA-U-PLS/RBL). In all cases, variable selection proved to be a useful tool capable of improving accuracy compared with the respective global models (models without variable selection), leading to more parsimonious models. It was also observed in all cases that the sensitivity loss caused by variable selection is compensated by the use of more selective channels, which explains the smaller RMSEP values obtained. Finally, the models based on variable selection, such as the proposed method, were free from significant bias at the 95% confidence level.
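As an illustrative aside, the successive projections idea behind the iSPA variants above can be sketched in a few lines. This is a minimal sketch, not the thesis implementation: synthetic two-analyte "spectra" are simulated, SPA greedily picks a handful of minimally collinear channels, and a least-squares model built on those channels is scored by RMSEP. All data, sizes, and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "spectra": 40 samples x 30 channels, two overlapping analyte
# profiles (Gaussian bands) plus measurement noise.
C = rng.uniform(0.1, 1.0, size=(40, 2))             # concentrations (A, B)
S = np.exp(-0.5 * ((np.arange(30)[None, :] - np.array([[10], [18]])) / 3.0) ** 2)
X = C @ S + 0.01 * rng.standard_normal((40, 30))    # mixture spectra
y = C[:, 0]                                         # calibrate for analyte A

def spa_select(X, k):
    """Successive projections: greedily pick k minimally collinear columns."""
    Xp = X.astype(float).copy()
    sel = [int(np.argmax(np.linalg.norm(Xp, axis=0)))]
    while len(sel) < k:
        v = Xp[:, sel[-1]].copy()
        Xp -= np.outer(v, v @ Xp) / (v @ v)         # project out the last pick
        norms = np.linalg.norm(Xp, axis=0)
        norms[sel] = -1.0                           # never re-pick a column
        sel.append(int(np.argmax(norms)))
    return sorted(sel)

sel = spa_select(X[:30], 4)                         # select on calibration set
coef, *_ = np.linalg.lstsq(X[:30][:, sel], y[:30], rcond=None)
pred = X[30:][:, sel] @ coef                        # predict the held-out set
rmsep = float(np.sqrt(np.mean((pred - y[30:]) ** 2)))
print(sel, round(rmsep, 4))
```

The selected channels cluster around the informative bands, and the model on a few selective channels reaches an RMSEP near the noise level, which mirrors the abstract's point about selective channels compensating for the sensitivity loss.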
8

Efficient processing of multiway spatial join queries in distributed systems / Processamento eficiente de consultas de multi-junção espacial em sistemas distribuídos

Oliveira, Thiago Borges de 29 November 2017 (has links)
Multiway spatial join is an important type of query in spatial data processing, and its efficient execution is a requirement for moving spatial data analysis to scalable platforms, as has already happened with relational and unstructured data. In this thesis, we provide a set of comprehensive models and methods to efficiently execute multiway spatial join queries in distributed systems. We introduce a cost-based optimizer that is able to select a good execution plan for processing such queries in distributed systems, taking into account: the partitioning of data based on the spatial attributes of the datasets; the intra-operator level of parallelism, which enables high scalability; and the economy of cluster resources achieved by appropriately scheduling the queries before execution. We propose a cost model based on relevant metadata about the spatial datasets and the data distribution, which identifies the pattern of costs incurred when processing a query in this environment. We formalize the distributed multiway spatial join plan scheduling problem as a bi-objective integer linear model, with minimization of both the makespan and the communication cost as objectives. Three methods are proposed to compute schedules based on this model that significantly reduce the resource consumption required to process a query. Although targeted at multiway spatial join query scheduling, these methods can be applied to other kinds of problems in distributed systems, notably problems that require both the alignment of data partitions and the assignment of jobs to machines. Additionally, we propose a method to control the usage of resources and increase system throughput in the presence of constraints on network or processing capacity. The proposed cost-based optimizer was able to select good execution plans for all queries in our experiments, which used public datasets with a significant range of sizes and complex spatial objects. We also present an execution engine that is capable of performing the queries with near-linear scalability in execution time.
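The bi-objective trade-off described above (makespan versus communication cost) can be illustrated with a toy greedy heuristic. This is a sketch under invented costs, not the thesis's scheduling methods or data: each join job is placed, largest first, on the machine that minimizes a weighted sum of the makespan it would create and the data it would move.

```python
proc = [8, 5, 7, 3, 6]                     # processing cost of each join job (made up)
comm = [                                   # comm[j][m]: network cost of job j on machine m
    [0, 4, 4],                             # zero where the job's partitions are local
    [3, 0, 3],
    [5, 5, 0],
    [2, 2, 2],
    [0, 6, 6],
]

def schedule(proc, comm, n_machines, alpha=1.0, beta=1.0):
    load = [0.0] * n_machines
    assign = [None] * len(proc)
    # LPT-style: place the largest jobs first, scoring each machine by a
    # weighted sum of the resulting makespan and the communication cost.
    for j in sorted(range(len(proc)), key=lambda j: -proc[j]):
        best = min(range(n_machines),
                   key=lambda m: alpha * max(load[m] + proc[j], max(load))
                                 + beta * comm[j][m])
        assign[j] = best
        load[best] += proc[j]
    makespan = max(load)
    total_comm = sum(comm[j][assign[j]] for j in range(len(proc)))
    return assign, makespan, total_comm

assign, makespan, total_comm = schedule(proc, comm, 3)
print(assign, makespan, total_comm)
```

Raising `beta` pushes jobs toward the machines already holding their data at the expense of load balance, which is exactly the tension the bi-objective model in the abstract captures; the thesis's actual methods solve this as an integer linear program rather than greedily.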
9

Décomposition tensorielle de signaux luminescents émis par des biosenseurs bactériens pour l'identification de Systèmes Métaux-Bactéries / Tensor decomposition approach for identifying bacteria-metals systems

Caland, Fabrice 17 September 2013 (has links)
The local availability and persistence of heavy metals could be critical, notably for the future use of agricultural or urban areas where many industrial sites operated in the past. Managing these complex environmental situations requires the development of new, minimally invasive analytical methods (environmental sensors), such as bacterial biosensors, to identify and directly assess the biological effects and chemical availability of metals. The aim of this thesis was to identify, using mathematical tools from multilinear algebra, the responses of fluorescent bacterial sensors under various environmental conditions, whether stress caused by the presence of a metal at high doses or a nutrient deficiency caused by its absence. This identification is based on the quantitative analysis of multidimensional signals at the scale of a bacterial population. It relies in particular on (i) the acquisition of multivariate spectral (fluorescence) data on suspensions of multicolored biosensors interacting with metals and (ii) the development of tensor decomposition algorithms. The methods proposed, developed, and used in this work attempt to identify, with minimal a priori assumptions, the functional response of biosensors under different environmental conditions through constrained tensor decompositions of the observable spectral signals. They take advantage of the variability of the systemic responses and make it possible to determine the elementary sources that identify the system and their behavior as a function of external factors. They are inspired by the CP and PARALIND methods. The advantage of this type of approach, compared to conventional approaches, is the unique identification of the biosensor responses under weak constraints. The work consisted of developing efficient source-separation algorithms for the fluorescent signals emitted by bacterial sensors, guaranteeing the separability of the fluorescent sources and the uniqueness of the decomposition. The original contribution of the thesis is the incorporation of constraints tied to the physics of the analyzed phenomena, such as (i) the sparsity of the mixing coefficients or the positivity of the source signals, in order to minimize the use of a priori information, and (ii) the non-empirical determination of the order of the decomposition (the number of sources). This approach also improved identification by optimizing the physical measurements, through the use of synchronous spectra and by bringing sufficient diversity to the experimental designs. The use of synchronous spectra proved decisive both for improving the separation of the fluorescence sources and for increasing the signal-to-noise ratio of the weakest biosensors. This original spectral analysis method greatly widens the chromatic range of multicolored fluorescent biosensors that can be used simultaneously. Finally, a new method was developed for estimating the concentration of metal pollutants present in a sample from the spectral response of a mixture of non-specific biosensors.
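The constrained CP-style decomposition mentioned above can be sketched with a minimal alternating least squares loop on a synthetic three-way array, using a nonnegativity projection as the positivity constraint. This is an illustrative sketch only, not the thesis's algorithms: the tensor, its rank, and the projected-ALS update are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 6, 5, 4, 2
A0 = rng.uniform(size=(I, R))
B0 = rng.uniform(size=(J, R))
C0 = rng.uniform(size=(K, R))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)   # noiseless rank-2 nonnegative tensor

def khatri_rao(U, V):
    # Column-wise Kronecker product: row (u, v) -> U[u, r] * V[v, r].
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cp_nonneg(T, R, n_iter=300):
    I, J, K = T.shape
    A = rng.uniform(size=(I, R))
    B = rng.uniform(size=(J, R))
    C = rng.uniform(size=(K, R))
    for _ in range(n_iter):
        # ALS on each mode's unfolding, clipped to enforce nonnegativity.
        A = np.clip(T.reshape(I, J * K) @ np.linalg.pinv(khatri_rao(B, C)).T, 0, None)
        B = np.clip(T.transpose(1, 0, 2).reshape(J, I * K) @ np.linalg.pinv(khatri_rao(A, C)).T, 0, None)
        C = np.clip(T.transpose(2, 0, 1).reshape(K, I * J) @ np.linalg.pinv(khatri_rao(A, B)).T, 0, None)
    return A, B, C

A, B, C = cp_nonneg(T, R)
err = float(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T))
print(round(err, 4))
```

The CP model's essential uniqueness (up to scaling and permutation of the rank-one terms) is what the abstract relies on for identifying the biosensor responses; the thesis adds sparsity constraints and PARALIND-style dependency handling on top of this basic scheme.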
10

Avaliação da segurança de sistemas de potência para múltiplas contingências usando árvore de decisão multicaminhos / Power system security assessment for multiple contingencies using a multiway decision tree

OLIVEIRA, Werbeston Douglas de 15 September 2017 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico / Eletronorte - Centrais Elétricas do Norte do Brasil S/A / The search for effective ways to promote the secure operation of power systems and to increase operators' understanding of it has encouraged continuous research into new techniques and methods that can help in this task. In this thesis, an approach is proposed to assess power system operation security for multiple contingencies using a multiway decision tree (MDT). The MDT differs from other decision tree techniques by establishing, in the training step, one value of the categorical attribute per branch. The proposed approach uses topologies (contingencies) as categorical attributes. In this way, the MDT improves interpretability regarding the power system operational state, as the operator can clearly see the critical variables for each topology, so that the MDT rules can be used to aid decision-making. The approach was used for the security assessment of two test systems, the IEEE 39-bus system and the northern part of the Brazilian Interconnected Power System (BIPS), the latter tested with real data from one day of operation. The proposed MDT-based technique demonstrated good performance, using a set of simple and clear rules. A comparison of the results with other decision-tree-based techniques was also carried out, showing that the MDT yields a simpler procedure for power system security classification with good accuracy.
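The one-value-per-branch idea behind the MDT can be illustrated with a tiny sketch: split once on the categorical topology attribute, then learn a simple threshold rule per branch. This is a toy example with invented data and a single monitored variable, not the thesis's method or test systems.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy operating snapshots: a categorical attribute (which contingency/topology
# is active) plus one monitored variable. The security label follows a
# topology-specific voltage limit, which the tree must rediscover.
topologies = rng.integers(0, 3, size=300)           # contingency id 0..2
voltage = rng.uniform(0.90, 1.05, size=300)         # per-unit voltage
limits = {0: 0.95, 1: 0.97, 2: 0.93}                # hidden critical limit per topology
secure = np.array([voltage[i] >= limits[t] for i, t in enumerate(topologies)])

def fit_multiway(topologies, x, y):
    """One multiway split on the categorical attribute, then a per-branch
    threshold on x chosen to maximize training accuracy."""
    rules = {}
    for t in np.unique(topologies):
        mask = topologies == t
        xt, yt = x[mask], y[mask]
        best = max(np.unique(xt), key=lambda c: np.mean((xt >= c) == yt))
        rules[int(t)] = float(best)
    return rules

rules = fit_multiway(topologies, voltage, secure)
pred = np.array([voltage[i] >= rules[int(t)] for i, t in enumerate(topologies)])
acc = float(np.mean(pred == secure))
print(rules, acc)
```

Each branch exposes one interpretable rule ("under contingency t, the system is secure if voltage ≥ threshold"), which is the interpretability property the abstract highlights: the operator reads off the critical variable and limit per topology directly.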
