Global ETD Search

1	OnPLS : Orthogonal projections to latent structures in multiblock and path model data analysis
Löfstedt, Tommy. January 2012.
The amounts of data collected from each sample of e.g. chemical or biological materials have increased by orders of magnitude since the beginning of the 20th century. Furthermore, the number of ways to collect data from observations is also increasing. Such configurations with several massive data sets increase the demands on the methods used to analyse them. Methods that handle such data are called multiblock methods and they are the topic of this thesis. Data collected from advanced analytical instruments often contain variation from diverse mutually independent sources, which may confound observed patterns and hinder interpretation of latent variable models. For this reason, new methods have been developed that decompose the data matrices, placing variation from different sources into separate parts. Such procedures are no longer merely pre-processing filters, as they initially were, but have become integral elements of model building and interpretation. One strain of such methods, called OPLS, has been particularly successful since it is easy to use, understand and interpret. This thesis describes the development of a new multiblock data analysis method called OnPLS, which extends the OPLS framework to the analysis of multiblock and path models with very general relationships between blocks in both rows and columns. OnPLS utilises OPLS to decompose sets of matrices, dividing each matrix into a globally joint part (a part shared with all the matrices it is connected to), several locally joint parts (parts shared with some, but not all, of the connected matrices) and a unique part that no other matrix shares. The OnPLS method was applied to several synthetic data sets and data sets of “real” measurements. For the synthetic data sets, where the results could be compared to known, true parameters, the method generated global multiblock (and path) models that were more similar to the true underlying structures than models without such decompositions. That is, the globally joint, locally joint and unique models more closely resembled the corresponding true data. When applied to the real data sets, the OnPLS models revealed chemically or biologically relevant information in all kinds of variation, effectively increasing the interpretability since different kinds of variation are distinguished and separately analysed. OnPLS thus improves the quality of the models and facilitates better understanding of the data since it separates and separately analyses different kinds of variation. Each kind of variation is purer and less tainted by other kinds. OnPLS is therefore highly recommended to anyone engaged in multiblock or path model data analysis.
OnPLS OPLS O2PLS PLS Multivariate analysis Multiblock and path modelling Chemometrics
2	Prezentace kvality životního prostředí a jeho vlivu na lidský organismus pomocí geostatistických nástrojů, multidisciplinární přístup / Presentation of the environmental quality and its influence on the human organism through geostatistic tools, multidisciplinary approach
Stramová, Andrea. January 2018.
A large amount of data now exists that characterizes the environment on the basis of different parameters. Because such large volumes of data are difficult to understand and to visualize comprehensively, tools are needed to visualize them properly. The main goal of this thesis is to present the current state of the environment in a place of residence and to show the possible influence of environmental factors on health. To achieve this goal, it was necessary to obtain the individual data sets and prepare them for each analysis and for presentation. The thesis describes various aspects of the environment in the Příbram region. It also describes the spatially determined medical data in detail. The thesis includes an O2PLS analysis that creates a regression model for thyroid function parameters. The output of the thesis is a web application that clearly presents the input data (environmental and medical) and the geostatistical and O2PLS analyses. This work produced a procedure for preparing and analyzing different types of data, together with a detailed description of how to present the results. It has not yet been proven which factors can affect thyroid function. Many diseases are affected by the environment. This thesis focuses on the influence of the environment on thyroid function. Main...
3	Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection : VIPOPLS, VIPO2PLS, and MB-VIOP methods
Galindo-Prieto, Beatriz. January 2017.
Multivariate and multiblock data analysis involves useful methodologies for analyzing large data sets in chemistry, biology, psychology, economics, sensory science, and industrial processes; among these methodologies, partial least squares (PLS) and orthogonal projections to latent structures (OPLS®) have become popular. Due to increasingly computerized instrumentation, a data set can consist of thousands of input variables which contain latent information valuable for research and industrial purposes. When analyzing a large number of data sets (blocks) simultaneously, the number of variables and the underlying connections between them grow considerably; at this point, reducing the number of variables while keeping high interpretability becomes a much needed strategy. The main direction of research in this thesis is the development of a variable selection method, based on variable influence on projection (VIP), in order to improve the model interpretability of OnPLS models in multiblock data analysis. This new method is called multiblock variable influence on orthogonal projections (MB-VIOP), and its novelty lies in the fact that it is the first multiblock variable selection method for OnPLS models. Several milestones needed to be reached in order to successfully create MB-VIOP. The first milestone was the development of a single-block variable selection method able to handle orthogonal latent variables in OPLS models, i.e. VIP for OPLS (denoted as VIPOPLS or OPLS-VIP in Paper I), which proved to increase the interpretability of PLS and OPLS models and was afterwards successfully extended to multivariate time series analysis (MTSA) aiming at process control (Paper II). The second milestone was to develop the first multiblock VIP approach for enhancement of O2PLS® models, i.e. VIPO2PLS for two-block multivariate data analysis (Paper III). Finally, the third milestone and main goal of this thesis was the development of the MB-VIOP algorithm for the improvement of OnPLS model interpretability when analyzing a large number of data sets simultaneously (Paper IV). The results of this thesis and its enclosed papers showed that the VIPOPLS, VIPO2PLS, and MB-VIOP methods successfully assess the most relevant variables for model interpretation in PLS, OPLS, O2PLS, and OnPLS models. In addition, predictability, robustness, dimensionality reduction, and other variable selection purposes can potentially be improved or achieved by using these methods.
Variable influence on projection VIP MB-VIOP OPLS O2PLS OnPLS variable selection
4	Latent variable based computational methods for applications in life sciences : Analysis and integration of omics data sets
Bylesjö, Max. January 2008.
With the increasing availability of high-throughput systems for parallel monitoring of multiple variables, e.g. levels of large numbers of transcripts in functional genomics experiments, massive amounts of data are being collected even from single experiments. Extracting useful information from such systems is a non-trivial task that requires powerful computational methods to identify common trends and to help detect the underlying biological patterns. This thesis deals with the general computational problems of classifying and integrating high-dimensional empirical data using a latent variable based modeling approach. The underlying principle of this approach is that a complex system can be characterized by a few independent components that characterize the systematic properties of the system. Such a strategy is well suited for handling noisy, multivariate data sets with strong multicollinearity structures, such as those typically encountered in many biological and chemical applications. The main foci of the studies this thesis is based upon are applications and extensions of the orthogonal projections to latent structures (OPLS) method in life science contexts. OPLS is a latent variable based regression method that separately describes systematic sources of variation that are related and unrelated to the modeling aim (for instance, classifying two different categories of samples). This separation of sources of variation can be used to pre-process data, but also has distinct advantages for model interpretation, as exemplified throughout the work. For classification cases, a probabilistic framework for OPLS has been developed that allows the incorporation of both variance and covariance into classification decisions. This can be seen as a unification of two historical classification paradigms based on either variance or covariance. In addition, a non-linear reformulation of the OPLS algorithm is outlined, which is useful for particularly complex regression or classification tasks. The general trend in functional genomics studies in the post-genomics era is to perform increasingly comprehensive characterizations of organisms in order to study the associations between their molecular and cellular components in greater detail. Frequently, abundances of all transcripts, proteins and metabolites are measured simultaneously in an organism at a current state or over time. In this work, a generalization of OPLS is described for the analysis of multiple data sets. It is shown that this method can be used to integrate data in functional genomics experiments by separating the systematic variation that is common to all data sets considered from sources of variation that are specific to each data set.
Functional genomics is a research field whose ultimate goal is to characterize all genes in the genome of an organism. This includes studies of how DNA is transcribed into mRNA, how it is then translated into proteins, and how these proteins interact and affect the biochemical processes of the organism. The traditional approach has been to study the function, regulation and translation of one gene at a time. New technology in the field has, however, made it possible to study how thousands of transcripts, proteins and small molecules behave jointly in an organism at a given time or over time. In concrete terms, this also means that large amounts of data are generated even from small, isolated experiments. Finding global trends and extracting useful information from such data sets is a non-trivial computational problem that requires advanced and interpretable mathematical models. This thesis describes the development and application of various computational methods for classifying and integrating large amounts of empirical (measured) data. Common to all the methods is that they are based on latent variables: variables that have not been measured directly but have been calculated from other, observed variables. This concept is well suited to studies of complex systems that can be described by a few independent factors that characterize the main properties of the system, which is typical of many chemical and biological systems. The methods described in the thesis are general but were mainly developed for and applied to data from biological experiments. The thesis demonstrates how these methods can be used to find complex relationships between measured data and other factors of interest, without losing the properties of the method that are critical for interpreting the results. The methods are applied to find common and unique properties of the regulation of transcripts and how these are affected by, and affect, small molecules in the poplar tree. In addition, a larger experiment in poplar is described in which the relationship between levels of transcripts, proteins and small molecules is investigated with the developed methods.
Chemometrics OPLS O2PLS K-OPLS kernel-based non-linear regression classification Populus Chemistry Kemi
5	Thermal formation and chlorination of dioxins and dioxin-like compounds
Jansson, Stina. January 2008.
This thesis contributes to an increased understanding of the formation of dioxins and dioxin-like compounds in combustion processes. Although emissions to air from waste incineration facilities have been greatly reduced by the use of efficient air pollution control measures, the resulting residues (ashes and filters) are highly toxic and are classified as hazardous waste. The main objective of the work underlying this thesis was to elucidate the formation and chlorination pathways of dioxins and dioxin-like compounds in waste combustion flue gases in the temperature range 640-200°C in a representative, well-controlled laboratory-scale reactor using artificial municipal solid waste. This could contribute to the reduction of harmful emissions to air and also reduce the toxicity of waste incineration residues, thus reducing or even eliminating the need for costly and potentially hazardous after-treatment. A comparison of four different quenching profiles showed that the formation of polychlorinated dibenzo-p-dioxins (PCDD) and dibenzofurans (PCDF) was rapid and mainly occurred in the 640-400°C temperature region, with high dependency on sufficient residence time within a specific temperature region. Prolonged residence time at high temperatures (450/460°C) reduced the PCDD yields, even at lower temperatures along the post-combustion zone. PCDD, PCDF and PCN (polychlorinated naphthalene) isomer distribution patterns indicated contributions from chlorophenol condensation as well as chlorination reactions for all three classes of compounds. The formation of PCDDs was largely influenced by chlorophenol condensation and to some extent by chlorination reactions. For the PCDFs, chlorine substitution adjacent to the oxygen bridges was unfavoured, as demonstrated by the notably lower abundance of 1,9-substituted congeners. This was supported by bidirectional orthogonal partial least squares (O2PLS) modelling. The variable with the greatest influence on the distribution of PCDD congeners was the relative free energy (RΔGf). The O2PLS models displayed distinct clusters, dividing most of the homologues into two or three sub-groups of congeners which seemed to correspond to the probability of origination from chlorophenol condensation. The effects of injection of aromatic structures into the flue gas differed for each class of compounds. Injection of naphthalene increased the formation of monochlorinated naphthalene but the remaining homologues appeared to be unaffected. This was probably due to insufficient residence time at temperatures necessary for further chlorination. Injected dibenzo-p-dioxin was decomposed, chlorinated and re-condensed into PCDDs and PCDFs, whereas injection of dibenzofuran and fluorene reduced the PCDD levels in the flue gas.
This thesis focuses on various aspects that can contribute to an increased understanding of the formation of dioxins and dioxin-like compounds in combustion processes. Although emissions to air from waste incineration plants have decreased sharply thanks to effective flue gas cleaning methods, the problem of highly toxic flue gas cleaning residues (ashes and filters), which are classified as hazardous waste, remains. The main aim of the work behind this thesis was to clarify the formation and chlorination pathways of dioxins and dioxin-like compounds in the temperature range 640-200°C in flue gases from waste incineration. This may enable solutions for further emission reductions and a detoxification of the by-products of waste incineration, which reduces or even eliminates the need for costly and hazardous after-treatment. Realistic and well-controlled experiments were carried out in a laboratory-scale reactor in which an artificial household waste was combusted. A comparison of four different temperature and residence-time profiles showed that the formation of polychlorinated dibenzo-p-dioxins (PCDD) and dibenzofurans (PCDF) occurs rapidly and mainly within the temperature range 640-400°C. The formation was strongly dependent on a sufficiently long residence time within a given temperature region. A prolonged residence time at high temperatures (>450°C) resulted in reduced levels of PCDD, which remained low even later in the post-combustion zone. The isomer patterns of PCDD, PCDF and PCN (polychlorinated naphthalenes) all showed signs of originating from both chlorophenol condensation and chlorination reactions. The PCDD pattern showed clear indications of formation from chlorophenols and, to a lesser degree, of formation via chlorination. For PCDF, chlorine substitution in positions adjacent to the oxygen bridge was disfavoured, which was confirmed by multivariate modelling (O2PLS). The variable that most strongly influenced the formation of PCDD was the relative free energy (RΔGf). The models showed a distinct grouping of the PCDD and PCDF congeners into two or three groups for each degree of chlorination, which is suggested to be related to the probability of each congener being formed via chlorophenol condensation. Injection of aromatic carbon structures into the flue gas duct gave rise to differing effects. Injection of naphthalene increased the formation of monochlorinated naphthalene while the remaining homologues did not appear to be affected, probably because the residence time was too short for further chlorination. Dibenzo-p-dioxin was probably cleaved into phenolic fragments that were chlorinated and then re-condensed into PCDD and PCDF, while dibenzofuran and fluorene strongly reduced the PCDD concentrations.
dioxin dioxin-like compounds isomer distribution pattern homologue profile injection O2PLS PCDD PCDF PCN PCBz PCPh MSW combustion formation chlorination flue gas quench profiles Environmental chemistry Miljökemi
