Global ETD Search

71	Approaches to modelling functional time series with an application to electricity generation data Jin, Zehui January 2018 We study the half-hourly electricity generation by coal and by gas in the UK over a period of three years from 2012 to 2014. Because the series is sampled at high frequency, the data for each fuel show daily cycles along with seasonality and trend across days. Taylor (2003), Taylor et al. (2006), and Taylor (2008) studied time series with similar features by introducing double seasonality into the methods for a single univariate time series. As we are interested in the continuous variation in the generation within a day, the half-hourly observations within a day are considered as a continuous function. In this way, a time series of half-hourly discrete observations is transformed into a time series of daily functions. The idea of a time series of functions can also be seen in Shang (2013), Shang and Hyndman (2011) and Hyndman and Ullah (2007). We improve their methods in a few ways. Firstly, we identify the systematic effect due to factors that act over the long term, such as weather and fuel prices, and to the intrinsic differences between the days of the week. The systematic effect is modeled and removed before we study the day-by-day random variation in the functions. Secondly, we extend functional principal component analysis (PCA), which was applied to one group of functions in Shang (2013), Shang and Hyndman (2011) and Hyndman and Ullah (2007), into partial common PCA, in order to consider the covariance structures of two groups of functions and their similarities. A test of the goodness of the approximation to the functions given by the common eigenfunctions is also proposed. The idea of bootstrapping residuals from the approximation seen in Shang (2014) is employed but is improved with non-overlapping blocks and moving blocks of residuals. 
Thirdly, we use a vector autoregressive (VAR) model, which is a multivariate approach, to model the scores on the common eigenfunctions of a group so that the cross-correlation between the scores can be considered. We include Lasso penalties in the VAR model to select the significant covariates and refit the selection with ordinary least squares to reduce the bias. Our method is compared with the stepwise procedure of Pfaff (2007), and is shown to be less variable and more accurate in both estimation and prediction. Finally, we propose a method for producing point forecasts of the daily functions. It is more complicated than the methods of Shang (2013), Shang and Hyndman (2011) and Hyndman and Ullah (2007) because the systematic effect needs to be included. An adjustment interval is also given along with each point forecast, representing the range within which the true function might vary. Our methods for the point forecast and the adjustment interval incorporate information that becomes available after the training period, which is not considered in the classical prediction equations of VAR and GARCH seen in Tsay (2013) and Engle and Bollerslev (1986). 510
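The pipeline described in entry 71 (half-hourly observations reshaped into daily curves, a systematic effect removed, principal component scores modelled with a VAR) can be sketched roughly as follows. This is an illustrative sketch under strong assumptions, not the method of the thesis: the systematic effect is reduced to day-of-week mean curves, ordinary PCA stands in for partial common PCA, and a plain least-squares VAR(1) replaces the Lasso-penalised VAR. All function names are hypothetical.

```python
import numpy as np

def daily_curves(series, points_per_day=48):
    """Reshape a half-hourly series into one row (curve) per complete day."""
    n_days = len(series) // points_per_day
    trimmed = np.asarray(series[: n_days * points_per_day], dtype=float)
    return trimmed.reshape(n_days, points_per_day)

def remove_weekday_means(curves):
    """Subtract a mean curve per weekday (a crude stand-in for the systematic effect).
    Assumption: the first day is treated as weekday 0 and at least 7 days are given."""
    weekday = np.arange(len(curves)) % 7
    means = np.stack([curves[weekday == d].mean(axis=0) for d in range(7)])
    return curves - means[weekday], means

def pca_scores(residuals, n_components):
    """PCA via SVD; the residual curves already have zero mean across days."""
    u, s, vt = np.linalg.svd(residuals, full_matrices=False)
    return u[:, :n_components] * s[:n_components], vt[:n_components]

def fit_var1(scores):
    """Ordinary least-squares VAR(1) with intercept: scores[t] ~ scores[t-1]."""
    x = np.column_stack([np.ones(len(scores) - 1), scores[:-1]])
    coef, *_ = np.linalg.lstsq(x, scores[1:], rcond=None)
    return coef  # shape (1 + k, k)

def forecast_next_curve(series, n_components=3, points_per_day=48):
    """One-day-ahead point forecast: weekday mean plus forecast scores times eigenfunctions."""
    curves = daily_curves(series, points_per_day)
    residuals, means = remove_weekday_means(curves)
    scores, eigenfunctions = pca_scores(residuals, n_components)
    coef = fit_var1(scores)
    next_scores = np.concatenate([[1.0], scores[-1]]) @ coef
    next_weekday = len(curves) % 7
    return means[next_weekday] + next_scores @ eigenfunctions
```

Calling `forecast_next_curve(series)` on a one-dimensional array of half-hourly values returns a 48-point curve for the day after the last complete day. The adjustment interval and the bootstrap of residuals mentioned in the abstract are not covered here.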
72	ASSESSMENT OF SPATIOTEMPORAL VARIATIONS OF GROUNDWATER LEVELS IN THE PLATTE RIVER BASIN USING DATA MINING Bista, Astha 01 August 2019 Rapid population growth and climate variability have been posing pressure on groundwater management, especially in regions dominated by irrigation agriculture. Effective management practices require a better understanding of groundwater dynamics and its contributing factors, such as recharge, groundwater-surface water interactions, soil and unsaturated zone characteristics. Although groundwater models can provide valuable insights into these questions, these models are often nonexistent or cost prohibitive. Cluster analysis Data mining Groundwater management PCA SSA
73	Spatio-temporal analysis of GRACE gravity field variations using the principal component analysis Anjasmara, Ira Mutiara January 2008 The Gravity Recovery and Climate Experiment (GRACE) mission has expanded knowledge of both the static and the time-variable parts of the Earth’s gravity field. Currently, GRACE maps the Earth’s gravity field with near-global coverage over a five-year period, which makes it possible to apply statistical analysis techniques to the data. The objective of this study is to analyse the most dominant spatial and temporal variability of the Earth’s gravity field observed by GRACE using a combination of analytical and statistical methods such as Harmonic Analysis (HA) and Principal Component Analysis (PCA). HA is used to gain general information about the variability, whereas PCA is used to find the most dominant spatial and temporal variability components without having to introduce any presetting. The latter is an important property that allows for the detection of anomalous or aperiodic behaviour, which will be useful for the study of various geophysical processes such as the effects of earthquakes. The analyses are performed for the whole globe as well as for the regional areas of Sumatra-Andaman, Australia, Africa, Antarctica, South America, the Arctic, Greenland, South Asia, North America and Central Europe. On a global scale the most dominant temporal variation is an annual signal followed by a linear trend. Similar results, mostly associated with changing land hydrology and/or snow cover, are obtained for most regional areas except the Arctic and Antarctic, where the secular trend is the prevailing temporal variability. / Apart from these well-known signals, this contribution also demonstrates that PCA is able to reveal longer-period and aperiodic signals. A prominent example of the latter is the gravity signal of the Sumatra-Andaman earthquake in late 2004. 
In an attempt to isolate these signals, linear trend and annual signal are removed from the original data and the PCA is once again applied to the reduced data. For a complete overview of these results the most dominant PCA modes for the global and regional gravity field solutions are presented and discussed.
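The two-step procedure in entry 73 (remove the linear trend and annual signal, then apply PCA to the reduced data) can be illustrated with a short sketch. It assumes, purely for illustration, that the gravity field solutions are stored as a matrix with one row per epoch and one column per grid cell, and that time is given in decimal years; the function names are hypothetical and the thesis may organise the computation differently.

```python
import numpy as np

def remove_trend_and_annual(data, t_years):
    """Least-squares removal of offset, linear trend and annual harmonic per grid cell.

    data: array of shape (n_epochs, n_cells); t_years: array of shape (n_epochs,).
    """
    t = np.asarray(t_years, dtype=float)
    tc = t - t.mean()  # centre time to keep the design matrix well conditioned
    design = np.column_stack([
        np.ones_like(tc),        # offset
        tc,                      # linear trend
        np.cos(2 * np.pi * t),   # annual signal, cosine part
        np.sin(2 * np.pi * t),   # annual signal, sine part
    ])
    coeffs, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ coeffs

def pca_modes(data, n_modes):
    """PCA via SVD: temporal scores, spatial patterns and explained-variance fractions."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_modes] * s[:n_modes], vt[:n_modes], explained[:n_modes]
```

Applying `pca_modes` to the output of `remove_trend_and_annual` gives the dominant modes of whatever remains, which is where longer-period or aperiodic signals would be expected to show up.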
74	Mining for Lung Cancer Biomarkers in Plasma Metabolomics Data / Sökande efter Biomarkörer för Lungcancer genom Analys av Metabolitdata Johnsson, Anna January 2010 Lung cancer is the form of cancer with the highest mortality worldwide, and survival is very low: only 15% of patients are alive five years after diagnosis. More research is needed to understand the biology of lung cancer and thus make it possible to discover the disease at an early stage. Early diagnosis leads to an increased chance of survival. In this thesis, 179 lung cancer samples and 116 control samples of blood serum were analyzed to identify metabolomic biomarkers. The control samples were derived from patients with benign lung diseases. Data were obtained from GC/TOF-MS analysis and analyzed with the multivariate analysis methods PCA and OPLS/OPLS-DA. The thesis investigates how best to pre-treat and analyze the data in order to discover biomarkers. One part of the aim was to give directions for selecting samples from a biobank for further biological validation of suspected biomarkers. Models for different stages of lung cancer versus control samples were computed and validated. The most influential metabolites in the models were selected, and confounding with other clinical characteristics such as gender and hemoglobin levels was studied. Thirteen lung cancer biomarkers were identified and validated against raw data and with new OPLS models based solely on the biomarkers. In summary, the identified biomarkers separate control samples from late-stage lung cancer fairly well, but separate early-stage lung cancer from control samples poorly. The recommendation is to select controls and late-stage lung cancer samples from the biobank for further confirmation of the biomarkers. metabolomics biomarkers multivariate data analysis PCA OPLS OPLS-DA Bioinformatics Bioinformatik
75	Video-based Fire Analysis and Animation Using Eigenfires Nikfetrat, Nima 31 October 2012 We introduce new approaches to modeling and synthesizing realistic-looking 2D fire animations using video-based techniques and statistical analysis. Our approaches are based on real footage of various small-scale fire samples with customized motions that we captured for this research, and the final results can be used as a sequence of images in video games, motion graphics and cinematic visual effects. Instead of conventional physically-based simulation, we use example-based principal component analysis (PCA) and extend it by introducing “Eigenfires” as a new way to represent the main features of various real fire samples. The visualization of Eigenfires helps animators design the fire interactively in a more meaningful and convenient way than known procedural approaches or other video-based synthesis models. Our system enables artists to control real-life fire videos through motion transitions and loops by selecting any desired ranges of any video clips; the system then takes care of the remaining part so as to produce a smooth transition. Instead of tricking the eye with basic blending between similar shapes only, our flexible fire transitions are capable of connecting various fire styles. Our techniques are also effective for data compression; they can deliver real-time interactive recognition for high-resolution images, are very easy to implement, and require little parameter tuning. Fire Flame Synthesis Procedural Reconstruction EigenFire Simulation PCA Recognition Animation
76	Estudi de la utilització dels mapes de potencial electrostàtic i de polarització com a descriptors moleculars Roset Cazalda, Mª Lourdes 18 November 2011 La Ingeniería molecular se basa en el conocimiento de las características estereoelectrónicas que definen el reconocimiento molecular, que es el resultado de una complementariedad, tanto geométrica como electrónica, entre diferentes entidades moleculares. La importancia de las diferentes contribuciones electrostáticas nos permite realizar un estudio teórico de predicción de la reactividad y otras propiedades moleculares a partir de cálculos de potencial electrostático y de polarización moleculares. El presente trabajo se basa en el estudio de la utilización de los mapas de potencial electrostático y de potencial de polarización como descriptores moleculares. En primer lugar se realiza un estudio del efecto de la base y de la metodología empleada en el cálculo de propiedades eléctricas de primer y segundo orden. El análisis se lleva a cabo con las moléculas de cianuro de hidrógeno, formaldehído y urea. Las bases utilizadas son del tipo doble zeta estándar, a las cuales se han añadido funciones de polarización y difusas. En particular, se han utilizado la base doble zeta 6-31G(d), las bases doble zeta aumentadas con uno o dos conjuntos de funciones de polarización : 6-31G(d,p) , 6-31G(2d,2p) y también se ha utilizado la base 6-311G++(2d,2p), que incluye funciones difusas. Los diferentes niveles de cálculo utilizan metodologías Hartree-Fock, Møller-Plesset de segundo y cuarto orden y teoría del funcional de la densidad (DFT) : SCF, MP2, MP4, BLYP y B3LYP. Se analiza el efecto de los diferentes conjuntos de base a la contribución de la polarización a la energía de interacción, calculando para cada sistema propiedades de primer orden, como son los momentos dipolares y los momentos cuadrupolares, y propiedades de segundo orden, como la polarizabilidad y hiperpolarizabilidad moleculares. 
Seguidamente se evalúa el efecto de la base y el método de cálculo en la obtención de potenciales electrostáticos y de polarización moleculares. Se realiza un estudio comparativo de los mapas calculados con diferentes bases y metodologías, en concreto un estudio de la distribución espacial y un análisis de correlación entre las diferentes bases y metodologías. Un análisis de los mapas de polarización molecular a partir del cálculo de las diferencias de polarización relativas y las desviaciones estándar correspondientes nos permite un estudio comparativo de las diferentes metodologías y bases utilizadas. En particular se realiza un análisis comparativo entre diferentes métodos de cálculo con la base 6-311G++(2d,2p), tomando como referencia el cálculo MP4. Finalmente, se utilizan los mapas de potencial electrostático, de polarización y de interacción para el análisis de las características de reconocimiento molecular de un conjunto de compuestos bioactivos, a fin de analizar la importancia de la contribución de la polarización. Por este motivo, se elige para el estudio un conjunto de moléculas con una alta polarizabilidad, y en concreto, dos familias de compuestos con abundantes átomos de cloro y con una actividad tóxica definida, que forman parte de los grupos de dioxinas y furanos. Para ello se realiza el estudio de la inclusión de la polarización molecular como descriptor en la predicción de la actividad biológica de dioxinas y furanos, realizando el cálculo de potenciales electrostáticos y de polarización, un análisis de los mapas de potencial, y definiendo las principales zonas de interacción electrostática y de polarización molecular a partir de cálculos de componentes principales (PCA), así como la predicción de la actividad biológica en base a un estudio realizado mediante cálculos de mínimos cuadrados parciales (PLS). 
/ Molecular engineering is based on knowledge of the stereoelectronic features that define molecular recognition, which results from geometric and electronic complementarity between different molecular entities. The importance of the different electrostatic contributions allows us to carry out a theoretical study predicting reactivity and other molecular properties from calculations of molecular electrostatic and polarization potentials. This work studies the use of maps of electrostatic potential and polarization potential as molecular descriptors. First, we study the effect of the basis set and the methodology used in the calculation of first- and second-order electrical properties. The analysis is carried out for the molecules hydrogen cyanide, formaldehyde and urea. The basis sets used are of the standard double-zeta type, to which polarization and diffuse functions were added. In particular, we used the double-zeta basis 6-31G(d), the double-zeta bases augmented with one or two sets of polarization functions, 6-31G(d,p) and 6-31G(2d,2p), and the basis 6-311G++(2d,2p), which includes diffuse functions. The different levels of calculation use Hartree-Fock, second- and fourth-order Møller-Plesset, and density functional theory (DFT) methodologies: SCF, MP2, MP4, BLYP and B3LYP. We analyze the effect of the different basis sets on the contribution of polarization to the interaction energy, calculating for each system first-order properties, such as dipole and quadrupole moments, and second-order properties, such as the molecular polarizability and hyperpolarizability. Further, the effect of the basis set and method on the calculation of the molecular electrostatic and polarization potentials is evaluated. 
For this purpose we performed a comparative study of the maps calculated with different basis sets and methodologies, in particular a study of the spatial distribution and a correlation analysis between the different basis sets and methodologies. An analysis of the molecular polarization maps, based on the relative polarization differences and the corresponding standard deviations, allows a comparative study of the different methodologies and basis sets used. Specifically, a comparative analysis of different calculation methods with the 6-311G++(2d,2p) basis set was carried out, taking the MP4 calculation as the reference. Finally, we use maps of electrostatic, polarization and interaction potentials to analyze the molecular recognition features of a set of bioactive compounds, in order to assess the importance of the polarization contribution. For this reason, we chose to study a set of molecules with high polarizability, specifically two families of compounds with many chlorine atoms and a defined toxic activity, which belong to the dioxin and furan groups. The polarization maps are studied as indicators of the biological activity of dioxins and furans, using the best methodology. The inclusion of molecular polarization as a descriptor for predicting the biological activity of dioxins and furans was studied by calculating electrostatic and polarization potentials, analyzing the potential maps, defining the main regions of electrostatic and polarization interaction by principal component analysis (PCA), and predicting biological activity with partial least squares (PLS) calculations. Polarització Descriptor molecular Toxicitat PCA PLS DFT 54
77	Mining for Lung Cancer Biomarkers in Plasma Metabolomics Data / Sökande efter Biomarkörer för Lungcancer genom Analys av Metabolitdata Johnsson, Anna January 2010 Lung cancer is the form of cancer with the highest mortality worldwide, and survival is very low: only 15% of patients are alive five years after diagnosis. More research is needed to understand the biology of lung cancer and thus make it possible to discover the disease at an early stage. Early diagnosis leads to an increased chance of survival. In this thesis, 179 lung cancer samples and 116 control samples of blood serum were analyzed to identify metabolomic biomarkers. The control samples were derived from patients with benign lung diseases. Data were obtained from GC/TOF-MS analysis and analyzed with the multivariate analysis methods PCA and OPLS/OPLS-DA. The thesis investigates how best to pre-treat and analyze the data in order to discover biomarkers. One part of the aim was to give directions for selecting samples from a biobank for further biological validation of suspected biomarkers. Models for different stages of lung cancer versus control samples were computed and validated. The most influential metabolites in the models were selected, and confounding with other clinical characteristics such as gender and hemoglobin levels was studied. Thirteen lung cancer biomarkers were identified and validated against raw data and with new OPLS models based solely on the biomarkers. In summary, the identified biomarkers separate control samples from late-stage lung cancer fairly well, but separate early-stage lung cancer from control samples poorly. The recommendation is to select controls and late-stage lung cancer samples from the biobank for further confirmation of the biomarkers. metabolomics biomarkers multivariate data analysis PCA OPLS OPLS-DA Bioinformatics Bioinformatik
78	Dry and wet atmospheric deposition of polycyclic aromatic hydrocarbons at a Kaohsiung coastal site. Chen, Kuan-Wei 26 December 2011 Polycyclic aromatic hydrocarbons (PAHs) are one of the major classes of organic pollutants. As semi-volatile organic compounds, PAHs can be transported in the atmosphere and scavenged by various processes (dry and wet deposition). Atmospheric deposition is an important pathway for the transfer of pollutants from the atmosphere to terrestrial and water surfaces. The objective of this research is to quantify the dry and wet deposition of atmospheric PAHs in the Kaohsiung coastal area. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were also performed, together with diagnostic ratios, to determine the potential sources of PAHs. The mean dry and wet deposition fluxes of atmospheric total suspended particles (TSP) during the study period (January-December 2010) were estimated to be 44.3 (6.60-384) and 211 (56.1-738) mg m⁻² d⁻¹, respectively. The annual mean total PAH fluxes in dry and wet deposition were 1500 (749-3760) and 8470 (2280-46000) ng m⁻² d⁻¹, respectively. Both concentrations and dry deposition fluxes of TSP were much higher during the dust storm. During Ghost Month, however, they were comparable with other sampling events. Compared with values in the literature, the total PAH concentrations of TSP were relatively low during Ghost Month, for which wind direction and precipitation are plausible explanations. The PM2.5/PM10 ratio was relatively low during the dust storm, indicating that coarse particles were probably predominant. In addition, during the dust storm, both the TSP dry deposition velocity and the total PAH dry deposition velocity were higher than in other sampling events. Our findings show that previous attempts in the literature to estimate total PAH dry deposition fluxes from the TSP dry deposition velocity and PAH concentrations could overestimate fluxes in the field. 
TSP dry deposition fluxes were positively correlated with atmospheric TSP concentrations and TSP dry deposition velocity, but negatively correlated with precipitation intensity. In addition, TSP dry deposition velocity showed a positive correlation with TSP concentrations. Total PAH dry deposition fluxes were positively correlated with atmospheric total particulate concentrations and total PAH dry deposition velocity, but negatively correlated with precipitation intensity and temperature. However, TSP and total PAH fluxes in wet deposition were both positively correlated with precipitation intensity. Diagnostic ratios showed that diesel exhaust was the main source of combustion-derived PAHs in the study. HCA and PCA indicated that emissions from ships and vehicles, and the fuel used, were the main sources of combustion-derived PAHs, whereas the results for special events such as the dust storm and Ghost Month suggested a different source of PAHs. dry deposition PCA wet deposition HCA polycyclic aromatic hydrocarbons
79	Aptamers as cross-reactive receptors : using binding patterns to discriminate biomolecules Stewart, Sara, 1980- 12 August 2015 The focus of this work was an exploration of the use of aptamers as cross-reactive receptors. Cross-reactivity is of interest for developing assays to identify complex targets and solutions. By exploiting the simple chemistries of aptamers, we hope to introduce a new class of receptors to the science of molecular discrimination. This manuscript first addresses the use of designed aptamers for the identification of variants of HIV-1 reverse transcriptase. In this research, aptamers were immobilized on a platform and used to discriminate four variants of HIV-1 reverse transcriptase. It was found that the array could discriminate not only HIV-1 reverse transcriptase variants for which aptamers were designed, but also variants for which no aptamers exist. A panel of aptamers was then used to discriminate four separate cell lines, which were chosen as examples of complex targets. This aptamer panel was used to further explore the use of aptamers as cross-reactive sensors. Forty-six aptamers were selected from the literature that were designed to be specific to cells or to molecules expected to be on the surface of cells. This panel showed differential binding patterns to each of the cell types, displaying cross-reactive behavior. During the course of this research, we also developed a novel ratiometric method that uses aptamer counts derived from next-generation sequencing for discrimination, in lieu of the more commonly used fluorescent signals. Finally, the use of multiple signals for pattern recognition routines was further explored by running various models on artificial data. Various conditions were applied to replicate different situations that might arise when working with macromolecular interactions. 
The purpose of this was to advance the community's understanding of, and ability to interpret, results from the pattern recognition methods PCA and LDA. / text Receptor Aptamer PCA DA Pattern recognition RNA Cross-reactive
80	Functional Chemometrics: Automated Spectral Smoothing with Spatially Adaptive Splines Fernandes, Philip Manuel 02 October 2012 Functional data analysis (FDA) is a demonstrably effective, practical, and powerful method of data analysis, yet it remains virtually unheard of outside of academic circles and has almost no exposure to industry. FDA adds to the milieu of statistical methods by treating functions of one or more independent variables as data objects, analogous to the way in which discrete points are the data objects we are familiar with in conventional statistics. The first step in functional analysis is to “functionalize” the data, that is, to convert discrete points into a system represented in most cases by continuous functions. Choosing the type of functions to use is data-dependent and often straightforward; for example, Fourier series lend themselves well to periodic systems, while splines offer great flexibility in approximating more irregular trends, such as chemical spectra. This work explores the question of how B-splines can be rapidly and reliably used to denoise infrared chemical spectra, a difficult problem not only because of the many parameters involved in generating a spline fit, but also because of the disparate nature of spectra in terms of shape and noise intensity. Automated selection of spline parameters is required to support high-throughput analysis, and the heteroscedastic nature of such spectra presents challenges for existing techniques. The heuristic knot placement algorithm of Li et al. (2005) for 1D object contours is extended to spectral fitting by optimizing the denoising step for a range of spectral types and signal/noise ratios, using the following criteria: robustness to types of spectra and noise conditions, parsimony of knots, low computational demand, and ease of implementation in high-throughput settings. Pareto-optimal filter configurations are determined using simulated data from factorial experimental designs. 
The improved heuristic algorithm uses wavelet transforms and provides improved performance in robustness, parsimony of knots and the quality of functional regression models used to correlate real spectral data with chemical composition. In practical applications, functional principal component regression models yielded similar or significantly improved results when compared with their discrete partial least squares counterparts. / Thesis (Master, Chemical Engineering) -- Queen's University, 2012-10-01 20:18:31.119
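The "functionalize" step in entry 80, fitting B-splines to a discretely sampled spectrum, can be sketched as a least-squares fit on a B-spline basis built with the Cox-de Boor recursion. This sketch assumes the interior knots are supplied by hand; it does not implement the heuristic knot placement of Li et al. (2005) or the wavelet-based denoising described in the abstract, and the function names are hypothetical.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Design matrix of B-spline basis functions (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator functions of the half-open knot intervals [t_i, t_{i+1}).
    basis = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):
        basis[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Assign the right endpoint to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    basis[x == t[-1], last] = 1.0
    # Raise the degree one step at a time, skipping zero-width knot spans.
    for d in range(1, degree + 1):
        new = np.zeros((len(x), len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                new[:, i] += (x - t[i]) / left_den * basis[:, i]
            if right_den > 0:
                new[:, i] += (t[i + d + 1] - x) / right_den * basis[:, i + 1]
        basis = new
    return basis

def fit_spline(x, y, interior_knots, degree=3):
    """Least-squares spline fit with clamped end knots; returns the fitted values."""
    x = np.asarray(x, dtype=float)
    knots = np.concatenate([
        np.repeat(x.min(), degree + 1),
        np.asarray(interior_knots, dtype=float),
        np.repeat(x.max(), degree + 1),
    ])
    design = bspline_basis(x, knots, degree)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef
```

Fewer interior knots give a smoother fit and more knots follow the data more closely, which is the trade-off that automated knot selection has to manage for each spectrum.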
