61

Přístupy k shlukování funkčních dat / Approaches to Functional Data Clustering

Pešout, Pavel January 2007 (has links)
Classification is a common task in information processing and an important problem in many branches of science and industry. When data are measured as functions of an independent variable such as time, the most widely used algorithms may fail to capture the shape of each individual curve, because they consider only the chosen measurement points. For this reason, the present thesis focuses on techniques that directly address the problem of clustering curves and classifying new individuals. The main goal of this work is to develop alternative methodologies by extending various statistical approaches, to consolidate established algorithms, to present modified forms of them fitted to the demands of the clustering problem, and to compare several efficient curve clustering methods through extensive experiments on simulated data. Last but not least, a comprehensive comparison of their practical utility is made on the basis of the executed experiments. The proposed clustering algorithms are based on two principles. First, it is assumed that the set of trajectories can be modelled probabilistically as sequences of points generated from a finite mixture model consisting of regression components; density-based clustering methods using maximum likelihood estimation are therefore investigated to find the most homogeneous partition. Attention is paid both to the maximum likelihood approach, which treats the cluster memberships as model parameters, and to the probabilistic model with the iterative Expectation-Maximization (EM) algorithm, which treats them as random variables. To deal with the hidden-data problem, both Gaussian and less conventional gamma mixtures are considered and adapted for use in two dimensions. To cope with data showing high variability within each subpopulation, a two-level random-effects regression mixture is introduced that allows an individual to deviate from the template of its group.
Second, advantage is taken of the well-known K-means algorithm applied to the estimated regression coefficients. Attention is devoted to the task of optimal data fitting, because K-means is not invariant to linear transformations; to overcome this problem, it is suggested to integrate the clustering step with Markov chain Monte Carlo approaches. Furthermore, the thesis deals with functional discriminant analysis, including linear and quadratic scores and their modified probabilistic forms using random mixtures. As with K-means, it is shown how to apply Fisher's method of canonical scores to the regression coefficients. Experiments on simulated datasets demonstrate the performance of all the methods mentioned and make it possible to choose those with the best accuracy and time efficiency. A considerable benefit is the formulation of new recommendations for their practical application. The implementation was carried out in Mathematica 4.0. Finally, the possibilities offered by curve clustering algorithms in broad research areas of modern science are examined, such as neurology, genome studies, and speech and image recognition systems; future work incorporating ubiquitous computing is also envisaged. Their utility in economics is illustrated by an application to claims analysis for life insurance products. The goals of the thesis have been achieved.
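The second principle above — running K-means on estimated regression coefficients rather than on the raw measurements — can be sketched in a few lines. This is an illustrative Python sketch, not the thesis's Mathematica implementation; the polynomial basis, the toy parabolic curves, and the deterministic center seeding are assumptions made for the example.

```python
import numpy as np

def fit_coefficients(curves, t, degree=2):
    """Represent each sampled curve by its polynomial regression coefficients."""
    return np.array([np.polyfit(t, y, degree) for y in curves])

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's algorithm; centers seeded deterministically for reproducibility."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each coefficient vector to its nearest center.
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy data: two groups of noisy curves with different shapes.
t = np.linspace(0.0, 1.0, 30)
rng = np.random.default_rng(1)
up = [t ** 2 + 0.01 * rng.standard_normal(t.size) for _ in range(5)]
down = [-(t ** 2) + 0.01 * rng.standard_normal(t.size) for _ in range(5)]
labels = kmeans(fit_coefficients(up + down, t), k=2)
```

Because K-means is not invariant to linear transformations, the quality of the coefficient fit matters; the sketch uses ordinary least squares via `np.polyfit` as the simplest choice.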
62

Prévision non paramétrique de processus à valeurs fonctionnelles : application à la consommation d’électricité / Non parametric forecasting of functional-valued processes : application to the electricity load

Cugliari, Jairo 24 October 2011 (has links)
This thesis addresses the problem of predicting a functional-valued stochastic process. We first study the model proposed by Antoniadis et al. (2006) in the context of a practical application, the French electricity demand, where the stationarity hypothesis may fail. The departure from stationarity is twofold: the mean level of the series evolves over time, and there are groups in the data that may be seen as classes of stationarity. We explore corrections that improve prediction performance by taking these nonstationary features into account. In particular, to handle the existence of groups, we constrain the forecasting model to use only the data belonging to the same group as the last available observation. If the grouping is known, a simple post-processing step suffices to obtain better predictions. If the grouping is unknown, we propose to discover it from the data using unsupervised classification algorithms, which must take into account the infinite dimension of the not necessarily stationary trajectories. We propose two strategies for this, both based on wavelet transforms. The first extracts features from the discrete wavelet transform and then selects the features most significant for the clustering algorithm. The second clusters the trajectories directly, using a dissimilarity measure on the continuous wavelet spectra. The third part of the thesis explores an alternative prediction model that incorporates exogenous information, using the framework of autoregressive Hilbertian processes. We propose a new class of processes, which we call conditional autoregressive Hilbertian (CARH) processes, and develop the equivalent of the projection and resolvent classes of estimators to predict them.
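The dissimilarity-on-wavelet-spectra idea rests on the fact that smooth and oscillatory trajectories distribute their energy over different scales. A minimal Python sketch, using a hand-rolled Haar transform in place of the thesis's continuous wavelet spectra; the test signals and the per-scale energy "spectrum" are assumptions for illustration.

```python
import numpy as np

def haar_details(x):
    """Full Haar decomposition; returns the detail coefficients of each scale."""
    x = np.asarray(x, dtype=float)
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        x = (even + odd) / np.sqrt(2.0)
    return details

def wavelet_spectrum(x):
    """Energy carried by each scale -- a crude stand-in for a wavelet spectrum."""
    return np.array([np.sum(d ** 2) for d in haar_details(x)])

def dissimilarity(x, y):
    """Euclidean distance between the per-scale energy profiles of two curves."""
    return float(np.linalg.norm(wavelet_spectrum(x) - wavelet_spectrum(y)))

# Phase-shifted copies of a smooth curve should be closer to each other
# than to an oscillatory curve whose energy sits at fine scales.
t = np.linspace(0.0, 1.0, 64)        # length must be a power of two here
slow = np.sin(2 * np.pi * t)
slow_shifted = np.sin(2 * np.pi * t + 0.3)
fast = np.sin(16 * np.pi * t)
d_within = dissimilarity(slow, slow_shifted)
d_between = dissimilarity(slow, fast)
```

A clustering algorithm can then be run directly on the matrix of pairwise dissimilarities.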
63

Um estudo de estresse através dos níveis de cortisol em crianças / A study of stress through cortisol levels in children

Mendes, Karine Zanuto 26 May 2017 (has links)
Cortisol level is considered a measure of a person's stress. We perform a statistical analysis of data from a study conducted to evaluate whether children who work on the streets during the day have higher stress than children who do not work. A person's cortisol level can be regarded as a function that increases until reaching a maximum and then decreases (a quasi-concave function). The children's cortisol was collected four times in one day, for two groups of children: those who work in the street and those who stay at home. To analyse the data we considered a meta-analysis of a functional data model under a Bayesian approach: each individual is modelled by a functional data model, and a meta-analysis is then used to draw inference for each group. A sample from the posterior distribution was obtained with a Gibbs sampler with Metropolis-Hastings steps. To compare the groups, we calculated the pointwise posterior probability that the cortisol function of one group is greater than that of the other.
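Given MCMC output, the pointwise comparison at the end reduces to a Monte Carlo average at each time point. In this Python sketch the "posterior draws" are simulated stand-ins — the quasi-concave mean profiles, noise level, and number of draws are all assumptions, not output of the actual Gibbs/Metropolis-Hastings sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 12.0, 24)                 # hours after first sample (illustrative)

# Hypothetical quasi-concave mean profiles: rise to a peak, then decay.
mean_work = 10.0 * t * np.exp(-0.5 * t) + 1.0  # stand-in for the street-working group
mean_home = 8.0 * t * np.exp(-0.5 * t) + 1.0   # stand-in for the stay-at-home group

# Stand-ins for MCMC draws of each group's curve (2000 posterior samples each).
draws_work = mean_work + 0.3 * rng.standard_normal((2000, t.size))
draws_home = mean_home + 0.3 * rng.standard_normal((2000, t.size))

# Pointwise posterior probability that one group's curve lies above the other's:
# the fraction of draws in which it does, at each time point.
prob = (draws_work > draws_home).mean(axis=0)
```

With real sampler output, `draws_work` and `draws_home` would simply be the stored curves from each MCMC iteration.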
64

Análise de dados funcionais aplicada ao estudo de repetitividade e reprodutividade : ANOVA das distâncias / Functional data analysis applied to the study of repeatability and reproducibility: ANOVA of the distances

Pedott, Alexandre Homsi January 2010 (has links)
This dissertation presents a method, adapted from repeatability and reproducibility studies, to analyze the capability and performance of measurement systems in a functional data analysis context. Functional data are response variables given by a collection of data points that form a profile or curve. The proposed method contributes to the state of the art in measurement system analysis and is an alternative to traditional methods that, when used incorrectly, can deteriorate the quality of products monitored through functional responses. The method adapts hypothesis tests and one-way and two-way analysis of variance, as used in population comparisons, to the evaluation of measurement systems. The adaptation is based on distances between curves, with the Hausdorff distance used as the measure of proximity. Three ANOVA approaches were proposed and applied in a simulated repeatability and reproducibility study, structured to analyze scenarios in which the measurement system is approved or rejected. The proposed method is called ANOVA of the distances.
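The Hausdorff distance between two sampled curves can be computed directly by viewing them as point sets in the plane. A self-contained Python sketch (the sine test curves are an assumption for illustration):

```python
import numpy as np

def hausdorff(curve_a, curve_b):
    """Hausdorff distance between two sampled curves viewed as planar point sets."""
    A = np.asarray(curve_a, dtype=float)
    B = np.asarray(curve_b, dtype=float)
    # All pairwise Euclidean distances between points of A and points of B.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    # Max over each set of the distance to the nearest point of the other set.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

t = np.linspace(0.0, 1.0, 50)
ref = np.column_stack([t, np.sin(2 * np.pi * t)])
shifted = np.column_stack([t, np.sin(2 * np.pi * t) + 0.2])  # vertical shift of 0.2
```

In the proposed method, such pairwise distances between curves feed the adapted hypothesis tests and ANOVA in place of raw scalar measurements.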
65

Estimação de modelos geoestatísticos com dados funcionais usando ondaletas / Estimation of Geostatistical Models with Functional Data using Wavelets

Sassi, Gilberto Pereira 03 March 2016 (has links)
The advance of computational power in recent decades has generated a considerable increase in datasets of spatially indexed curves, mainly in ecological, atmospheric and environmental data, which has led to adaptations of geostatistical methods for the context of functional data analysis. The goal of this work is to adapt kriging methods from geostatistics to the framework of functional data analysis. More precisely, for a pointwise weakly stationary and isotropic functional dataset, we estimate the curve at an unmonitored site by searching for an unbiased estimator with minimum mean squared error. We introduce three approaches to estimating a curve at an unvisited site, prove results that simplify the optimization problem posed by the search for optimal unbiased estimators, implement the three models in MATLAB using wavelets, which are well suited to capturing localized behaviour, and compare them through simulation studies. We illustrate the methods with two real datasets: daily mean temperatures collected in 2000 at 82 weather stations in the Canadian maritime provinces (New Brunswick, Nova Scotia and Prince Edward Island), and data from CETESB (the environmental agency of the state of São Paulo, Brazil) on the PM10 air quality index at 22 monitoring stations in the metropolitan region of São Paulo, collected in 2014.
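Kriging an unvisited site reduces to solving the ordinary kriging linear system once for the target location; the resulting weights can then be applied to every wavelet coefficient of the observed curves. A hedged Python sketch of that system — the exponential covariance and the unit-square site layout are assumptions for the example, not the models of the thesis:

```python
import numpy as np

def ordinary_kriging_weights(sites, target, cov=lambda h: np.exp(-h)):
    """Solve the ordinary kriging system [K 1; 1' 0][w; mu] = [k0; 1]."""
    n = len(sites)
    h = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = cov(h)          # site-to-site covariances
    K[n, n] = 0.0               # Lagrange-multiplier row/column for unbiasedness
    rhs = np.ones(n + 1)
    rhs[:n] = cov(np.linalg.norm(sites - target, axis=1))  # site-to-target covariances
    return np.linalg.solve(K, rhs)[:n]

# Four monitored sites at the corners of the unit square, target at the center.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = ordinary_kriging_weights(sites, np.array([0.5, 0.5]))
```

The predicted curve at the unvisited site is the weight-`w` combination of the observed curves; the constraint that the weights sum to one enforces unbiasedness.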
66

Analytics for Novel Consumer Insights (A Three Essay Dissertation)

Shrivastava, Utkarsh 03 July 2018 (has links)
Both literature and practice have investigated how the vast amount of ever-increasing customer information can inform marketing strategy and decision making. However, customer data are often susceptible to modeling bias and misleading findings due to various factors, including sample selection and unobservable variables. The available analytics toolkit has continued to develop, but in the age of nearly perfect information, customer decision making has also evolved. This dissertation addresses some of the challenges in deriving valid and useful consumer insights from customer data in the digital age. The first study addresses the limitations of traditional customer purchase measures in accounting for dynamic temporal variations in customer purchase histories. It proposes a new approach to representing and summarizing customer purchases to improve promotion forecasts, and also accounts for the sample selection bias that arises from the biased selection of customers for a promotion. The second study investigates the impact of increasing internet penetration on consumer choices and responses to marketing actions. Using the case of physicians' drug prescribing, it identifies how marketers can misallocate resources at the regional level by not accounting for variations in internet penetration. The third study develops a data-driven metric for measuring temporal variations in brand loyalty. Using a network representation of brands and customers, it also investigates the spillover effects of manufacturer-related information shocks on brand loyalty.
67

Wavelet-based Data Reduction and Mining for Multiple Functional Data

Jung, Uk 12 July 2004 (has links)
Advanced technology, such as various types of automatic data acquisition, management, and networking systems, has created a tremendous capability for managers to access valuable production information to improve operation quality and efficiency. Signal processing and data mining techniques are more popular than ever in many fields, including intelligent manufacturing. As data sets increase in size, their exploration, manipulation, and analysis become more complicated and resource-consuming. Timely synthesized information, such as functional data, is needed for product design, process troubleshooting, quality and efficiency improvement, and resource allocation decisions. A major obstacle in such intelligent manufacturing systems is the lack of tools for processing the large volume of information coming from the numerous stages of manufacturing operations. The underlying theme of this thesis is therefore to reduce the size of the data within a mathematically rigorous framework, and to apply existing or new procedures to the reduced-size data for various decision-making purposes. The thesis first proposes a wavelet-based random-effect model that can generate multiple functional data signals with wide fluctuations (between-signal variations) in the time domain. The random-effect wavelet atom position in the model has a locally focused impact, which distinguishes it from traditional random-effect models in the biological field. For data-size reduction, in order to deal with wavelet coefficients selected heterogeneously across different curves, the thesis introduces a newly defined wavelet vertical energy metric of multiple curves and uses it for efficient data reduction. The proposed method selects important positions for the whole set of curves by comparing each vertical energy value against a threshold (the vertical energy threshold, VET), which is decided optimally via an objective function that balances the reconstruction error against the data reduction ratio. Based on the class membership information of each signal, the thesis then proposes a vertical group-wise threshold method to increase the discriminative capability of the reduced-size data, so that the reduced data set retains salient differences between classes as much as possible. A real-life example (tonnage data) shows that the proposed method is promising.
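The vertical energy selection step can be sketched compactly: sum squared wavelet coefficients across curves at each position and keep positions whose energy exceeds the VET. In this Python sketch the coefficient matrix is simulated and the threshold is fixed by hand rather than optimized against the reconstruction-error/reduction-ratio objective described above.

```python
import numpy as np

def vertical_energy(coeffs):
    """Sum of squared wavelet coefficients across all curves, per position."""
    return (np.asarray(coeffs, dtype=float) ** 2).sum(axis=0)

def select_positions(coeffs, vet):
    """Keep the coefficient positions whose vertical energy exceeds the VET."""
    return np.flatnonzero(vertical_energy(coeffs) > vet)

rng = np.random.default_rng(0)
coeffs = 0.05 * rng.standard_normal((20, 32))  # 20 curves x 32 coefficients, mostly noise
coeffs[:, [3, 7]] += 2.0                       # two positions carry signal in every curve
kept = select_positions(coeffs, vet=5.0)       # VET fixed by hand for the sketch
```

Selecting positions jointly across all curves, rather than per curve, is what keeps the reduced representation comparable between signals.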
68

Statistical computation and inference for functional data analysis

Jiang, Huijing 09 November 2010 (has links)
My doctoral dissertation focuses on two aspects of functional data analysis (FDA): FDA under spatial interdependence and FDA for multi-level data. The first part of the thesis develops modeling and inference procedures for functional data under spatial dependence, motivated by a research study on inequities in accessibility to financial services. The first research problem concerns a novel model-based method for clustering spatially interdependent random time functions, where a cluster consists of time functions that are similar in shape. The time functions are decomposed into spatial global effects and time-dependent cluster effects using a semiparametric model, and the cluster membership is assumed to be a realization from a Markov random field. Under these assumptions, we borrow information across curves from nearby locations, which enhances the estimation accuracy of the cluster effects and of the cluster membership. In a simulation study, we assess the estimation accuracy of our clustering algorithm under a series of settings: a small number of time points, high noise levels, and varying dependence structures; across all settings, the spatial-functional clustering method outperforms existing model-based clustering methods. The accompanying case study estimates and classifies service accessibility patterns varying over a large geographic area (California and Georgia) and over a period of 15 years; the focus is on financial services, but the approach applies generally to other service operations. The second research problem is an association analysis of space-time varying processes that is rigorous, computationally feasible, and implementable with standard software. We introduce general measures that model different aspects of the temporal and spatial association between processes varying in space and time. Using a nonparametric spatiotemporal model, we show that the proposed association estimators are asymptotically unbiased and consistent, and we complement the point estimates with simultaneous confidence bands to assess their uncertainty. In a simulation study, we evaluate the accuracy of the association estimates with respect to sample size, as well as the coverage of the confidence bands. The case study investigates the association between service accessibility and income level; its primary objective is to assess whether there are significant changes over time in the income-driven equity of financial service accessibility and to identify potentially under-served markets. The second part of the thesis develops statistical methodology for analyzing multilevel functional data, including a clustering method based on a functional ANOVA model and a spatio-temporal model for functional data with a nested hierarchical structure. I introduce and compare a series of clustering approaches for multilevel functional data. For brevity, I present the clustering methods for two-level data: multiple samples of random functions, each sample corresponding to a case and each random function within a sample corresponding to a measurement type. A cluster consists of cases with similar within-case means (level-1 clustering) or similar between-case means (level-2 clustering). Our primary focus is to evaluate model-based clustering against more straightforward hard clustering methods; the clustering model is based on multilevel functional principal component analysis. In a simulation study, we assess the estimation accuracy of our clustering algorithm under a series of settings: small versus moderate numbers of time points, high noise levels, and small numbers of measurement types. We demonstrate the applicability of the clustering analysis on a real data set of time-varying sales for multiple products sold by a large U.S. retailer. My ongoing research in multilevel functional data analysis develops a statistical model for estimating temporal and spatial associations of a series of time-varying variables with an intrinsic nested hierarchical structure. This work has great potential in applications where areal data are collected from different sources over geographic regions of different spatial resolutions.
69

Model Selection via Minimum Description Length

Li, Li 10 January 2012 (has links)
The minimum description length (MDL) principle originated in the data compression literature and has been used to derive statistical model selection procedures. Most existing methods based on the MDL principle focus on models for independent data, particularly in the context of linear regression. The data considered in this thesis take the form of repeated measurements, and the exploration of the MDL principle begins with classical linear mixed-effects models. We distinguish two kinds of research focus: one concerns the population parameters and the other the cluster/subject parameters. When the research interest is at the population level, we propose a class of MDL procedures that incorporate the dependence structure within each individual or cluster through data-adaptive penalties and enjoy the advantages of Bayesian information criteria. When the number of covariates is large, the penalty term is adjusted adaptively to diminish the underselection issue of BIC and to mimic the behaviour of AIC. Theoretical justifications are provided from both the data compression and the statistical perspectives. Extensions to categorical responses modelled by generalized estimating equations, and to functional data modelled by functional principal components, are illustrated. When the interest is at the cluster level, we use the group LASSO to set up a class of candidate models and then derive an MDL criterion that selects the final model via the tuning parameters. Extensive numerical experiments demonstrate the usefulness of the proposed MDL procedures at both the population and cluster levels.
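Under a Gaussian noise model, a two-part description length for regression reduces to a BIC-like score: the code length of the data given the fitted model plus a (k/2) log n cost for the k parameters. A Python sketch of model selection by such a score — polynomial degree selection on simulated independent data is an assumption for illustration; the thesis's data-adaptive penalties for mixed-effects models are not reproduced here:

```python
import numpy as np

def description_length(y, yhat, k):
    """Two-part code length: Gaussian fit term plus (k/2) log n parameter cost."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)   # truly linear signal

scores = []
for degree in range(5):
    coef = np.polyfit(x, y, degree)
    scores.append(description_length(y, np.polyval(coef, x), degree + 1))
best = int(np.argmin(scores))   # extra degrees must pay more in code length than they save
```

The score trades fit against parameter cost, which is the behaviour the data-adaptive MDL penalties above refine for dependent, clustered data.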
