Global ETD Search

71	Análise de dados funcionais aplicada ao estudo de repetitividade e reprodutividade : ANOVA das distâncias / Functional data analysis applied to the repeatability and reproducibility study: ANOVA of the distances
Pedott, Alexandre Homsi, January 2010
This dissertation presents a method, adapted from repeatability and reproducibility (R&R) studies, for analyzing the capability and performance of measurement systems in the context of functional data analysis. A functional datum is a response variable given by a collection of data points that form a profile or curve. The proposed method contributes to the state of the art in measurement system analysis. It is an alternative to traditional analysis methods which, when misapplied, can degrade the quality of products monitored through functional response variables. The method adapts hypothesis tests and one-way and two-way analysis of variance (ANOVA), normally used to compare populations, to the evaluation of measurement systems. The adaptation is based on distances between curves, with the Hausdorff distance chosen as the measure of proximity. Three ANOVA approaches were proposed and applied to a simulated repeatability and reproducibility study, structured to cover scenarios in which the measurement system is approved and scenarios in which it is rejected. The proposed method was named ANOVA of the Distances.
Keywords: Quality control; Functional data analysis; Measurement systems; R&R studies; ANOVA; Functional ANOVA
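The Hausdorff distance mentioned in entry 71 can be illustrated with a minimal sketch. It assumes each curve is available as a finite set of sampled points in the (t, y) plane; the function name `hausdorff_distance` and the synthetic curves are hypothetical and are not taken from the dissertation, which goes on to apply ANOVA to such distances.

```python
import numpy as np

def hausdorff_distance(curve_a, curve_b):
    """Symmetric Hausdorff distance between two curves given as (n, 2) point arrays."""
    a = np.asarray(curve_a, dtype=float)
    b = np.asarray(curve_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Directed distances: the farthest any point of one curve is from the other curve.
    a_to_b = d.min(axis=1).max()
    b_to_a = d.min(axis=0).max()
    return float(max(a_to_b, b_to_a))

# Synthetic curves for illustration only (assumption, not study data).
t = np.linspace(0.0, 1.0, 101)
reference = np.column_stack([t, np.sin(2 * np.pi * t)])
shifted = np.column_stack([t, np.sin(2 * np.pi * t) + 0.1])

print(hausdorff_distance(reference, reference))  # identical curves give 0.0
print(hausdorff_distance(reference, shifted))    # bounded by the vertical shift
```

Because the distance is computed on sampled points, its value depends on the sampling grid; a finer grid gives a closer approximation to the distance between the underlying curves.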
72	Statistická hloubka funkcionálních dat / Statistical Depth for Functional Data
Nagy, Stanislav, January 2016
Statistical data depth is a nonparametric tool for multivariate datasets that attempts to generalize quantiles to complex data such as random vectors, random functions, or distributions on manifolds and graphs. The main idea is, for a general multivariate space M, to assign to a point x ∈ M and a probability distribution P on M a number D(x; P) ∈ [0, 1] characterizing how "centrally located" x is with respect to P. A point maximizing D(·; P) is then a generalization of the median to M-valued data, and the locus of points whose depth exceeds a given threshold constitutes the inner depth-quantile region corresponding to P. In this work, we focus on data depth designed for infinite-dimensional spaces M and functional data. We first review the depth functionals available in the literature, with emphasis on unifying these diverse concepts from a theoretical point of view. It is shown that most of the established depths fall into a general framework of projection-driven functionals of either integrated or infimal type. Based on the proposed methodology, the characteristics and theoretical properties of all these depths can be evaluated simultaneously. The first part of the work is devoted to the investigation of these theoretical properties,...
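The distinction between integrated and infimal depths in entry 72 can be sketched as follows. This is only an illustration under stated assumptions: curves are observed on a common grid, the pointwise depth is an empirical univariate halfspace depth, and the names `pointwise_halfspace_depth`, `integrated_depth` and `infimal_depth` are hypothetical rather than the thesis's definitions.

```python
import numpy as np

def pointwise_halfspace_depth(x, sample):
    """Empirical univariate halfspace depth of x[t] within sample[:, t], for every t.

    x: array of shape (n_time_points,); sample: array of shape (n_curves, n_time_points).
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    below = (sample <= x).mean(axis=0)  # fraction of curves at or below x at each t
    above = (sample >= x).mean(axis=0)  # fraction of curves at or above x at each t
    return np.minimum(below, above)

def integrated_depth(x, sample):
    """Integrated type: average the pointwise depth over the grid."""
    return float(pointwise_halfspace_depth(x, sample).mean())

def infimal_depth(x, sample):
    """Infimal type: take the smallest pointwise depth over the grid."""
    return float(pointwise_halfspace_depth(x, sample).min())

# Toy sample (assumption): five constant curves at levels -2, -1, 0, 1, 2.
sample = np.array([np.full(20, level) for level in (-2.0, -1.0, 0.0, 1.0, 2.0)])
depths = [integrated_depth(curve, sample) for curve in sample]
deepest = int(np.argmax(depths))  # index of the deepest curve, a sample "median"
print(depths, deepest)
```

The curve with the largest depth plays the role of the median described in the abstract, and thresholding the depth values gives a sample analogue of the inner depth-quantile region.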
73	Functional linear regression models : application to high-throughput plant phenotyping functional data / Modèles statistiques de régression linéaire fonctionnelle : application sur des données fonctionnelles issues du phénotypage végétal haut débit
Manrique, Tito, 19 December 2016
Functional data analysis (FDA) is a branch of statistics that is increasingly used in applied scientific fields such as biological experimentation, finance and physics. One reason is the spread of new data collection technologies that increase the number of observations recorded within a time interval. Functional datasets are samples of realizations of random functions, that is, measurable functions defined on a probability space and taking values in an infinite-dimensional function space. Among the many questions FDA addresses, functional linear regression is one of the most studied, both in applications and in methodological development.
The objective of this thesis is to study functional linear regression models in which both the covariate X and the response Y are random functions of time. In particular, we address how the history of a random function X influences the current value of another random function Y at a given time t. To do so, we focus on three models: the functional concurrent model (FCCM), the functional convolution model (FCVM) and the historical functional linear model. For the FCVM and FCCM we propose estimators that are consistent, robust and faster to compute than others already proposed in the literature.
Our estimation method for the FCCM extends ridge regression, developed for the classical linear case, to the functional data framework. We prove convergence in probability of this estimator, obtain a rate of convergence and develop a procedure for optimal selection of the regularization parameter. The FCVM makes it possible to study the influence of the history of X on Y in a simple way through convolution. In this case we use the continuous Fourier transform to define an estimator of the functional coefficient; this operator transforms the convolution model into an associated FCCM in the frequency domain, and the consistency and rate of convergence of the estimator are derived from the FCCM. The FCVM can be generalized to the historical functional linear model, which is itself a particular case of the fully functional linear model. Thanks to this, we use the Karhunen–Loève estimator of the historical kernel. The related question of estimating the covariance operator of the noise in the fully functional linear model is also treated.
Finally, we use all of these models to study the interaction between Vapour Pressure Deficit (VPD) and Leaf Elongation Rate (LER) curves. This kind of data is obtained from high-throughput plant phenotyping platforms and is well suited to FDA methods.
Keywords: Functional data; Linear regression; Convolution model; Concurrent model; Historical model
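The idea of carrying ridge regression over to the functional concurrent model in entry 73 can be sketched in a simplified form. The sketch assumes a model Y_i(t) = β(t) X_i(t) + noise with curves observed on a common grid, and uses one natural pointwise ridge form; it is not necessarily the estimator defined in the thesis, and `concurrent_ridge`, `lam` and the synthetic data are hypothetical.

```python
import numpy as np

def concurrent_ridge(X, Y, lam):
    """Pointwise ridge estimate of beta(t) in Y_i(t) = beta(t) * X_i(t) + noise.

    X, Y: arrays of shape (n_curves, n_time_points) on a common time grid.
    lam:  non-negative regularization parameter (hypothetical name).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    cross = (X * Y).mean(axis=0)    # empirical mean of X(t) * Y(t) at each t
    energy = (X ** 2).mean(axis=0)  # empirical mean of X(t)^2 at each t
    return cross / (energy + lam)

# Synthetic data for illustration only (assumption, not phenotyping data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
beta = np.sin(2 * np.pi * t)                    # synthetic coefficient function
X = rng.normal(size=(200, t.size))              # synthetic covariate curves
Y = beta * X + 0.1 * rng.normal(size=X.shape)   # synthetic noisy responses
beta_hat = concurrent_ridge(X, Y, lam=0.01)
print(np.max(np.abs(beta_hat - beta)))          # largest pointwise error
```

The penalty `lam` keeps the denominator away from zero at time points where the covariate curves carry little energy, at the cost of shrinking the estimate toward zero.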
75	Classification bayésienne non supervisée de données fonctionnelles en présence de covariables / Unsupervised Bayesian clustering of functional data in the presence of covariates
Juery, Damien, 18 December 2014
One of the major objectives of unsupervised clustering is to find groups of similar items in a dataset. With the current development of phenotyping, in which data are collected in continuous time, more and more users need tools capable of clustering curves.
The work presented in this thesis is based on Bayesian statistics. Specifically, we are interested in unsupervised Bayesian clustering of functional data. Nonparametric Bayesian priors allow the construction of flexible and robust models.
We generalize a clustering model based on the Dirichlet process (DPM) to the functional framework. Unlike current methods, which work in finite dimension either by projecting curves onto a basis of functions or by considering curves only at their observation times, the proposed method works with complete curves, in infinite dimension. Reproducing kernel Hilbert space (RKHS) theory allows us to derive, in infinite dimension, probability densities of curves with respect to a Gaussian measure. In the same way, we make explicit the computation of a posterior distribution given complete curves rather than only discretized values. We propose an algorithm that generalizes the "Gibbs sampling with auxiliary parameters" algorithm of Neal (2000). The numerical implementation requires inner products, which are approximated numerically. Some applications to real and simulated data are presented and discussed.
Finally, adding an extra level of hierarchy to the model allows functional covariates to be taken into account. We show that several models can be defined for this purpose, and the algorithm proposed earlier is extended to each of them. Some applications to simulated data are presented.
Keywords: Clustering; Functional data; Bayesian statistics; Dirichlet process; MCMC; Curves
76	Um estudo de estresse através dos níveis de cortisol em crianças / A study of stress through cortisol levels in children
Karine Zanuto Mendes, 26 May 2017
The cortisol level is considered a measure of a person's stress. We perform a statistical analysis of data from a study conducted to evaluate whether children who work on the streets during the day have higher stress than children who do not work. A person's cortisol level can be regarded as a function that increases until it reaches a maximum and then decreases to almost zero (a quasi-concave function). The children's cortisol was measured four times in one day, and two groups of children were considered: those who work on the street and those who stay at home. To analyse the data we considered a meta-analysis of a functional data model under a Bayesian approach. Each individual is analysed with a functional data model, and the meta-analysis is then used to obtain inference for each group. A sample from the posterior distribution was generated by Gibbs sampling with Metropolis-Hastings. To compare the groups, we calculated the pointwise posterior probability that the cortisol function of one group is greater than that of the other.
Keywords: Bayesian statistics; Cortisol; Functional data; HPA axis; Meta-analysis
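The pointwise comparison described in entry 76 can be approximated from posterior draws, as in the following sketch. It assumes that MCMC output for each group's curve is stored as an array of draws on a common time grid; `pointwise_prob_greater` is a hypothetical name, and the numbers below are synthetic stand-ins, not results from the study.

```python
import numpy as np

def pointwise_prob_greater(draws_a, draws_b):
    """Monte Carlo estimate of P(f_A(t) > f_B(t) | data) at each grid point.

    draws_a, draws_b: arrays of shape (n_draws, n_time_points) holding paired
    posterior draws of each group's curve on a common time grid.
    """
    draws_a = np.asarray(draws_a, dtype=float)
    draws_b = np.asarray(draws_b, dtype=float)
    return (draws_a > draws_b).mean(axis=0)

# Synthetic stand-ins for MCMC output (assumption, not data from the study).
rng = np.random.default_rng(1)
mean_a = np.array([1.0, 2.0, 1.5, 0.5])  # four time points, echoing four daily samples
mean_b = np.array([1.0, 1.5, 1.5, 0.4])
draws_a = mean_a + 0.2 * rng.normal(size=(2000, 4))
draws_b = mean_b + 0.2 * rng.normal(size=(2000, 4))
prob = pointwise_prob_greater(draws_a, draws_b)
print(prob)
```

Values near 1 at a time point indicate strong posterior support for group A's curve lying above group B's there, while values near 0.5 indicate that the draws do not separate the groups.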
77	Estimação de modelos geoestatísticos com dados funcionais usando ondaletas / Estimation of Geostatistical Models with Functional Data using Wavelets
Gilberto Pereira Sassi, 03 March 2016
The growth in computational power over recent decades has produced a considerable increase in datasets of spatially indexed curves, mainly in ecological, atmospheric and environmental applications, which has led to the adaptation of geostatistical methods to the context of Functional Data Analysis. The goal of this work is to adapt the kriging methods of geostatistics, which perform spatial interpolation, to the framework of Functional Data Analysis. More precisely, for a pointwise weakly stationary and isotropic functional dataset, we estimate a curve at an unmonitored location by searching for an unbiased estimator with minimum mean squared error. We introduce three approaches for estimating a curve at an unmonitored site, we prove results that simplify the optimization problem posed by the search for optimal unbiased estimators, we implement the three models in MATLAB using wavelets, which are better suited to capturing localized behavior, and we compare the models through simulation studies. We illustrate the methods with two real datasets: daily mean temperatures from the Canadian maritime provinces (New Brunswick, Nova Scotia and Prince Edward Island), collected at 82 weather stations in the year 2000, and data from CETESB (the environmental agency of the state of São Paulo, Brazil) on the PM10 air quality index, collected at 22 weather stations in the metropolitan region of the city of São Paulo in 2014.
Keywords: Functional Data Analysis; Spatial Statistics; Geostatistics; Kriging; MATLAB; Wavelets
78	Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking
Huang, Huang, 16 July 2017
This thesis focuses on two topics: computational methods for large spatial datasets, and functional data ranking. Both tackle the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden of fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process, and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate its performance by simulation. For real applications, we analyze a soil moisture dataset of 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps in satellite images.
The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, shape outliers are more challenging to detect because they are often masked among the samples. We develop a new notion of functional data depth by integrating a univariate depth function. Because it has the form of an integrated depth, it shares many desirable features. Furthermore, the novel formulation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show that the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate the methodology using real datasets of curves, images and video frames.
Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties such as separability and full symmetry. We formulate test functions as functions of temporal lags for each pair of spatial locations and develop a rank-based testing procedure, induced by functional data depth, for assessing these properties. The method is illustrated using simulated data from widely used spatio-temporal covariance models, as well as real datasets from weather stations and climate model outputs.
Keywords: Large spatial data set; Low-rank approximation; Functional data analysis; Spatio-temporal covariance; Statistical efficiency; Outlier detection
79	Novel statistical models for ecological momentary assessment studies of sexually transmitted infections
He, Fei, 18 July 2016
Indiana University-Purdue University Indianapolis (IUPUI)
The research ideas in this dissertation are motivated by a large study of sexually transmitted infections (STIs), the IU Phone study, an ecological momentary assessment (EMA) study conducted by Indiana University from 2008 to 2013. EMA, a group of methods for collecting subjects' up-to-date behaviors and status, can increase the accuracy of this information by allowing a participant to self-administer a survey or diary entry in their own environment, as close to the occurrence of the behavior as possible. The IU Phone study's high reporting level shows one of the benefits gained from introducing EMA into STI studies. In this prospective study lasting 84 days, participants undergo STI testing and complete EMA forms on project-furnished cellular telephones according to predetermined schedules. At pre-selected eight-hour intervals, participants respond to a series of questions that identify sexual and non-sexual interactions with specific partners, including partner name, relationship satisfaction and sexual satisfaction with this partner, the time of each coital event, and condom use for each event. STI lab results for all participants are also collected weekly. We are interested in several variables related to the risk of infection and to sexual or non-sexual behaviors, especially the relationships among the longitudinal processes of those variables. New statistical models and applications are developed to deal with data that have complex dependence and sampling structures. The methodologies cover several areas of statistics: generalized mixed models, multivariate models, and autoregressive and cross-lagged models in longitudinal data analysis; misclassification adjustment for imperfect diagnostic tests; and variable-domain functional regression in functional data analysis. The contribution of this work is to bridge methods from different areas with the EMA data in the IU Phone study, and to build a novel understanding of the associations among the variables of interest from different perspectives, based on the characteristics of the data. Besides the statistical analyses, a variety of data visualization techniques provide informative support in presenting the complex EMA data structure.
Keywords: Ecological momentary assessment; Functional data analysis; Generalized mixed models; Longitudinal data analysis; Misclassification adjustment; Sexually transmitted infections
80	Functional clustering methods and marital fertility modelling
Arnqvist, Per, January 2017
This thesis consists of two parts. The first part considers further development of a model used for marital fertility, the Coale-Trussell fertility model, which is based on age-specific fertility rates. A new model is suggested that uses individual fertility data and a waiting time after pregnancies. The model is named the waiting model and can be understood as an alternating renewal process with age-specific intensities. Because of the complicated form of the waiting model and the way the data are presented in the United Nations Demographic Yearbook 1965, a normal approximation is suggested, together with a normal approximation of the mean and variance of the number of births per summarized interval. A further refinement of the model was then introduced to allow for left-truncated and censored individual data, summarized as table data. The waiting model gives a better understanding of marital fertility, and a simulation study shows that it outperforms the Coale-Trussell model in estimating the fertility intensity and in predicting the mean and variance of the number of births for a population.
The second part of the thesis focuses on developing functional clustering methods. The methods are motivated by and applied to varved (annually laminated) sediment data from Lake Kassjön in northern Sweden. The rich but complex information (with respect to climate) in the varves, including the shapes of the seasonal patterns, the varying varve thickness, and the non-linear sediment accumulation rates, makes it non-trivial to cluster the varves. Functional representations, smoothing and alignment are functional data tools used to make the seasonal patterns comparable. Functional clustering is used to group the seasonal patterns into different types, which can be associated with different weather conditions. A new non-parametric functional clustering method is suggested, the Bagging Voronoi K-medoid Alignment (BVKMA) algorithm, which simultaneously clusters and aligns spatially dependent curves. BVKMA is applied to the varved lake sediment to make inferences about climate, defined as frequencies of different weather types, over longer time periods. Furthermore, a functional model-based clustering method is proposed that clusters subjects for which both functional data and covariates are observed, allowing different covariance structures in the different clusters. The model extends a model-based functional clustering method proposed by James and Sugar (2003). An EM algorithm is derived to estimate the parameters of the model.
Keywords: censoring; Coale-Trussell model; EM-algorithm; functional data analysis; functional clustering; marital fertility; normal approximation; Poisson process; varved lake sediments; warping
