Global ETD Search

141	Combining Geospatial and Temporal Ontologies Joshi, Kripa January 2007 (has links) (PDF) No description available. Geographic information systems
142	Policy and Place: A Spatial Data Science Framework for Research and Decision-Making January 2017 (has links) abstract: A major challenge in health-related policy and program evaluation research is attributing underlying causal relationships where complicated processes may exist in natural or quasi-experimental settings. Spatial interaction and heterogeneity between units at individual or group levels can violate both components of the Stable-Unit-Treatment-Value-Assumption (SUTVA) that are core to the counterfactual framework, making treatment effects difficult to assess. New approaches are needed in health studies to develop spatially dynamic causal modeling methods to both derive insights from data that are sensitive to spatial differences and dependencies, and also be able to rely on a more robust, dynamic technical infrastructure needed for decision-making. To address this gap with a focus on causal applications theoretically, methodologically and technologically, I (1) develop a theoretical spatial framework (within single-level panel econometric methodology) that extends existing theories and methods of causal inference, which tend to ignore spatial dynamics; (2) demonstrate how this spatial framework can be applied in empirical research; and (3) implement a new spatial infrastructure framework that integrates and manages the required data for health systems evaluation. The new spatially explicit counterfactual framework considers how spatial effects impact treatment choice, treatment variation, and treatment effects. To illustrate this new methodological framework, I first replicate a classic quasi-experimental study that evaluates the effect of drinking age policy on mortality in the United States from 1970 to 1984, and further extend it with a spatial perspective. 
In another example, I evaluate food access dynamics in Chicago from 2007 to 2014 by implementing advanced spatial analytics that better account for the complex patterns of food access, and a quasi-experimental research design to distill the impact of the Great Recession on the foodscape. Inference interpretation is sensitive to both research design framing and underlying processes that drive geographically distributed relationships. Finally, I advance a new Spatial Data Science Infrastructure to integrate and manage data in dynamic, open environments for public health systems research and decision-making. I demonstrate an infrastructure prototype in a final case study, developed in collaboration with health department officials and community organizations. / Dissertation/Thesis / Doctoral Dissertation Geography 2017 Geography Statistics Computer science Counterfactual Framework Data Science Public Health Quasi-Experimental Research Design Spatial Data Infrastructure Spatial Effects
143	Fouille de données billettiques pour l'analyse de la mobilité dans les transports en commun / Analysis of Mobility in Public Transport Systems Through Machine Learning Applied to Ticketing Log Data Briand, Anne-Sarah 05 December 2017 (has links) Les données billettiques sont de plus en plus utilisées pour l'analyse de la mobilité dans les transports en commun. Leur richesse spatiale et temporelle ainsi que leur volume, en font un bon matériel pour une meilleure compréhension des habitudes des usagers, pour prédire les flux de passagers ou bien encore pour extraire des informations sur les événements atypiques (ou anomalies), correspondant par exemple à un accroissement ou à une baisse inhabituelle du nombre de validations enregistrées sur le réseau. Après une présentation des travaux ayant été menés sur les données billettiques, cette thèse s'est attachée à développer de nouveaux outils de traitement de ces données. Nous nous sommes particulièrement intéressés à deux challenges nous semblant non encore totalement résolus dans la littérature : l'aide à la mise en qualité des données et la modélisation et le suivi des habitudes temporelles des usagers. Un des principaux challenges de la mise en qualité des données consiste en la construction d'une méthodologie robuste qui soit capable de détecter des plages de données potentiellement problématiques correspondant à des situations atypiques et ce quel que soit le contexte (jour de la semaine, vacances, jours fériés, ...). Pour cela une méthodologie en deux étapes a été déployée, à savoir le clustering pour la détermination du contexte et la détection d'anomalies. L'évaluation de la méthodologie proposée a été entreprise sur un jeu de données réelles collectées sur le réseau de transport en commun rennais.
En croisant les résultats obtenus avec les événements sociaux et culturels de la ville, l'approche a permis d'évaluer l'impact de ces événements sur la demande en transport, en termes de sévérité et d'influence spatiale sur les stations voisines. Le deuxième volet de la thèse concerne la modélisation et le suivi de l'activité temporelle des usagers. Un modèle de mélange de gaussiennes a été développé pour partitionner les usagers dans les clusters en fonction des heures auxquelles ils utilisent les transports en commun. L'originalité de la méthodologie proposée réside dans l'obtention de profils temporels continus pour décrire finement les routines temporelles de chaque groupe d'usagers. Les appartenances aux clusters ont également été croisées avec les données disponibles sur les usagers (type de carte) en vue d'obtenir une description plus précise de chaque cluster. L'évolution de l'appartenance aux clusters au cours des années a également été analysée afin d'évaluer la stabilité de l'utilisation des transports d'une année sur l'autre. / Ticketing logs are increasingly used to analyse mobility in public transport. The spatial and temporal richness of these data, as well as their volume, make them useful for understanding passenger habits and predicting origin-destination flows. Information on the operations carried out on the transportation network can also be extracted in order to detect atypical events (or anomalies), such as an unusual increase or decrease in the number of validations. This thesis focuses on developing new tools to process ticketing log data. We are particularly interested in two challenges that do not yet seem fully resolved in the literature: support for data quality control, and the modeling and monitoring of passengers' temporal habits. One of the main challenges in data quality is the construction of a robust methodology capable of detecting atypical situations in any context (day of the week, holidays, public holidays, etc.).
To this end, a two-step methodology was deployed: clustering for context estimation, followed by anomaly detection. The proposed methodology is evaluated on a real dataset collected on the Rennes public transport network. By cross-referencing the results with the social and cultural events of the city, it is possible to assess the impact of these events on transport demand, in terms of severity and spatial influence on neighboring stations. The second part of the thesis focuses on the modeling and tracking of the temporal activity of passengers. A Gaussian mixture model is proposed to partition passengers into clusters according to the hours at which they use public transport. The originality of the methodology compared to existing approaches lies in obtaining continuous time profiles in order to finely describe the time routines of each passenger cluster. Cluster memberships are also cross-referenced with passenger data (card type) to obtain a more accurate description of each cluster. The cluster membership over the years has also been analyzed in order to study how the use of transport evolves. Apprentissage statistique Données spatiales Données longitudinales Masse de données Suivi temporel Statistical learning Spatial data Longitudinal data Mass data Time tracking
144	Amélioration de la modélisation de la calotte de glace Antarctique à partir de la topographie de la surface / Joined data/modelling study of the dynamics of the Antarctic Ice sheet evolution in the context of climate change. Navas, Giuliat 22 November 2011 (has links) La modélisation des calottes polaires est importante pour reconstruire l'état passé des calottes, comprendre l'état présent, et prévoir son évolution dans le contexte du réchauffement climatique et de l'élévation du niveau des mers. Les mécanismes qui interviennent dans la dynamique des calottes de glace et qui dépendent du climat sont nombreux, mais pour l'Antarctique il y a deux mécanismes très importants qui s'opposent : L'augmentation de la température qui est supposée entraîner une augmentation de la précipitation et un épaississement de la calotte, et l'intensification de l'écoulement de la glace qui tend à amincir la calotte. Pour étudier ces deux mécanismes, nous avons suivi deux approches : caractériser la calotte à partir des observations directes (c.-à-d. topographie de la surface et les vitesses d'écoulement de glace) ou indirectes (c.-à-d. Flux de bilan). Et la modéliser avec GRISLI (GRenoble Ice Shelf and Land Ice), en prenant en considération la dynamique des fleuves de glace et leurs localisations précises, pour mieux comprendre les mécanismes actifs qui interviennent dans la calotte. Le sujet de la thèse est l'amélioration de la modélisation de la calotte Antarctique à partir des données disponibles. Notamment celles basées sur la première et la deuxième dérivées de la surface (pente et courbures respectivement) pour faire des liens avec le drainage de la glace, et les structures de vitesses de bilan. Ces informations nous ont permis entre autres de développer différentes méthodes pour autoriser les fleuves de glace, qui ensuite ont été introduites dans GRISLI. 
Nous avons ensuite fait plusieurs études de sensibilité de la calotte sur les localisations des fleuves de glace, les données du flux géothermique et des paramètres qui contrôlent le glissement et la déformation de la glace. Enfin nous avons fait des comparaisons entre les structures observées et modélisées de la calotte, et nous avons vu que le modèle n'est pas loin de reproduire les structures observées. / Modelling of the polar ice sheets is important to reconstruct their past state, understand their current state and predict their evolution in the context of global warming and rising sea levels. Numerous climate-dependent mechanisms are involved in the dynamics of ice sheets. For Antarctica in particular, there are two very important opposing mechanisms: the increase in temperature, which is expected to lead to increased precipitation and thickening of the ice sheet, and the intensification of ice flow, which tends to thin the ice sheet. To study these two mechanisms, we followed two approaches: characterizing the ice sheet from direct observations (i.e., surface topography and ice flow velocities) or indirect observations (i.e., balance flux), and modelling it with GRISLI (GRenoble Ice Shelf and Land Ice), taking into account the dynamics of ice streams and their precise locations, to better understand the active mechanisms involved in the ice sheet. The subject of this thesis is the improvement of the modelling of the Antarctic ice sheet from the available data, especially those based on the first and second derivatives of the surface (slope and curvature respectively), in order to make links with ice drainage and with balance velocity structures. This information allowed us, among other things, to develop different methods for enabling ice streams, which were subsequently introduced into GRISLI. We then carried out several studies of the sensitivity of the ice sheet to the locations of ice streams, to geothermal flux data and to the parameters that control basal sliding and ice deformation.
Finally, we compare the observed and modelled structures of the ice sheets, and show that model results are not far from the actual observed structures. Changement climatique Données altimétriques Vitesse de bilan Calotte de glace Fleuves de glace Climate change Spatial data Balance flux Ice-sheet Ice-streams
145	Otimização de algoritmos de agrupamento espacial baseado em densidade aplicados em grandes conjuntos de dados / Optimization of Density-Based Spatial Clustering Algorithms Applied to Large Data Sets Daniel, Guilherme Priólli [UNESP] 12 August 2016 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / A quantidade de dados gerenciados por serviços Web de grande escala tem crescido significantemente e passaram a ser chamados de Big Data. Esses conjuntos de dados podem ser definidos como um grande volume de dados complexos provenientes de múltiplas fontes que ultrapassam a capacidade de armazenamento e processamento dos computadores atuais. Dentro desses conjuntos, estima-se que 80% dos dados possuem associação com alguma posição espacial. Os dados espaciais são mais complexos e demandam mais tempo de processamento que os dados alfanuméricos. Nesse sentido, as técnicas de MapReduce e sua implementação têm sido utilizadas a fim de retornar resultados em tempo hábil com a paralelização dos algoritmos de prospecção de dados. Portanto, o presente trabalho propõe dois algoritmos de agrupamento espacial baseado em densidade: o VDBSCAN-MR e o OVDBSCAN-MR.
Ambos os algoritmos utilizam técnicas de processamento distribuído e escalável baseadas no modelo de programação MapReduce com intuito de otimizar o desempenho e permitir a análise em conjuntos Big Data. Por meio dos experimentos realizados foi possível verificar que os algoritmos desenvolvidos apresentaram melhor qualidade nos agrupamentos encontrados em comparação com os algoritmos tomados como base. Além disso, o VDBSCAN-MR obteve um melhor desempenho que o algoritmo sequencial e suportou a aplicação em grandes conjuntos de dados espaciais. / The amount of data managed by large-scale Web services has increased significantly and has come to be known as Big Data. These data sets can be defined as large volumes of complex data from multiple sources that exceed the storage and processing capacity of current computers. In such data sets, an estimated 80% of the data is associated with some spatial position. Spatial data are more complex and require more processing time than alphanumeric data. In that sense, MapReduce techniques and their implementation have been used to return results in a timely manner by parallelizing data mining algorithms. Therefore, this work proposes two density-based spatial clustering algorithms: VDBSCAN-MR and OVDBSCAN-MR. Both algorithms use distributed and scalable processing techniques based on the MapReduce programming model in order to optimize performance and enable the analysis of Big Data sets. Through experiments, we observed that the developed algorithms produce better-quality clusters than the base algorithms. Furthermore, VDBSCAN-MR achieved better performance than the sequential algorithm and supported application to large spatial data sets. VDBSCAN-MR OVDBSCAN-MR Big Data Prospecção de dados espaciais Spatial Data Mining Agrupamento Espacial Spatial Clustering MapReduce
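As a rough illustration of what "density-based spatial clustering" means in the record above, the following is a minimal, single-machine sketch of plain DBSCAN. It is not the VDBSCAN-MR or OVDBSCAN-MR algorithm described in the abstract and involves no MapReduce; the function name `dbscan` and the parameters `eps` and `min_pts` are illustrative choices, and the brute-force neighbour search is only suitable for small inputs.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    Plain DBSCAN with a brute-force neighbour search (illustrative only).
    """
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Two dense groups and one isolated point (made-up coordinates).
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
```

With these made-up coordinates the two dense groups receive separate cluster ids and the isolated point is labelled as noise.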
146	Fundamentos da análise geográfica da difusão espacial das mortes por agressão no espaço urbano de Belém/PA (2000-2012) / Fundamentals of the geographical analysis of spatial diffusion of homicide in the urban area of Belém/PA (2000-2012) Costa, Tiago Barreto de Andrade [UNESP] 06 July 2017 (has links) A mortalidade por agressão no Brasil desde pelo menos o início de 1980 vem apresentando um recrudescimento linear, em paralelo ao contexto de produção das periferias urbanas. A desagregação desse fenômeno em escalas de maior detalhe, entretanto, permite-nos perceber que esse crescimento linear é na verdade o resultado de diversas dinâmicas locais, várias delas com características epidêmicas. Isto é, crescimentos abruptos fora do padrão histórico de dada localidade, portanto, crescimentos não-lineares. De acordo com a análise proposta no presente trabalho, identificamos Belém/PA enquanto um desses contextos epidêmicos de agressões letais. Tais expressões de violência naquela cidade vêm crescendo fora de sua dinâmica costumeira desde o início dos anos 2000 e a expressão geográfica desse fenômeno tem sido a difusão das mortes pelo espaço intraurbano.
Dessa forma, propomos enquanto tese que tal propagação espacial da problemática, sendo uma típica expressão de difusão sobre o espaço geográfico, não se dá de forma homogênea em todas as direções a partir de um foco. Antes, segue por caminhos que oferecem condições mais favoráveis à epidemia (corredores de difusão) e evita aqueles mais desfavoráveis (barreiras à difusão). Enfim, no presente trabalho argumentamos no sentido de demonstrar essa dinâmica espaço-temporal peculiar do crescimento dos homicídios em Belém nos últimos anos, a partir de uma abordagem dos conceitos de epidemia e difusão espacial calcada em análise de dados espaciais. Ficou demonstrado, com base nessa análise, que os homicídios nessa metrópole brasileira têm alta correlação espacial, o que aponta para a necessidade dos fundamentos teóricos e metodológicos da Geografia na análise dessa problemática no contexto brasileiro. / Since at least the early 1980s, homicide mortality in Brazil has increased linearly, in parallel with the formation of the urban peripheries. However, disaggregating this phenomenon at finer scales shows that this linear growth is in fact the result of several local dynamics, several of which are epidemic, that is, abrupt growth outside the historical pattern of a given locality and therefore nonlinear growth. According to the analysis proposed in the present study, we identify Belém/PA as one of those epidemic contexts of lethal violence. Such expressions of violence in that city have been growing outside their usual dynamics since the early 2000s, and the geographic expression of this phenomenon has been the spread of deaths through intraurban space. We therefore propose as our thesis that this spatial spread of the problem, being a typical expression of diffusion over geographical space, does not take place homogeneously in all directions from a focus. Rather, it follows paths that offer more favorable conditions to the epidemic (diffusion corridors) and avoids the more unfavorable ones (barriers to diffusion).
Therefore, in the present work we seek to demonstrate this peculiar space-time dynamic of the recent growth of homicides in Belém, approaching the concepts of epidemic and spatial diffusion through spatial data analysis. This analysis demonstrated that homicides in this Brazilian metropolis have a high spatial correlation, which points to the need for the theoretical and methodological foundations of Geography in the analysis of this problem in the Brazilian context. Espaço intraurbano Epidemia de homicídios Difusão espacial Análise de dados espaciais Intraurban space Homicides epidemic Spatial diffusion Spatial data analysis
147	Infraestrutura de dados espaciais em unidades de conservação: uma proposta para disseminação da informação geográfica do Parque Estadual de Intervales - SP / Spatial data infrastructure in protected area: a proposal for dissemination of geographic information of Parque Estadual de Intervales - SP Eduardo Tomio Nakamura 01 September 2010 (has links) Esse trabalho apresenta uma proposta de Infraestrutura de Dados Espaciais de nível organizacional para o Parque Estadual de Intervales-SP, que visa compartilhar suas informações geográficas com a sociedade em geral. Nos processos de elaboração da IDE são discutidas questões como interoperabilidade, padronização, metadados, especificação de serviços geográficos e o relacionamento dos nós das Infraestruturas de Dados Espaciais que vão permitir a disseminação da informação geográfica de fácil acesso a usuários externos. Os procedimentos, benefícios e limitações são listados e problematizados de forma que demonstrem as etapas necessárias na elaboração da Infraestrutura de Dados Espaciais de nível organizacional para uma Unidade de Conservação. Conclui-se que uma Infraestrutura de Dados Espaciais depende de variáveis administrativas, culturais, técnicas e financeiras, o que leva a uma proposta de implementação por estágios. Também são elaboradas críticas aos recursos existentes e sugestões para melhorias e estudos futuros. / This work presents a proposal for an organizational-level Spatial Data Infrastructure (SDI) for the Parque Estadual de Intervales-SP, intended to share its geographic information with society at large. The SDI design process addresses issues such as interoperability, standardization, metadata, the specification of geographic services and the relationships among Spatial Data Infrastructure nodes, which together enable geographic information to be disseminated and easily accessed by external users.
The procedures, benefits and limitations are listed and discussed so as to demonstrate the steps needed to build an organizational-level Spatial Data Infrastructure for a protected area. We conclude that a Spatial Data Infrastructure depends on administrative, cultural, technical and financial variables, which leads to a proposal for implementation in stages. Critiques of existing resources and suggestions for improvements and future studies are also presented. Infraestrutura de dados espaciais Interoperabilidade Metadados Sistemas de informação geográfica Unidades de conservação Geographic information systems Interoperability Metadata Protected area Spatial data infrastructure
148	Processamento de dados de monitores de produtividade de cana-de-açúcar / Processing of data from sugarcane yield monitors Leonardo Felipe Maldaner 10 July 2017 (has links) Na cultura da cana-de-açúcar, a colheita é realizada por uma colhedora que efetua o corte e processamento do produto colhido ao longo de uma (ou duas) fileira(s) da cultura estabelecida. Neste processo, dados obtidos por monitor de produtividade, quando existentes, fornecem informações com diferentes utilidades. Métodos existentes para o processamento de dados de produtividade utilizados atualmente foram desenvolvidos para conjuntos de dados de produtividade de grãos e quando aplicados a um conjunto de dados de produtividade de cana-de-açúcar podem eliminar dados com variações reais de produtividade dentro da fileira. O objetivo deste trabalho é desenvolver métodos que busquem identificar e remover dados errôneos, em pós-processamento, do conjunto de dados gerados por monitor de produtividade para caracterização das pequenas variações de produtividade dentro de uma fileira de cana-de-açúcar. A identificação de dados discrepantes do conjunto de dados utilizando método estatístico por quartis e uma filtragem comparando valores de produtividade usando somente dados de uma única passada da colhedora foi proposto. Foram utilizados quatro conjuntos de dados de produtividade gerados por dois monitores. O monitor de produtividade 1 registrou os dados a uma frequência de 0,5 Hz e o monitor de produtividade 2 a uma frequência de 1 Hz. Foram encontrados dados errôneos gerados devido ao tempo de sincronização entre a colhedora e o conjunto transbordo durante as manobras de cabeceira e durante a troca do conjunto de transbordo. Também foram encontrados dados durante as manobras da colhedora, onde o monitor registrou dados com produtividade zero e nulas.
Foram simuladas diferentes frequências de registro de dados com objetivo de verificar se a densidade de dados fornecida pelo monitor influencia a caracterização de pequenas variações nos valores de produtividade dentro da passada. Os conjuntos de dados de produtividade gerados por diferentes tipos de monitores demonstraram a necessidade de pós-processamento para remoção de valores de produtividades discrepantes. A metodologia desenvolvida neste trabalho foi capaz de identificar e eliminar os dados errôneos dos conjuntos de dados analisados. A metodologia de filtragem de dados considerando somente dados dentro de uma única passada da colhedora de cana-de-açúcar proporcionou a caracterização da variação de valores de produtividade em pequenas distâncias. / In sugarcane, harvesting is performed by a harvester that cuts and processes the harvested product along one (or two) row(s) of the established crop. In this process, data obtained by a yield monitor, when available, provide information with various uses. The yield data processing methods currently in use were developed for grain yield datasets and, when applied to a sugarcane yield dataset, can eliminate data that reflect real yield variation within the row. The objective of this work is to develop post-processing methods that identify and remove erroneous data from the dataset generated by a yield monitor, so as to characterize small variations in yield within a sugarcane row. We propose identifying outliers in the dataset with a quartile-based statistical method, together with a filter that compares yield values using only data from a single pass of the harvester. Four yield datasets generated by two monitors were used. Yield monitor 1 recorded data at a frequency of 0.5 Hz and yield monitor 2 at a frequency of 1 Hz.
Erroneous data were found that arose from the synchronization time between the sugarcane harvester and the chopped-cane transport unit during headland turns and during changeover of the transport unit. Erroneous data were also found during the harvester's headland turns, where the yield monitor recorded zero and null yield values. Different data recording frequencies were simulated to verify whether the density of data provided by the monitor influences the characterization of small variations in yield values within a pass. The yield datasets generated by the different types of monitors demonstrated the need for post-processing to remove outlying yield values. The methodology developed in this work was able to identify and eliminate erroneous data from the datasets analyzed. The data filtering methodology that considers only data within a single pass of the sugarcane harvester made it possible to characterize the variation in yield values over short distances. Agricultura de precisão Dados errôneos Processamento de dados espacial Variabilidade espacial Erroneous data Precision agriculture Spatial data processing Spatial variability
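The quartile-based outlier identification within a single harvester pass, as mentioned in the record above, might look roughly like the sketch below. This is an assumption-laden illustration rather than the thesis's actual procedure: the conventional 1.5 × IQR fence, the function names `iqr_filter` and `filter_by_pass`, the `(pass_id, yield)` record layout and the sample values are all hypothetical.

```python
from statistics import quantiles


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional fence, assumed here; the source does not
    state which multiplier (if any) was used.
    """
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles sensibly
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def filter_by_pass(records, k=1.5):
    """Group (pass_id, yield) records by pass and filter each pass on its own quartiles."""
    by_pass = {}
    for pass_id, yield_value in records:
        by_pass.setdefault(pass_id, []).append(yield_value)
    return {p: iqr_filter(ys, k) for p, ys in by_pass.items()}


# Made-up yield readings (t/ha) for two passes; 0 and 250 mimic erroneous records.
records = [("A", v) for v in [80, 82, 85, 79, 81, 0, 83, 250]] + [("B", 60), ("B", 61)]
cleaned = filter_by_pass(records)
```

Filtering each pass separately means a low-yield pass is not judged against the quartiles of a high-yield neighbour, which is the motivation the abstract gives for using data from a single pass only.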
149	GIS, data mining and wild land fire data within Räddningstjänsten Sandell, Anna January 2001 (has links) Geographical information systems (GIS), data mining and wild land fire would theoretically be suitable to use together. However, would data mining in reality bring out any useful information from wild land fire data stored within a GIS? This report investigates whether GIS and data mining are used today within Räddningstjänsten in some municipalities of the former Skaraborg. The investigation shows that neither data mining nor GIS is used within the investigated municipalities. However, the organisations are interested in using GIS in the future, together with some kind of analysis tool, for example data mining. To show how GIS and data mining could be used within Räddningstjänsten in the future, some examples were constructed. Geographical Information Systems GIS Wild land fire Data mining Spatial data mining Information Systems
150	Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking Huang, Huang 16 July 2017 (has links) This thesis focuses on two topics, computational methods for large spatial datasets and functional data ranking. Both are tackling the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden in fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when the exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps for satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, it is more challenging to detect shape outliers as they are often masked among samples. We develop a new notion of functional data depth by taking the integration of a univariate depth function. Having a form of the integrated depth, it shares many desirable features. Furthermore, the novel formation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. 
Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as separability and full symmetry. We formulate test functions as functions of temporal lags for each pair of spatial locations and develop a rank-based testing procedure induced by functional data depth for assessing these properties. The method is illustrated using simulated data from widely used spatio-temporal covariance models, as well as real datasets from weather stations and climate model outputs. Large spatial data set low rank approximation Functional Data Analysis spatio-temporal covariance Statistical efficiency Outlier detection
