91.
Non-Parametric Clustering of Multivariate Count Data. Tekumalla, Lavanya Sita. January 2017 (PDF).
The focus of this thesis is models for non-parametric clustering of multivariate count data. While there has been significant work in Bayesian non-parametric modelling in the last decade in the context of mixture models for real-valued data and some forms of discrete data, such as multinomial mixtures, there has been much less work on non-parametric clustering of multivariate count data. The main challenges in clustering multivariate counts include choosing a suitable multivariate distribution that adequately captures the properties of the data, for instance handling over-dispersed or sparse multivariate data, while at the same time leveraging the inherent dependency structure between dimensions and across instances to obtain meaningful clusters.
As the first contribution, this thesis explores extensions to the multivariate Poisson distribution, proposing efficient algorithms for non-parametric clustering of multivariate count data. While the Poisson is the most popular distribution for count modelling, the multivariate Poisson often leads to intractable inference and a suboptimal fit of the data. To address this, we introduce a family of models based on the Sparse Multivariate Poisson that exploit the inherent sparsity in multivariate data, reducing the number of latent variables in the formulation of the multivariate Poisson and leading to a better fit and more efficient inference. We explore Dirichlet process mixture model extensions and temporal non-parametric extensions to models based on the Sparse Multivariate Poisson, for practical use of Poisson-based models for non-parametric clustering of multivariate counts in real-world applications.

As the second contribution, this thesis addresses moving beyond the limitations of Poisson-based models for non-parametric clustering, for instance in handling over-dispersed data or data with negative correlations. We explore, for the first time, marginal-independent inference techniques based on the Gaussian copula for multivariate count data in the Dirichlet process mixture model setting. This enables non-parametric clustering of multivariate counts without limiting assumptions that usually restrict the marginals to a particular family, such as the Poisson or the negative binomial. This inference technique also works for mixed data (combinations of count, binary, and continuous data), enabling Bayesian non-parametric modelling to be used for a wide variety of data types.

As the third contribution, this thesis addresses modelling a wider range of more complex dependencies, such as asymmetric and tail dependencies, during non-parametric clustering of multivariate count data with vine copula based Dirichlet process mixtures.
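The generative picture behind a Dirichlet process mixture of count models can be sketched in a few lines. The sketch below draws cluster assignments from a Chinese restaurant process and gives each cluster a vector of independent Poisson rates; this is a deliberate simplification for illustration, not the Sparse Multivariate Poisson construction of the thesis, and the concentration `alpha` and rate range are arbitrary choices.

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's method: multiply uniforms until the product drops below e^{-lam}
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def crp_poisson_mixture(n, dim, alpha, rng):
    """Generate n multivariate count vectors from a Chinese-restaurant-process
    mixture: each cluster carries its own vector of independent Poisson rates."""
    clusters = []          # per-cluster rate vectors
    sizes = []             # number of points assigned to each cluster
    data, labels = [], []
    for i in range(n):
        # CRP: join cluster k w.p. sizes[k]/(i+alpha), open a new one w.p. alpha/(i+alpha)
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, s in enumerate(sizes):
            acc += s
            if r < acc:
                break
        else:
            k = len(clusters)
            clusters.append([rng.uniform(0.5, 20.0) for _ in range(dim)])
            sizes.append(0)
        sizes[k] += 1
        data.append([sample_poisson(lam, rng) for lam in clusters[k]])
        labels.append(k)
    return data, labels

rng = random.Random(0)
X, z = crp_poisson_mixture(n=200, dim=5, alpha=1.0, rng=rng)
```

Inference would run this story in reverse, inferring the assignments and rates from the observed counts; the point of the sketch is only that the number of clusters is not fixed in advance.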
While vine copula inference has been well explored for continuous data, it is still a topic of active research for multivariate count and mixed multivariate data. Inference for multivariate counts and mixed data is a hard problem owing to the ties that arise with discrete marginals. An efficient marginal-independent inference approach based on the extended rank likelihood, building on recent work in the statistics literature, is proposed in this thesis, extending the use of vines to multivariate counts and mixed data in practical clustering scenarios.
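To make the copula idea concrete, here is a minimal sketch of the generative direction: correlated Gaussians are pushed through the normal CDF and then through a Poisson quantile function, producing dependent counts whose marginals remain exactly Poisson. This only illustrates the model class; the thesis's contribution is the inverse problem (inference under ties), and the rates and correlation below are arbitrary.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def poisson_ppf(u, lam):
    """Smallest k with P(X <= k) >= u, by accumulating Poisson pmf terms."""
    u = min(u, 1.0 - 1e-12)  # guard against u == 1.0 from CDF rounding
    k, pmf, cdf = 0, math.exp(-lam), math.exp(-lam)
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def gaussian_copula_counts(n, rho, lams, rng):
    """Draw n pairs of correlated Poisson counts: sample a bivariate normal
    with correlation rho, map each margin through the normal CDF to (0,1),
    then through the Poisson quantile function."""
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # Cholesky for the 2x2 case
        u1, u2 = norm_cdf(z1), norm_cdf(w2)
        out.append((poisson_ppf(u1, lams[0]), poisson_ppf(u2, lams[1])))
    return out

rng = random.Random(1)
pairs = gaussian_copula_counts(5000, rho=0.8, lams=(3.0, 10.0), rng=rng)
```

The same construction works with any marginal quantile function, which is exactly the flexibility the abstract attributes to the copula approach.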
This thesis also explores a novel systems application, Bulk Cache Preloading, by analysing I/O traces through predictive models for temporal non-parametric clustering of multivariate count data. State-of-the-art techniques in the caching domain are limited to exploiting short-range correlations in memory accesses at millisecond granularity or smaller, and cannot leverage long-range correlations in traces. We explore, for the first time, Bulk Cache Preloading: proactively predicting data to load into the cache minutes or hours before the actual request from the application, by leveraging longer-range correlations at the granularity of minutes or hours. The relaxed timing constraints enable the development of machine learning techniques tailored for caching. Our approach involves a data aggregation process that converts I/O traces into a temporal sequence of multivariate counts, which we analyse with the temporal non-parametric clustering models proposed in this thesis. While the focus of our thesis is models for non-parametric clustering of discrete data, particularly multivariate counts, we also hope our work on bulk cache preloading paves the way to more interdisciplinary research on using data mining techniques in the systems domain.
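The aggregation step can be illustrated with a toy trace. The sketch below bins (timestamp, block) events into fixed time windows and counts accesses per region of the block address space, producing one count vector per window; the window length and region layout are invented for illustration and are not taken from the thesis.

```python
def aggregate_trace(events, window_s, regions, region_size):
    """Convert an I/O trace of (timestamp_s, block_id) events into a temporal
    sequence of multivariate counts: one vector per time window, one dimension
    per contiguous region of the block address space."""
    if not events:
        return []
    t0 = min(t for t, _ in events)
    horizon = max(t for t, _ in events)
    n_windows = int((horizon - t0) // window_s) + 1
    counts = [[0] * regions for _ in range(n_windows)]
    for t, block in events:
        w = int((t - t0) // window_s)
        d = min(block // region_size, regions - 1)  # clamp blocks past the last region
        counts[w][d] += 1
    return counts

trace = [(0, 5), (10, 6), (70, 5), (75, 205), (130, 210)]
vecs = aggregate_trace(trace, window_s=60, regions=2, region_size=100)
# 60 s windows, blocks 0-99 -> dim 0, blocks >= 100 -> dim 1
# -> [[2, 0], [1, 1], [0, 1]]
```

Each row of `vecs` is then one observation for the temporal count-clustering models described above.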
As an additional contribution, this thesis addresses multi-level non-parametric admixture modelling for discrete data in the form of grouped categorical data, such as document collections. Non-parametric clustering for topic modelling in document collections, where a document is associated with an unknown number of semantic themes or topics, is well explored with admixture models such as the Hierarchical Dirichlet Process. However, there exist scenarios where a document must be associated with themes at multiple levels, each theme itself being an admixture over themes at the previous level, motivating the need for multi-level admixtures. Consider the example of non-parametric entity-topic modelling, which simultaneously learns entities and topics from document collections. This can be realized by modelling a document as an admixture over entities, while entities are themselves modelled as admixtures over topics. We propose the nested Hierarchical Dirichlet Process to address this gap and apply a two-level version of our model to automatically learn author entities and topics from research corpora.
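The two-level generative story (document over entities, entity over topics, topic over words) can be sketched with fixed, finite numbers of entities and topics; the nested HDP lets both be unbounded, so this is a simplification, and all weights and the vocabulary below are invented.

```python
import random

def sample_categorical(weights, rng):
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1

def generate_doc(n_words, doc_entity_w, entity_topic_w, topic_word_w, vocab, rng):
    """Two-level admixture: for each word, draw an entity (e.g. an author) from
    the document's mixture, a topic from that entity's mixture, then a word."""
    doc = []
    for _ in range(n_words):
        e = sample_categorical(doc_entity_w, rng)
        t = sample_categorical(entity_topic_w[e], rng)
        w = sample_categorical(topic_word_w[t], rng)
        doc.append(vocab[w])
    return doc

rng = random.Random(42)
vocab = ["gradient", "prior", "kernel", "protein"]
doc = generate_doc(
    n_words=20,
    doc_entity_w=[0.7, 0.3],                  # two entities in this document
    entity_topic_w=[[0.9, 0.1], [0.2, 0.8]],  # each entity mixes two topics
    topic_word_w=[[5, 4, 1, 0.1], [0.1, 1, 4, 5]],
    vocab=vocab, rng=rng)
```

Inference inverts this sampling to recover the per-document entity mixtures and per-entity topic mixtures from the words alone.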
92.
Designing conventional, spatial, and temporal data warehouses: concepts and methodological framework. Malinowski Gajda, Elzbieta. 02 October 2006.
Decision support systems are interactive, computer-based information systems that provide data and analysis tools to assist managers at different levels of an organization in the process of decision making. Data warehouses (DWs) have been developed and deployed as an integral part of decision support systems.

A data warehouse is a database that stores the high volumes of historical data required for analytical purposes. This data is extracted from operational databases, transformed into a coherent whole, and loaded into a DW during the extraction-transformation-loading (ETL) process.

DW data can be dynamically manipulated using on-line analytical processing (OLAP) systems. DW and OLAP systems rely on a multidimensional model that includes measures, dimensions, and hierarchies. Measures are usually numeric, additive values used for quantitative evaluation of different aspects of an organization. Dimensions provide different analysis perspectives, while hierarchies allow measures to be analyzed at different levels of detail.

Nevertheless, designers as well as users currently find it difficult to specify the multidimensional elements required for analysis. One reason is the lack of conceptual models for DW and OLAP system design that would allow data requirements to be expressed at an abstract level, without considering implementation details. Another problem is that many kinds of complex hierarchies arising in real-world situations are not addressed by current DW and OLAP systems.

In order to help designers build conceptual models for decision-support systems and to help users better understand the data to be analyzed, in this thesis we propose the MultiDimER model: a conceptual model for representing multidimensional data for DW and OLAP applications.
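As a toy illustration of these three notions (an additive measure aggregated along a dimension hierarchy), consider sales facts rolled up from stores to cities to countries; all names and figures below are invented.

```python
from collections import defaultdict

# Fact rows: (store, month, amount). Hierarchy: store -> city -> country.
facts = [
    ("S1", "2006-01", 100.0), ("S2", "2006-01", 50.0),
    ("S3", "2006-01", 70.0), ("S1", "2006-02", 30.0),
]
store_city = {"S1": "Brussels", "S2": "Brussels", "S3": "Antwerp"}
city_country = {"Brussels": "Belgium", "Antwerp": "Belgium"}

def rollup(facts, level):
    """Aggregate the additive 'amount' measure up the store->city->country
    hierarchy to the requested level of detail."""
    totals = defaultdict(float)
    for store, month, amount in facts:
        if level == "store":
            key = store
        elif level == "city":
            key = store_city[store]
        else:  # "country"
            key = city_country[store_city[store]]
        totals[key] += amount
    return dict(totals)

# rollup(facts, "city")    -> {"Brussels": 180.0, "Antwerp": 70.0}
# rollup(facts, "country") -> {"Belgium": 250.0}
```

Additivity is what makes this rollup meaningful; the thesis's point is that many real hierarchies (and spatial or temporal measures) break these simple assumptions.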
Our model is mainly based on existing ER constructs, for example entity types, attributes, and relationship types with their usual semantics, allowing representation of the common concepts of dimensions, hierarchies, and measures. It also includes a conceptual classification of the different kinds of hierarchies existing in real-world situations and proposes graphical notations for them.

On the other hand, users of DW and OLAP systems increasingly demand the inclusion of spatial data; the advantage of using spatial data in the analysis process is widely recognized, since its visualization reveals patterns that are difficult to discover otherwise.

However, although DWs typically include a spatial or location dimension, this dimension is usually represented in an alphanumeric format. Furthermore, there is still a lack of a systematic study that analyzes the inclusion and management of hierarchies and measures represented using spatial data.

With the aim of satisfying the growing requirements of decision-making users, we extend the MultiDimER model by allowing spatial data in the different elements composing the multidimensional model. The novelty of our contribution lies in the fact that a multidimensional model is seldom used for representing spatial data. To realize our proposal, we applied research achievements in the field of spatial databases to the specific features of a multidimensional model. The spatial extension of a multidimensional model raises several issues, to which we refer in this thesis, such as the influence of the different topological relationships between the spatial objects forming a hierarchy on the procedures required for measure aggregation, the aggregation of spatial measures, and the inclusion of spatial measures without the presence of spatial dimensions, among others.
Moreover, one of the important characteristics of multidimensional models is the presence of a time dimension for keeping track of changes in measures. However, this dimension cannot be used to model changes in the other dimensions. Usual multidimensional models are therefore not symmetric in the way they represent changes for measures and dimensions. Further, there is still a lack of analysis indicating which concepts already developed for providing temporal support in conventional databases can be applied, and be useful, for the different elements composing a multidimensional model.

In order to handle temporal changes to all elements of a multidimensional model in a similar manner, we introduce a temporal extension of the MultiDimER model. This extension is based on research in the area of temporal databases, which have been successfully used for modeling time-varying information for several decades. We propose the inclusion of different temporal types, such as valid time and transaction time, which are obtained from source systems, in addition to the DW loading time generated in DWs. We use this temporal support for a conceptual representation of time-varying dimensions, hierarchies, and measures. We also refer to specific constraints that should be imposed on time-varying hierarchies and to the problem of handling multiple time granularities between source systems and DWs.

Furthermore, the design of DWs is not an easy task. It requires considering all phases, from requirements specification to the final implementation, including the ETL process. It should also take into account that the inclusion of different data items in a DW depends on both users' needs and data availability in source systems. However, designers must currently rely on their experience, due to the lack of a methodological framework that considers the above-mentioned aspects.
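A minimal sketch of what valid-time support means for a time-varying dimension: each dimension row carries a validity interval, and lookups are made as of a point in time. The customer example and dates are invented, and transaction time and DW loading time, which the model also covers, are omitted for brevity.

```python
from datetime import date

# Each version of a dimension member carries a valid-time interval [from, to).
customer_versions = [
    {"key": 1, "city": "Brussels",
     "valid_from": date(2000, 1, 1), "valid_to": date(2004, 6, 1)},
    {"key": 1, "city": "Antwerp",
     "valid_from": date(2004, 6, 1), "valid_to": date.max},
]

def version_at(versions, key, when):
    """Return the dimension attributes that were valid at time `when`,
    or None if the member did not exist then."""
    for v in versions:
        if v["key"] == key and v["valid_from"] <= when < v["valid_to"]:
            return v
    return None

# version_at(customer_versions, 1, date(2003, 1, 1))["city"]  -> "Brussels"
# version_at(customer_versions, 1, date(2005, 1, 1))["city"]  -> "Antwerp"
```

With such intervals, measures recorded in 2003 aggregate under the city that was valid then, rather than under the member's current attributes.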
In order to assist developers during the DW design process, we propose a methodology for the design of conventional, spatial, and temporal DWs. We refer to different phases, such as requirements specification and conceptual, logical, and physical modeling. We include three different methods for requirements specification, depending on whether users, operational data sources, or both are the driving force in the process of requirements gathering, and we show how each method leads to the creation of a conceptual multidimensional model. We also present the logical and physical design phases, which refer to DW structures and the ETL process.

To ensure the correctness of the proposed conceptual models, i.e., with conventional, spatial, and time-varying data, we formally define them, providing their syntax and semantics. With the aim of assessing the usability of our conceptual model, including the representation of different kinds of hierarchies as well as spatial and temporal support, we present real-world examples. Pursuing the goal that the proposed conceptual solutions can be implemented, we include their logical representations using relational and object-relational databases. / Doctorat en sciences appliquées
93.
Získávání znalostí z časoprostorových dat / Knowledge Discovery in Spatio-Temporal Data. Pešek, Martin. January 2011.
This thesis deals with knowledge discovery in spatio-temporal data, which is currently a rapidly evolving area of research in information technology. First, it describes the general principles of knowledge discovery; then, after a brief introduction to mining temporal and spatial data, it focuses on an overview and description of existing methods for mining spatio-temporal data. It concentrates, in particular, on moving-object data in the form of trajectories, with an emphasis on methods for trajectory outlier detection. The next part of the thesis deals with the implementation of the trajectory outlier detection algorithm called TOP-EYE. To test and validate this algorithm and demonstrate its practical use, an application for trajectory outlier detection was designed and implemented. The algorithm is experimentally evaluated on two different data sets.
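TOP-EYE itself scores trajectories by accumulating deviations of an evolving direction and density; as a much simpler stand-in that still conveys the task, the sketch below scores each equal-length trajectory by its average distance to all the others, which already separates a clearly deviating trajectory from a coherent group. This is not the TOP-EYE algorithm, and the toy trajectories are invented.

```python
import math

def traj_distance(a, b):
    """Mean pointwise distance between two equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def outlier_scores(trajectories):
    """Score each trajectory by its mean distance to all others; a larger
    score means the trajectory deviates more from the rest."""
    n = len(trajectories)
    return [
        sum(traj_distance(trajectories[i], trajectories[j])
            for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

trajs = [
    [(0, 0.0), (1, 0.0), (2, 0.0)],
    [(0, 0.1), (1, 0.1), (2, 0.1)],
    [(0, 0.0), (1, 0.2), (2, 0.0)],
    [(0, 0.0), (1, 3.0), (2, 6.0)],   # deviates sharply from the group
]
scores = outlier_scores(trajs)       # the last trajectory gets the top score
```

Real trajectory outlier detectors refine this idea with local (segment-level) comparisons so that a trajectory that deviates only briefly is still caught.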
94.
The impact of parsing methods on recurrent neural networks applied to event-based vehicular signal data / Påverkan av parsningsmetoder på återkommande neuronnät applicerade på händelsebaserad signaldata från fordon. Lindblad, Max. January 2018.
This thesis examines two different approaches to parsing event-based vehicular signal data to produce input to a neural network prediction model: event parsing, where the data is kept unevenly spaced over the temporal domain, and slice parsing, where the data is instead made evenly spaced over the temporal domain. The dataset used as a basis for these experiments consists of a number of vehicular signal logs taken at Scania AB. Comparisons between the parsing methods were made by first training long short-term memory (LSTM) recurrent neural networks (RNNs) on each of the parsed datasets and then measuring the output error and resource costs of each such model after validating them on a number of shared validation sets. The results from these tests clearly show that slice parsing compares favourably to event parsing.
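The two strategies can be sketched on a toy signal: slice parsing resamples the uneven events onto an even grid by holding the last observed value, while event parsing would keep the raw (time, value) pairs, typically feeding the inter-event gap to the network as an extra input feature. The step size and signal below are invented for illustration.

```python
def slice_parse(events, step, t_end):
    """Resample unevenly spaced (time, value) events onto an even time grid by
    holding the last observed value (slice parsing). Event parsing would
    instead keep the raw pairs, with the time gap as an additional feature."""
    events = sorted(events)
    out, i, last = [], 0, None
    t = events[0][0]
    while t <= t_end:
        # advance past every event at or before the current grid point
        while i < len(events) and events[i][0] <= t:
            last = events[i][1]
            i += 1
        out.append((t, last))
        t += step
    return out

signal = [(0.0, 1.0), (0.7, 2.0), (3.1, 5.0)]
grid = slice_parse(signal, step=1.0, t_end=4.0)
# -> [(0.0, 1.0), (1.0, 2.0), (2.0, 2.0), (3.0, 2.0), (4.0, 5.0)]
```

The trade-off the thesis measures follows directly from this picture: the grid gives the LSTM a uniform clock but duplicates stale values, while raw events are compact but leave the network to model irregular time gaps.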
95.
Konzeption einer qualitätsgesicherten Implementierung eines Echtzeitassistenzsystems basierend auf einem terrestrischen Long Range Laserscanner (Design of a quality-assured implementation of a real-time assistance system based on a terrestrial long-range laser scanner). Czerwonka-Schröder, Daniel. 04 July 2023.
Contents:
Kurzfassung (German abstract)
Abstract
Danksagung (Acknowledgements)
1. Introduction
2. Deformation monitoring using terrestrial laser scanners: current methods, regulations, and technical aspects
2.1. Engineering-geodetic monitoring measurements
2.2. Requirements for integrative monitoring from the perspective of holistic risk management
2.2.1. Active monitoring
2.2.2. Holistic risk management
2.2.3. Quality assessment and quality assurance
2.2.4. Relevant standards, guidelines, and technical bulletins for the use of permanent laser scanning
2.3. Terrestrial laser scanning
2.4. Permanent laser scanning
2.5. Parameters of a permanent installation of a long-range laser scanner
2.5.1. Registration and georeferencing
2.5.2. Geodetic refraction
2.5.3. Geodetic monitoring using terrestrial laser scanners
2.6. Concluding remarks
3. Aim and derived research focus of this thesis
4. Concept of a real-time assistance system based on PLS
5. Investigation of factors influencing the measurement results of permanently installed terrestrial long-range laser scanners
5.1. Case study I: Noordwijk / Netherlands
5.1.1. Description of the data
5.1.2. Methods
5.1.3. Results
5.1.4. Summary and discussion
5.2. Case study II: Detection of corner-cube prisms and their accuracy
5.2.1. Prism detection from TLS data
5.2.2. Accuracy analysis of the prism detection
5.3. Case study III: Valsertal (Tyrol) / Austria
5.3.1. Description of the data set
5.3.2. Geodetic refraction as an influence on the measurement results of a PLS
5.3.3. Influence of registration parameters on the measurement results of a PLS
6. Information extraction from multitemporal point clouds
6.1. Stage I: Segmentation of spatially distributed time series based on 2D raster images as an unsupervised machine learning method
6.1.1. Extraction of time series from the point clouds
6.1.2. Time series segmentation using k-means algorithms
6.1.3. Time series segmentation using features extracted with Gaussian mixture models (GMM)
6.2. Stage II: Time series analysis of spatially high-resolution 3D data
6.2.1. M3C2-EP with adaptive Kalman filtering
6.2.2. 4D objects-by-change (4D-OBC)
6.2.3. Summary and discussion
7. Conclusion and outlook
A. Results of the system investigation in Unna-Hemmerde (21.03.2022)
B. Results of the time series segmentation using k-means
B.1. Result tables
B.2. Time series and spatial visualization - full area
B.3. Time series and spatial visualization - limited area
C. Results of the time series segmentation using GMM
C.1. Result tables
C.2. Time series and spatial visualization - full area
Bibliography
List of figures
List of abbreviations
List of tables
Curriculum Vitae
Climate change has an important impact on the scale and frequency with which the Earth's surface is changing. This can be seen in various geomorphological change processes, such as gravitational natural hazards (rockfalls, landslides, or debris flows), glacier melt in high mountain regions, or the quantification of coastal dynamics on sandy beaches. Such events are triggered by increasingly prominent and extreme meteorological conditions. In this context, it is essential to implement preventive measures to protect the population as part of a risk management system. To safely manage these hazards, high-quality three- and four-dimensional (3D and 4D) data sets of the Earth's surface are required.
Technological advances in metrology and the associated paradigm shift have significantly improved the ability to collect spatially and temporally distributed data. The progress from terrestrial laser scanners to communication-enabled, programmable multi-sensor systems, together with compact and robust designs, long measurement ranges, and economically competitive systems, allows a transition to permanent laser scanning (PLS). PLS enables the acquisition of data from a fixed position over a target area kilometres away, at high frequency and over a long period of time. In terms of adaptive monitoring, PLS is suitable for integration into near-real-time assistance or early warning systems. However, in order to achieve acceptance of these systems, transparent, reproducible methods and processes for extracting information must be defined.
The aim of this thesis is to present a methodological framework for PLS. Four crucial steps along the processing chain are identified: (i) collecting single-epoch data, (ii) providing redundant data management and secure data communication to central servers, (iii) multi-temporal data analysis, and (iv) reporting and presenting results to stakeholders. Two main research topics emerge from this processing chain. The first is the qualitative assessment of the acquired point clouds, which focuses on the influence of different registration methods on the multitemporal point clouds and on the influence of the atmosphere on the measured data. It is shown that ignoring these influences leads to significant deviations, which in turn can result in a misinterpretation of the derived information. It is also shown that there is still a lack of data-based procedures to account for these influences. The investigations are based on extensive data sets from Noordwijk (Netherlands) and Vals (Austria).
The second research topic addresses data analysis. The challenge is to analyse thousands of single-epoch point clouds; bitemporal methods are of limited applicability in this case. The thesis presents a two-step method to automatically extract information from the large data set. In the first step, relevant features are extracted from the full 3D time series of the scene by clustering 2D raster images. The extracted segments can then be semi-automatically classified and assigned to relevant geomorphological processes. Based on this knowledge, the scene is spatially delimited in the second step, and deeper analyses are performed in the areas of interest. Using «M3C2-EP with adapted Kalman filtering» and «4D objects-by-change», two analysis tools are presented and applied to the data set in Vals.
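As a hedged illustration of the filtering idea only, here is a one-dimensional random-walk Kalman filter applied to a noisy displacement time series. The noise levels and variances are invented, and this is far simpler than the adapted M3C2-EP filter of the thesis; it only shows how per-epoch estimates get smoothed as observations accumulate.

```python
import random

def kalman_filter(observations, obs_var, process_var, x0=0.0, p0=1.0):
    """1D Kalman filter: random-walk state model, one noisy displacement
    observation per scan epoch. Returns the filtered estimate per epoch."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        p = p + process_var            # predict: the surface may drift
        k = p / (p + obs_var)          # Kalman gain
        x = x + k * (z - x)            # update towards the observation
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

rng = random.Random(3)
true_disp = 5.0                        # constant true displacement (arbitrary units)
obs = [true_disp + rng.gauss(0.0, 0.5) for _ in range(200)]
est = kalman_filter(obs, obs_var=0.25, process_var=1e-4)
```

With a small process variance the filter trusts the accumulated history, so the epoch-to-epoch scatter of `est` is far below that of the raw observations.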
The monitoring of topographic surface changes with PLS will increase and generate large amounts of data. These data sets need to be processed, analysed, and stored. This thesis contributes to a better understanding of the methodology: by applying the method systematically, users gain a deeper understanding of the influencing factors along the processing chain, from data acquisition to the reporting of relevant information. The dissertation presents a toolbox that enables automated evaluation of multitemporal point clouds using unsupervised machine learning and provides the relevant information to the user. The approach is straightforward and has high potential for automated analysis in future applications.
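The stage-I idea of clustering per-cell time series can be sketched with a plain k-means on synthetic "stable" versus "sliding" raster cells. This is a bare-bones Lloyd's iteration over invented data, not the thesis's implementation or its GMM-feature variant.

```python
import math
import random

def kmeans(series, k, iters, rng):
    """Plain Lloyd's k-means on fixed-length time series (one vector per
    raster cell): alternate nearest-centre assignment and centre update."""
    centers = rng.sample(series, k)
    assign = [0] * len(series)
    for _ in range(iters):
        for i, s in enumerate(series):
            assign[i] = min(range(k), key=lambda c: math.dist(s, centers[c]))
        for c in range(k):
            members = [series[i] for i in range(len(series)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

rng = random.Random(7)
# 30 cells that barely move, 30 cells that slide ~0.05 per epoch over 12 epochs
stable  = [[rng.gauss(0.0, 0.01) for _ in range(12)] for _ in range(30)]
sliding = [[0.05 * t + rng.gauss(0.0, 0.01) for t in range(12)] for _ in range(30)]
assign, centers = kmeans(stable + sliding, k=2, iters=10, rng=rng)
```

The two recovered segments correspond to the two motion regimes; in the thesis each segment would then be handed to the deeper stage-II analyses.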
|
96 |
VISUAL ANALYTICS OF BIG DATA FROM MOLECULAR DYNAMICS SIMULATION
Catherine Jenifer Rajam Rajendran (5931113) 03 February 2023 (has links)
<p>Protein malfunction can cause human diseases, which makes proteins targets in the process of drug discovery. In-depth knowledge of how a protein functions can contribute widely to the understanding of the mechanisms of these diseases. Protein function is determined by protein structures and their dynamic properties. Protein dynamics refers to the constant physical movement of atoms in a protein, which may result in transitions between different conformational states of the protein. These conformational transitions are critically important for proteins to function. Understanding protein dynamics can help us understand and interfere with conformational states and transitions, and thus with the function of the protein. If we can understand the mechanism of a protein's conformational transitions, we can design molecules to regulate this process and thereby regulate protein function for new drug discovery. Protein dynamics can be simulated by Molecular Dynamics (MD) simulations.</p>
<p>The MD simulation data generated are spatio-temporal and therefore very high dimensional. To analyze the data, distinguishing the various atomic interactions within a protein by interpreting their 3D coordinate values plays a significant role. Since the data volume is enormous, the essential step is to find ways to interpret it, by designing more efficient algorithms to reduce the dimensionality and by developing user-friendly visualization tools to find patterns and trends that are not usually attainable by traditional methods of data processing. Because of the typically allosteric, long-range nature of the interactions that lead to large conformational transitions, pinpointing the underlying forces and pathways responsible for a global conformational transition at the atomic level is very challenging. To address these problems, various analytical techniques are applied to the simulation data to better understand the mechanism of protein dynamics at the atomic level, through a new program called Probing Long-distance Interactions by Tapping into Paired-Distances (PLITIP). PLITIP contains a set of new tools based on the analysis of paired distances, which removes the interference of the translation and rotation of the protein itself and can therefore capture the absolute changes within the protein.</p>
<p>Firstly, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a distance matrix of all paired residues from our simulation data. This paired-distance matrix is not subject to the interference of the translation or rotation of the protein and can capture the absolute changes within the protein. The matrix is then decomposed by DPD using Principal Component Analysis (PCA) to reduce dimensionality and to capture the largest structural variation. To showcase how DPD works, we applied it to two protein systems, HIV-1 protease and 14-3-3σ, both of which display tremendous structural changes and conformational transitions in their MD simulation trajectories. The largest structural variation and conformational transition were captured by the first principal component in both cases. In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and located the key candidate regions that might be responsible for the large conformational transitions.</p>
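<p>The abstract does not show PLITIP's actual implementation; the following is a minimal sketch of the paired-distance plus PCA idea it describes, with all function names and array shapes our own assumptions (e.g. one C-alpha coordinate per residue per frame).</p>

```python
import numpy as np

def paired_distance_matrix(coords):
    """coords: (n_frames, n_residues, 3) Cartesian coordinates, e.g. one
    C-alpha atom per residue per frame. Returns an (n_frames, n_pairs)
    matrix of all pairwise residue distances per frame; distances are
    invariant to global translation and rotation of the protein."""
    n_frames, n_res, _ = coords.shape
    i, j = np.triu_indices(n_res, k=1)          # all residue pairs (i < j)
    diffs = coords[:, i, :] - coords[:, j, :]   # (n_frames, n_pairs, 3)
    return np.linalg.norm(diffs, axis=2)

def pca_decompose(X, n_components=2):
    """Plain PCA via SVD of the mean-centered data matrix.
    Returns (scores, components, explained_variance_ratio)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    var_ratio = (S ** 2) / (S ** 2).sum()
    return scores, Vt[:n_components], var_ratio[:n_components]
```

<p>Ranking frames by the first column of <code>scores</code> (PC1) would then mimic the frame-ranking step described above; the rotation/translation invariance is exactly why the paired-distance representation is used instead of raw coordinates.</p>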
<p>Secondly, to facilitate identification of the long-distance path, we developed a tool called the Pearson Coefficient Spiral (PCP), which generates and visualizes Pearson coefficients measuring the linear correlation between any two residue-pair distance series. PCP allows users to fix one residue pair and examine how its change correlates with that of other residue pairs.</p>
<p>Thirdly, a set of visualization tools was developed that generates paired atomic distances for the shortlisted candidate residues and captures significant interactions among them. The first tool is the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), which not only generates paired atomic distances for the shortlisted candidate residues but also displays significant interactions as a network graph for convenient visualization. Second, the Chord Diagram for Interaction Mapping (CD-IP) was developed to map the interactions to protein secondary-structure elements and to further narrow down important interactions. Third, Distance Plotting for Direct Comparison (DP-DC) plots any two paired distances of the user's choice, at either the residue or the atomic level, to facilitate identification of similar or opposite patterns of distance change along the simulation time. All the above tools of PLITIP enabled us to identify critical residues contributing to the large conformational transitions in both the HIV-1 protease and 14-3-3σ proteins.</p>
<p>Besides the above major project, a side project developing tools to study protein pseudo-symmetry is also reported. It has been proposed that symmetry provides protein stability, opportunities for allosteric regulation, and even functionality. This tool helps us answer the questions of why there is a deviation from perfect symmetry in proteins and how to quantify it.</p>
|
97 |
Spatially Correlated Data Accuracy Estimation Models in Wireless Sensor Networks
Karjee, Jyotirmoy January 2013 (has links) (PDF)
One of the major applications of wireless sensor networks is to sense accurate and reliable data from the physical environment, with or without a priori knowledge of the data statistics. To extract accurate data from the physical environment, we investigate spatial data correlation among sensor nodes to develop data accuracy models. We propose three data accuracy models, namely the Estimated Data Accuracy (EDA) model, the Cluster-based Data Accuracy (CDA) model and the Distributed Cluster-based Data Accuracy (DCDA) model, all with a priori knowledge of data statistics.
Due to the high density of deployed sensor nodes, observed data are highly correlated among sensor nodes, which form distributed clusters in space. We describe two clustering algorithms, called the Deterministic Distributed Clustering (DDC) algorithm and the Spatial Data Correlation based Distributed Clustering (SDCDC) algorithm, implemented under the CDA model and the DCDA model respectively. Moreover, because of this data correlation, the data collected by the sensor nodes are redundant. Hence, it is not necessary for all sensor nodes to transmit their highly correlated data to the central node (sink node or cluster head node). Instead, an optimal subset of sensor nodes is capable of measuring accurate data and transmitting the accurate, precise data to the central node. This reduces data redundancy, energy consumption and data transmission cost, thereby increasing the lifetime of the sensor network.
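The abstract does not specify how the optimal subset of nodes is chosen; the following is a minimal greedy sketch of the underlying idea (drop a node whenever an already-kept node's readings are strongly correlated with it), with the function name, threshold and selection order being our own assumptions rather than the thesis's algorithms.

```python
import numpy as np

def select_representative_nodes(readings, rho_min=0.9):
    """Greedy sketch: scan nodes in order and keep a node only if its
    reading series is not correlated above rho_min with any node already
    selected; kept nodes transmit, the rest stay silent.
    readings: (n_samples, n_nodes) array of sensed values."""
    corr = np.corrcoef(readings, rowvar=False)
    selected = []
    for node in range(readings.shape[1]):
        if all(abs(corr[node, s]) < rho_min for s in selected):
            selected.append(node)
    return selected
```

With a near-duplicate sensor pair, only one of the two transmits, which is the redundancy reduction the paragraph describes.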
Finally, we propose a fourth accuracy model, called the Adaptive Data Accuracy (ADA) model, that does not require any a priori knowledge of data statistics. The ADA model can sense a continuous data stream at regular time intervals to estimate accurate data from the environment and select an optimal set of sensor nodes for data transmission to the network. Data transmission can be reduced further for these optimal sensor nodes by transmitting only a subset of the sensor data, using a methodology called the Spatio-Temporal Data Prediction (STDP) model under data reduction strategies. Furthermore, we implement the data accuracy model when the network is under threat of malicious attack.
|
98 |
Arquitectura de un sistema de geo-visualización espacio-temporal de actividad delictiva, basada en el análisis masivo de datos, aplicada a sistemas de información de comando y control (C2IS)
Salcedo González, Mayra Liliana 03 April 2023 (links)
[EN] This doctoral thesis proposes the architecture of a Spatiotemporal Geo-visualization system of criminal activity, to be applied to Command and Control Systems (C2S), specifically within their Command and Control Information Systems (C2IS).
The Spatiotemporal Geo-visualization system is based on the massive analysis of real criminal-activity data provided by the Colombian National Police (PONAL) and is made up of two different applications. The first allows the user to dynamically geo-visualize, in space and time, the concentrations, trends and mobility patterns of this activity within whatever geographic area and range of dates and times are required, which lets the user carry out analyses and interpretations and make more accurate strategic action decisions. The second application allows the user to geo-visualize spatiotemporally the predictions of criminal activity over continuous, short periods in near real time, again within the geographic area and the range of dates and times of the user's choice. For these predictions, classical techniques and Machine Learning techniques (including Deep Learning) were used, suitable for multi-step, multi-parallel forecasting of multivariate time series with sparse data. The two applications of the system, whose development is shown in this thesis, use novel methods that achieve these effectiveness objectives when detecting the volume, patterns and trends in the movement of said activity, thus improving situational awareness, future projection, and the agility and efficiency of decision-making processes, particularly in the management of the resources destined for the deterrence, prevention and control of crime. This contributes to the objectives of a safe city, and therefore of a smart city, within Command and Control System (C2S) architectures such as the Citizen Security Command and Control Centers of the PONAL. / Salcedo González, ML. (2023). Arquitectura de un sistema de geo-visualización espacio-temporal de actividad delictiva, basada en el análisis masivo de datos, aplicada a sistemas de información de comando y control (C2IS) [Tesis doctoral].
Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/192685
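The abstract names "direct" multi-step, multi-parallel forecasting of multivariate series but not the models used; as a minimal illustration of that framing (not the thesis's methods), the sketch below fits one linear least-squares model per forecast step on flattened lag windows. All names and the choice of a plain linear model are assumptions.

```python
import numpy as np

def fit_direct_multistep(series, n_lags=3, horizon=2):
    """Direct multi-step forecasting sketch: one linear least-squares model
    per forecast step, trained on flattened lag windows of the multivariate
    series. series: (n_obs, n_vars). Returns one coefficient matrix per step."""
    n_obs, n_vars = series.shape
    X, targets = [], [[] for _ in range(horizon)]
    for t in range(n_lags, n_obs - horizon + 1):
        X.append(series[t - n_lags:t].ravel())   # lag window as features
        for h in range(horizon):
            targets[h].append(series[t + h])     # step-(h+1) target
    X = np.column_stack([np.ones(len(X)), np.array(X)])  # bias column
    return [np.linalg.lstsq(X, np.array(Y), rcond=None)[0] for Y in targets]

def forecast(models, recent):
    """recent: (n_lags, n_vars) latest observations; returns (horizon, n_vars)."""
    x = np.concatenate([[1.0], recent.ravel()])
    return np.array([x @ W for W in models])
```

The "direct" design (a separate model per horizon step) avoids the error accumulation of recursively feeding predictions back in, which matters when the data are sparse, as the abstract notes.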
|