
An automated approach based on integrity constraints defined in UML and OCL for the verification of logical consistency in SOLAP systems: applications in the agri-environmental field

Boulil, Kamal. 26 October 2012.
Spatial Data Warehouse (SDW) and Spatial OLAP (SOLAP) systems are Business Intelligence (BI) technologies that enable interactive multidimensional analysis of huge volumes of spatial data. In such systems, the quality of analysis depends mainly on three components: the quality of the warehoused data, the quality of data aggregation, and the quality of data exploration. Warehoused data quality depends on criteria such as accuracy, completeness, and logical consistency. Aggregation quality is affected by structural problems (e.g., non-strict dimension hierarchies that may cause double-counting of measure values) and semantic problems (e.g., summing temperature values does not make sense in many applications). Exploration quality is mainly affected by inconsistent user queries (e.g., what were the temperature values in the USSR in 2010?), which can lead to meaningless interpretations of query results. This thesis addresses the problems of logical inconsistency that may affect data, aggregation, and exploration quality in SOLAP systems.
Logical inconsistency is usually defined as the presence of contradictions in the data; it is typically controlled by means of Integrity Constraints (IC). In this thesis, we first extend the notion of IC (in the SOLAP domain) to take aggregation and query inconsistencies into account. To overcome the limitations of existing approaches to defining SOLAP IC, we propose a framework based on the standard languages UML and OCL. Our framework permits platform-independent conceptual design and automatic implementation of SOLAP IC. It consists of three parts: (1) a classification of SOLAP IC; (2) a UML profile, implemented in the CASE tool MagicDraw, allowing for the conceptual design of SOLAP models and their IC; (3) an automatic implementation based on the code generators Spatial OCL2SQL and UML2MDX, which translates the conceptual specifications into code at the SDW and SOLAP-server layers. Finally, the contributions of this thesis have been applied and validated in the context of French national projects aimed at developing (S)OLAP applications for agriculture and the environment.
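To make this concrete, the sketch below shows, in plain Python rather than the thesis's OCL/Spatial OCL2SQL tooling, the kind of query-consistency constraint the framework is meant to automate: a dimension member such as the USSR carries a validity period, and queries outside that period are flagged. The Member class, the dimension data, and check_query_consistency are illustrative assumptions, not part of the thesis.

```python
# Hypothetical sketch of a SOLAP query-consistency integrity constraint:
# a dimension member may only be queried for years within its validity period.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Member:
    name: str
    valid_from: int          # first year this member exists
    valid_to: Optional[int]  # last year it exists, or None if still valid

COUNTRY_DIM = {
    "USSR": Member("USSR", 1922, 1991),      # dissolved in 1991
    "France": Member("France", 1922, None),  # open-ended validity
}

def check_query_consistency(country, year):
    """Return a list of constraint violations for a (country, year) query."""
    member = COUNTRY_DIM.get(country)
    if member is None:
        return ["unknown dimension member: %s" % country]
    if year < member.valid_from or (member.valid_to is not None and year > member.valid_to):
        return ["%s is only valid from %d to %s; query year %d is inconsistent"
                % (country, member.valid_from, member.valid_to or "present", year)]
    return []

if __name__ == "__main__":
    print(check_query_consistency("USSR", 2010))    # flags the inconsistency
    print(check_query_consistency("France", 2010))  # [] -> consistent
```

In the thesis's approach, such a constraint would instead be declared once in OCL on the conceptual model and translated into executable code automatically by the code generators.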

Non-Parametric Clustering of Multivariate Count Data

Tekumalla, Lavanya Sita. January 2017.
The focus of this thesis is models for non-parametric clustering of multivariate count data. While there has been significant work in Bayesian non-parametric modelling in the last decade, in the context of mixture models for real-valued data and some forms of discrete data such as multinomial mixtures, there has been much less work on non-parametric clustering of multivariate count data. The main challenges in clustering multivariate counts include choosing a suitable multivariate distribution that adequately captures the properties of the data, for instance handling over-dispersed or sparse multivariate data, while at the same time leveraging the inherent dependency structure between dimensions and across instances to get meaningful clusters. As the first contribution, this thesis explores extensions to the multivariate Poisson distribution, proposing efficient algorithms for non-parametric clustering of multivariate count data. While the Poisson is the most popular distribution for count modelling, the multivariate Poisson often leads to intractable inference and a suboptimal fit of the data. To address this, we introduce a family of models based on the Sparse Multivariate Poisson that exploit the inherent sparsity in multivariate data, reducing the number of latent variables in the formulation of the multivariate Poisson and leading to a better fit and more efficient inference. We explore Dirichlet process mixture model extensions and temporal non-parametric extensions to models based on the Sparse Multivariate Poisson for practical use of Poisson-based models for non-parametric clustering of multivariate counts in real-world applications. As the second contribution, this thesis moves beyond the limitations of Poisson-based models for non-parametric clustering, for instance in handling over-dispersed data or data with negative correlations. We explore, for the first time, marginal independent inference techniques based on the Gaussian copula for multivariate count data in the Dirichlet process mixture model setting. This enables non-parametric clustering of multivariate counts without the limiting assumptions that usually restrict the marginals to a particular family, such as the Poisson or the negative binomial. This inference technique also works for mixed data (combinations of count, binary, and continuous data), enabling Bayesian non-parametric modelling to be used for a wide variety of data types. As the third contribution, this thesis addresses modelling a wider range of more complex dependencies, such as asymmetric and tail dependencies, during non-parametric clustering of multivariate count data with vine copula based Dirichlet process mixtures. While vine copula inference has been well explored for continuous data, it is still a topic of active research for multivariate counts and mixed multivariate data. Inference for multivariate counts and mixed data is a hard problem owing to ties that arise with discrete marginals. An efficient marginal independent inference approach based on the extended rank likelihood, building on recent work in the statistics literature, is proposed in this thesis, extending the use of vines for multivariate counts and mixed data to practical clustering scenarios. This thesis also explores a novel systems application, Bulk Cache Preloading, by analysing I/O traces through predictive models for temporal non-parametric clustering of multivariate count data.
State-of-the-art techniques in the caching domain are limited to exploiting short-range correlations in memory accesses at millisecond granularity or smaller and cannot leverage long-range correlations in traces. We explore, for the first time, Bulk Cache Preloading: proactively predicting data to load into the cache minutes or hours before the actual request from the application, by leveraging longer-range correlations at the granularity of minutes or hours. The relaxed timing constraints enable the development of machine learning techniques tailored for caching. Our approach involves a data aggregation process that converts I/O traces into a temporal sequence of multivariate counts, which we analyse with the temporal non-parametric clustering models proposed in this thesis. While the focus of this thesis is models for non-parametric clustering of discrete data, particularly multivariate counts, we also hope our work on bulk cache preloading paves the way for more interdisciplinary research using data mining techniques in the systems domain. As an additional contribution, this thesis addresses multi-level non-parametric admixture modelling for discrete data in the form of grouped categorical data, such as document collections. Non-parametric clustering for topic modelling in document collections, where a document is associated with an unknown number of semantic themes or topics, is well explored with admixture models such as the Hierarchical Dirichlet Process. However, there exist scenarios where a document must be associated with themes at multiple levels, where each theme is itself an admixture over themes at the previous level, motivating the need for multi-level admixtures. Consider the example of non-parametric entity-topic modelling: simultaneously learning entities and topics from document collections. This can be realized by modelling a document as an admixture over entities, while entities are themselves modelled as admixtures over topics. We propose the nested Hierarchical Dirichlet Process to address this gap and apply a two-level version of our model to automatically learn author entities and topics from research corpora.
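As a rough, self-contained illustration of the machinery this line of work builds on, here is a minimal collapsed Gibbs sampler for a Dirichlet process mixture of Poisson counts with conjugate Gamma priors. It assumes independent dimensions within a cluster, so it is a simplification rather than the thesis's Sparse Multivariate Poisson or copula-based models (which capture inter-dimension dependence); all hyperparameter values and the demo data are illustrative.

```python
# Minimal collapsed Gibbs sampler for a DP mixture of independent Poisson
# marginals with Gamma(a, b) priors; the Poisson-Gamma posterior predictive
# is negative binomial, so each cluster can be scored in closed form.
import numpy as np
from scipy.special import gammaln

def log_pred(x, a_post, b_post):
    # log negative-binomial predictive of count vector x under
    # per-dimension Gamma(a_post, b_post) posteriors (rate parameterization)
    return float(np.sum(
        gammaln(x + a_post) - gammaln(a_post) - gammaln(x + 1)
        + a_post * np.log(b_post / (b_post + 1.0))
        - x * np.log(b_post + 1.0)))

def dp_poisson_gibbs(X, alpha=1.0, a=1.0, b=1.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = np.zeros(n, dtype=int)                       # start with one cluster
    sums, sizes = {0: X.sum(axis=0).astype(float)}, {0: n}
    for _ in range(iters):
        for i in range(n):
            k = z[i]                                  # remove point i
            sizes[k] -= 1
            sums[k] -= X[i]
            if sizes[k] == 0:
                del sizes[k], sums[k]
            ks = list(sizes)
            # CRP prior times posterior predictive; last entry = new cluster
            logp = np.array(
                [np.log(sizes[c]) + log_pred(X[i], a + sums[c], b + sizes[c])
                 for c in ks]
                + [np.log(alpha) + log_pred(X[i], a, b)])
            p = np.exp(logp - logp.max())
            p /= p.sum()
            j = rng.choice(len(p), p=p)
            k_new = ks[j] if j < len(ks) else (max(sizes) + 1 if sizes else 0)
            z[i] = k_new
            sizes[k_new] = sizes.get(k_new, 0) + 1
            sums[k_new] = sums.get(k_new, np.zeros(d)) + X[i]
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.poisson(2, (40, 3)), rng.poisson(15, (40, 3))])
    print(np.unique(dp_poisson_gibbs(X), return_counts=True))
```

Because the Gamma prior is conjugate to the Poisson, cluster parameters are integrated out exactly; the thesis's contributions replace this independent-marginal likelihood with sparser and dependence-aware alternatives.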

Wireless Sensor Networks: Bit Transport Maximization and Delay Efficient Function Computation

Shukla, Samta. January 2013.
We consider a wireless sensor network in which end users are interested in maximizing the useful information supplied by the network until network partition due to inevitable node deaths. Neither throughput maximization nor network lifetime maximization achieves this objective: a network with high throughput provides information at a high rate but can quickly exhaust the nodes' energy; similarly, a network can achieve a long lifetime by remaining idle most of the time. We propose and seek to maximize a new metric, the aggregate bits transported before network partition (a product of throughput and lifetime), which precisely captures the usefulness of sensor networks. We model the links in the wireless sensor network as wired links with reduced equivalent capacities, and formulate and solve the problem of maximizing the bits transported before network partition on arbitrary networks. To assess the benefits that network coding can yield for the same objective, we study a scenario where the coding-capable nodes are placed on a regular grid. We propose an optimal algorithm to choose the minimum number of coding points in the grid to ensure energy efficiency. Our results show that, even with simple XOR coding, the bits transported can increase by up to 83% over the no-coding case. Further, we study the problem of in-network data aggregation in a wireless sensor network to achieve minimum delay. The nodes in the network compute and forward data as per a query graph, which allows operations belonging to a general class of functions. We aim to extract the best sub-network that achieves the minimum delay, and we design an algorithm to schedule the sub-network so that the computed data reaches the sink at the earliest time. We consider directed acyclic query graphs, in contrast to existing work, which considers only tree query graphs.
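The "bits transported before partition" objective can be illustrated with a toy linear program on a four-edge network: maximize the total bits delivered to the sink subject to flow conservation and a per-node energy budget. The topology, energy costs, and budget below are assumed for illustration and are not the thesis's formulation.

```python
# Toy LP sketch: maximize total bits delivered to the sink before any node
# exhausts its energy budget (illustrative parameters, hypothetical topology).
from scipy.optimize import linprog

# Directed edges: 0: a->r, 1: b->r, 2: r->sink, 3: a->sink
E_BUDGET, E_TX, E_RX = 100.0, 1.0, 0.5   # assumed energy units per bit

c = [0, 0, -1, -1]                       # maximize bits into sink (f2 + f3)
A_eq = [[1, 1, -1, 0]]                   # flow conservation at relay r
b_eq = [0]
A_ub = [
    [E_TX, 0, 0, E_TX],                  # node a transmits on edges 0 and 3
    [0, E_TX, 0, 0],                     # node b transmits on edge 1
    [E_RX, E_RX, E_TX, 0],               # relay r receives 0,1 and transmits 2
]
b_ub = [E_BUDGET] * 3

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4, method="highs")
print("bits transported before partition:", -res.fun)
print("edge flows:", res.x)
```

The solver routes traffic so that no single node's battery becomes the bottleneck, which is precisely the throughput-times-lifetime trade-off the metric is designed to capture.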

Securing sensor network

Zare Afifi, Saharnaz. January 2014.
Indiana University-Purdue University Indianapolis (IUPUI)

A wireless sensor network consists of lightweight nodes with a limited power source. They can be used in a variety of environments, especially those in which a wired network is impossible to deploy, and they are quick and easy to set up. Nodes collect data and send it to a processing center (base station) to be analyzed, in order to detect an event and/or determine information about, or characteristics of, the environment. The challenges in securing a sensor network are numerous. Nodes have a limited amount of power, so they may become faulty for lack of battery power and broadcast faulty information to the network. Moreover, nodes may be subject to attacks from an adversary who tries to eavesdrop on, modify, or replay the data collected by other nodes. Nodes may be mobile, and there is no possibility of a fixed infrastructure. Because of the importance of extracting information from the data collected by the sensors, the network needs some level of security to provide trustworthy information. The goal of this thesis is to organize part of the network in an energy-efficient manner in order to provide a suitable degree of integrity and security. By making nodes monitor each other in small, organized clusters, we increase security at a minimal energy cost. To increase the security of the network we use cryptographic techniques such as public/private keys, manufacturer signatures, and cluster signatures. In addition, nodes monitor each other's activity in the network, which we call a "neighborhood watch": if a node fails to forward data or modifies it, other nodes within transmission range can send a claim against that node.
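A minimal sketch of the "neighborhood watch" idea, assuming a pre-distributed shared cluster key and using Python's standard hmac module in place of the public-key, manufacturer-signature, and cluster-signature machinery described above; the node names, key setup, and claim format are hypothetical.

```python
# Minimal "neighborhood watch" sketch: a watcher overhears a neighbor's
# forwarded packet, compares it with the original, and files a signed claim.
import hmac, hashlib, json

CLUSTER_KEY = b"shared-cluster-key"  # assumed pre-distributed within the cluster

def sign(payload: bytes) -> str:
    return hmac.new(CLUSTER_KEY, payload, hashlib.sha256).hexdigest()

def file_claim(watcher, suspect, original: bytes, forwarded: bytes):
    """Return a signed claim if the suspect dropped or modified the packet."""
    if forwarded == original:
        return None                      # neighbor behaved correctly
    claim = {"watcher": watcher, "suspect": suspect,
             "reason": "dropped packet" if not forwarded else "modified packet"}
    body = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim, "mac": sign(body)}

def verify_claim(signed) -> bool:
    body = json.dumps(signed["claim"], sort_keys=True).encode()
    return hmac.compare_digest(signed["mac"], sign(body))

if __name__ == "__main__":
    original = b"temperature=21.5;node=7"
    tampered = b"temperature=99.9;node=7"
    claim = file_claim("node3", "node5", original, tampered)
    print(claim["claim"], "valid:", verify_claim(claim))
```

The MAC lets the base station check that a claim really came from a cluster member before penalizing the accused node, at a far lower energy cost than public-key signatures on every packet.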

Database centric software test management framework for test metrics

Pleehajinda, Parawee. 13 July 2015.
Large amounts of test data generated by the currently used software testing tools (QA-C/QA-C++ and Cantata) contain a variety of different values. This variety causes enormous challenges in data aggregation and interpretation that directly affect the generation of test metrics. To address these data-processing challenges, this master's thesis introduces a database-centric test management framework for test metrics that aims at centrally handling this data and facilitating the generation of test metrics. Each test result is individually parsed into a uniform format before being stored in a centralized database. A user-friendly front end is connected to and synchronized with the database, allowing authorized users to interact with the stored data. With a granularity-tracking mechanism, any stored data can be systematically located and programmatically interpreted by a test metrics generator to create various kinds of high-quality test metrics. The framework is automated by Jenkins CI, which periodically performs these operations in sequence. This greatly reduces development effort and enhances the performance of the software testing processes. In this research, the framework initially manages testing processes at the software-unit level only; however, because the database is independent of the testing level, it could be expanded to support software development at any level.
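A minimal sketch of the parse-store-aggregate flow described above, with sqlite3 standing in for the centralized database; the schema, the pre-parsed input rows, and the pass-rate metric are illustrative assumptions rather than the thesis's actual format for QA-C/Cantata results.

```python
# Minimal sketch of the parse -> store -> aggregate flow, with sqlite3
# standing in for the centralized database.
import sqlite3

RAW_RESULTS = [  # assumed pre-parsed tool output: (suite, case, verdict)
    ("uart_driver", "tx_basic", "PASS"),
    ("uart_driver", "tx_overflow", "FAIL"),
    ("uart_driver", "rx_timeout", "PASS"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE test_result (
    suite TEXT, test_case TEXT, verdict TEXT)""")
conn.executemany("INSERT INTO test_result VALUES (?, ?, ?)", RAW_RESULTS)

# One example test metric: per-suite pass rate, computed in the database
for suite, total, passed in conn.execute("""
        SELECT suite, COUNT(*),
               SUM(CASE WHEN verdict = 'PASS' THEN 1 ELSE 0 END)
        FROM test_result GROUP BY suite"""):
    print(f"{suite}: pass rate = {passed}/{total} = {passed/total:.0%}")
```

Keeping the metric computation in the database, as here, is what allows a Jenkins job to regenerate metrics periodically without re-parsing the raw tool output.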
