1

Metadata-Driven Data Integration

Nadal Francesch, Sergi 16 May 2019 (has links) (PDF)
Data has an undeniable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are witnessing a change represented by huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out such data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate for an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of the shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin by proposing a software reference architecture for semantic-aware data-intensive systems. This architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as a formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of resolving automatically. Finally, the thesis focuses on the materialized integration activity and, to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as a contribution to the field of data integration in contemporary data-intensive ecosystems. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished
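The abstract's core idea, a shared metadata repository governing both virtual and materialized integration, can be made concrete with a small sketch. The Python sketch below is a hypothetical illustration, not the thesis's actual graph-based metadata model: a mapping graph links global-schema concepts to source attributes, and a projection over the global schema is rewritten into one query per source.

```python
# Hypothetical shared metadata repository: a graph linking global-schema
# concepts to (source, attribute) pairs. All names are illustrative.
metadata_graph = {
    "customer.name":  [("crm_db", "clients.full_name"), ("web_api", "user.display_name")],
    "customer.email": [("crm_db", "clients.email"), ("web_api", "user.mail")],
}

def rewrite(global_attrs):
    """Rewrite a global-schema projection into one query per source."""
    per_source = {}
    for attr in global_attrs:
        for source, local_attr in metadata_graph[attr]:
            per_source.setdefault(source, []).append(local_attr)
    return {src: f"SELECT {', '.join(cols)}" for src, cols in per_source.items()}

print(rewrite(["customer.name", "customer.email"]))
# {'crm_db': 'SELECT clients.full_name, clients.email',
#  'web_api': 'SELECT user.display_name, user.mail'}
```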
2

On the automated verification of symmetric-key cryptographic algorithms: an approach based on SAT-solvers

Lafitte, Frédéric 19 September 2017 (has links)
A cryptographic protocol is a structured exchange of messages protected by means of cryptographic algorithms. Computer security in general relies heavily on these protocols and algorithms; in turn, these rely absolutely on smaller components called primitives. As technology advances, computers have reached a cost and a degree of miniaturisation conducive to their proliferation throughout society in the form of software-controlled, network-enabled things. As these things find their way into environments where security is critical, their protection ultimately relies on primitives; if a primitive fails, all security solutions (protocols, policies, etc.) built on top of it are likely to offer no security at all. Lightweight symmetric-key primitives, in particular, will play a critical role. The security of protocols is frequently verified using formal and automated methods. Concerning algorithms and public-key primitives, formal proofs are often used, although they are somewhat error prone, and current efforts aim to automate them. On the other hand, symmetric-key primitives are still analysed in a rather ad hoc manner. Since their security is only guaranteed by the test of time, they traditionally have a built-in security margin. Despite being paramount to the security of embedded devices, lightweight primitives appear to have a smaller security margin, and researchers would greatly benefit from automated tools to strengthen the test of time. In their seminal work back in 2000, Massacci and Marraro proposed to formulate primitives in propositional logic and to use SAT solvers to automatically verify their properties. At that time, SAT solvers were quite different from what they have become today; the continuous improvement of their performance makes them an even better choice for a verification back-end. Their performance improved so much that, starting around 2006, some cryptanalysts began to use them, but mostly to speed up their attacks. This thesis introduces the framework CryptoSAT and shows its advantages for the purpose of verification. / Computer security rests largely on cryptographic mechanisms, which in turn depend on even more fundamental components called primitives; if a primitive fails, all the security that depends on it is doomed to fail. Computers have reached a cost and a degree of miniaturisation conducive to their proliferation as embedded devices, which generally offer few computational resources, notably in environments where security is paramount. Their security therefore relies heavily on so-called symmetric-key primitives, since these are best suited to the limited resources available to embedded systems. Unlike all other cryptographic mechanisms, symmetric-key primitives are not mathematically proven to be free of security flaws: while the protection offered by cryptography can, in general, be proven formally (within a limited model), and sometimes by means of automated methods that leave little room for error, the protection offered by symmetric-key primitives is guaranteed only by the "test of time", i.e., by the (lasting) resistance of these primitives to the attacks devised by the community of researchers in cryptology.
To compensate for the absence of formal guarantees, these primitives traditionally come with a "security margin", i.e., additional computations, just in case, whose cost is difficult to justify when computational resources are scarce. To remedy the shortcomings of the test of time and the shrinking of security margins, this thesis revisits the work of Massacci and Marraro who, in 2000, proposed formulating primitives in propositional logic so that their properties can be verified automatically by means of SAT solvers. At that time, SAT solvers were very different from what they have become today; the continuous improvement of their performance over the years makes them an even better choice as a verification engine. In this thesis, a method was developed that allows a cryptologist to easily verify the properties of a symmetric-key primitive formally and automatically with the help of SAT solvers, while abstracting away the underlying propositional logic. The usefulness of the method was then demonstrated by obtaining formal answers to questions, posed in the cryptanalysis literature, concerning potential flaws both in the design and in the implementation of certain primitives. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
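The SAT-based verification workflow the abstract describes can be illustrated end to end on a deliberately trivial primitive. The following Python sketch is our own toy example, not CryptoSAT: the "cipher" is a 4-bit XOR with the key, the solver is a minimal DPLL written inline for self-containment, and the final query shows a verification-style question (key uniqueness) answered by an UNSAT result.

```python
def dpll(clauses, assignment):
    """Tiny DPLL: return a satisfying {var: bool} or None (UNSAT)."""
    new_clauses = []
    for clause in clauses:                     # simplify under assignment
        lits, satisfied = [], False
        for lit in clause:
            val = assignment.get(abs(lit))
            if val is None:
                lits.append(lit)
            elif (lit > 0) == val:
                satisfied = True
                break
        if satisfied:
            continue
        if not lits:
            return None                        # empty clause: conflict
        new_clauses.append(lits)
    if not new_clauses:
        return assignment                      # all clauses satisfied
    lit = new_clauses[0][0]                    # branch on first literal
    for value in ((lit > 0), not (lit > 0)):
        result = dpll(new_clauses, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None

p = [1, 0, 1, 1]                               # known plaintext bits
c = [0, 0, 1, 0]                               # observed ciphertext bits
# Encode c_i = p_i XOR k_i as unit clauses on key variables 1..4.
cnf = [[i + 1] if p[i] ^ c[i] else [-(i + 1)] for i in range(4)]

model = dpll(cnf, {})
print("recovered key:", [int(model[i + 1]) for i in range(4)])   # [1, 0, 0, 1]

# Verification-style question: is the key unique? Block the found model
# and ask for another; None (UNSAT) means uniqueness is proven.
blocking = [-(i + 1) if model[i + 1] else (i + 1) for i in range(4)]
print("second key exists:", dpll(cnf + [blocking], {}) is not None)  # False
```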
3

Graph Data Warehousing: Database and Multidimensional Modeling of Graphs

Ghrab, Amine 29 October 2020 (has links) (PDF)
Over the last decade, we have witnessed the emergence of networks in a wide spectrum of application domains, ranging from social and information networks to biological and transportation networks. Graphs provide a solid theoretical foundation for modeling complex networks and revealing valuable insights from both the network structure and the data embedded within its entities. As business and social environments grow increasingly complex and interconnected, graphs have become a widespread abstraction at the core of the information infrastructure supporting those environments. Modern information systems consist of a large number of sophisticated and interacting business entities that naturally form graphs. In particular, integrating graphs into data warehouse systems has received a lot of interest from both academia and industry. Indeed, data warehouses are the enterprise's central information repository and are critical for proper decision support and future planning. Graph warehousing is emerging as the field that extends current information systems with graph management and analytics capabilities. Many approaches have been proposed to address the graph data warehousing challenge. These efforts laid the foundation for multidimensional modeling and analysis of graphs. However, most of the proposed approaches tackle the graph warehousing problem only partially, being restricted to simple abstractions such as homogeneous graphs or ignoring important topics such as multidimensional integrity constraints and dimension hierarchies. In this dissertation, we conduct a systematic study of the graph data warehousing topic and address the key challenges of database and multidimensional modeling of graphs. We first propose GRAD, a new graph database model tailored for graph warehousing and OLAP analytics. GRAD aims to provide analysts with a set of simple, well-defined, and adaptable conceptual components to support rich semantics and perform complex analysis on graphs. Then, we define the multidimensional concepts for heterogeneous attributed graphs and highlight the new types of measures that can be derived. We project this multidimensional model on property graphs and explore how to extract the candidate multidimensional concepts and build graph cubes. We then extend the multidimensional model by integrating GRAD and show how GRAD facilitates multidimensional graph modeling and enables support for dimension hierarchies and new types of OLAP cubes on graphs. Afterward, we present TopoGraph, a graph data warehousing framework that extends current graph warehousing models with new types of cubes and queries combining graph-oriented and OLAP querying. TopoGraph goes beyond traditional OLAP cubes, which process value-based groupings of tables, by also considering the topological properties of the graph elements. It also goes beyond current graph warehousing models by proposing new types of graph cubes, which embed a rich repertoire of measures that can be represented as numerical values, as entire graphs, or as a combination of both. Finally, we propose an architecture for the graph data warehouse and describe its main building blocks and the remaining gaps. The various components of the graph warehousing framework can be effectively leveraged as a foundation for designing and building industry-grade graph data warehouses. We believe that the research in this thesis brings us a step closer to a better understanding of graph warehousing. Yet, the models and framework we propose are only the tip of the iceberg: the marriage of graph and warehousing technologies will bring many exciting research opportunities, which we briefly discuss at the end of the thesis. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished
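As an illustration of the value-based grouping that TopoGraph generalizes, here is a minimal Python sketch of rolling up a property graph along one node dimension. The schema and data are hypothetical, and the sketch keeps only an edge-count measure; TopoGraph's cubes additionally support topological and graph-valued measures, which this sketch omits.

```python
nodes = {                      # node id -> attributes
    1: {"city": "Brussels"}, 2: {"city": "Brussels"},
    3: {"city": "Paris"},    4: {"city": "Paris"},
}
edges = [(1, 3), (1, 4), (2, 3), (1, 2)]   # e.g. "follows" relationships

def roll_up(nodes, edges, dim):
    """Aggregate the graph along one node dimension; measure = edge count."""
    cube = {}
    for u, v in edges:
        key = (nodes[u][dim], nodes[v][dim])   # group endpoints by dimension value
        cube[key] = cube.get(key, 0) + 1
    return cube

print(roll_up(nodes, edges, "city"))
# {('Brussels', 'Paris'): 3, ('Brussels', 'Brussels'): 1}
```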
4

On proximity problems in Euclidean spaces

Barba Flores, Luis 20 June 2016 (has links)
In this work, we focus on two kinds of problems involving the proximity of geometric objects. The first part revolves around intersection detection problems. In this setting, we are given two (or more) geometric objects that we are allowed to preprocess. The objects are then translated and rotated within a geometric space, and we need to efficiently test whether they intersect in their new positions. We develop representations of convex polytopes in any (constant) dimension that allow us to perform this intersection test in logarithmic time. In the second part of this work, we turn our attention to facility location problems. In this setting, we are given a set of sites in a geometric space and we want to place a facility in such a way that the distance between the facility and its farthest site is minimized. We first study the constrained version of the problem, in which the facility can only be placed within a given geometric domain. We then study the facility location problem under the geodesic metric, which measures distances differently: given a simple polygon, the distance between two points is the length of the shortest path that connects them while staying within the polygon. In both cases, we present algorithms to find the optimal location of the facility. In the process of solving facility location problems, we rely heavily on geometric structures called Voronoi diagrams. These structures summarize the proximity information of a set of "simple" geometric objects in the plane and encode it as a decomposition of the plane into interior-disjoint regions whose boundaries define a plane graph. We study the problem of constructing Voronoi diagrams incrementally, analyzing the number of edge insertions and deletions needed to maintain the combinatorial structure as new sites are added. / Option Informatique du Doctorat en Sciences / info:eu-repo/semantics/nonPublished
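The unconstrained Euclidean version of this facility location problem is the classic smallest enclosing circle: the circle's center is the point minimizing the distance to the farthest site. As a baseline sketch (the thesis's constrained and geodesic algorithms are more involved and are not reproduced here), here is a Welzl-style randomized incremental construction in Python.

```python
import math
import random

def circle_from(a, b):
    """Circle with segment ab as diameter, as (cx, cy, r)."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)

def circle_from3(a, b, c):
    """Circumcircle of three points, or None if they are collinear."""
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax*ax + ay*ay)*(by - cy) + (bx*bx + by*by)*(cy - ay) + (cx*cx + cy*cy)*(ay - by)) / d
    uy = ((ax*ax + ay*ay)*(cx - bx) + (bx*bx + by*by)*(ax - cx) + (cx*cx + cy*cy)*(bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))

def inside(c, p, eps=1e-9):
    return c is not None and math.dist((c[0], c[1]), p) <= c[2] + eps

def min_enclosing_circle(points):
    """Randomized incremental 1-center: O(n) expected time."""
    pts = points[:]
    random.shuffle(pts)
    c = None
    for i, p in enumerate(pts):
        if inside(c, p):
            continue
        c = (p[0], p[1], 0.0)                  # p must lie on the boundary
        for j, q in enumerate(pts[:i]):
            if inside(c, q):
                continue
            c = circle_from(p, q)              # p and q on the boundary
            for r in pts[:j]:
                if not inside(c, r):
                    c = circle_from3(p, q, r) or c   # keep c if collinear
    return c

cx, cy, r = min_enclosing_circle([(0, 0), (4, 0), (2, 3), (1, 1)])
print(f"facility at ({cx:.3f}, {cy:.3f}), farthest-site distance {r:.3f}")
```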
5

Synthèse des systèmes réactifs interactifs (Synthesis of interactive reactive systems)

Bozianu, Rodica 12 December 2016 (has links)
We study the problem of automatically synthesizing programs in multi-component architectures such that they satisfy their specifications by construction. The main objective of this thesis is to develop procedures for solving the synthesis problem that can lead to efficient implementations. Each component has only a partial observation of the global state of the multi-component system. The problem is then to provide observation-based protocols such that the synthesized components satisfy the specifications against every behavior of their environment. The environment may be adversarial, or it may have its own objectives and behave rationally. We first study the synthesis problem when the environment is assumed to be adversarial. For this setting, we propose a "Safraless" procedure for synthesizing a partially informed component against an omniscient environment from KLTL+ specifications. It is implemented in the tool Acacia-K. We then study the synthesis problem when the components of the environment have their own objectives and are rational. For the simpler setting of perfect information, we provide tight complexity bounds for particular omega-regular objectives. For the case of imperfect information, we prove that the rational synthesis problem is undecidable in general, but decidability is regained when synthesizing a single component with partial observation against a multi-component, omniscient, and rational environment. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
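Synthesis problems of this kind reduce to solving games on graphs. As an illustration of that core step only (not of the thesis's Safraless, partial-observation, or rational-synthesis procedures), here is a Python sketch that solves a small hypothetical safety game by computing the environment's attractor of the unsafe states; the complement is the set of states from which a winning controller exists.

```python
succ = {                       # state -> successor states
    "s0": ["s1", "s2"], "s1": ["s0"], "s2": ["s3"], "s3": ["s3"],
}
owner = {"s0": 0, "s1": 1, "s2": 1, "s3": 0}   # 0 = system, 1 = environment
bad = {"s3"}                   # states the system must avoid forever

def solve_safety(succ, owner, bad):
    """Winning region of the system in a safety game (fixpoint iteration)."""
    attr = set(bad)            # states from which the environment forces Bad
    changed = True
    while changed:
        changed = False
        for s in succ:
            if s in attr:
                continue
            # Environment states need one successor in attr; system states all.
            forced = (any(t in attr for t in succ[s]) if owner[s] == 1
                      else all(t in attr for t in succ[s]))
            if forced:
                attr.add(s)
                changed = True
    return set(succ) - attr

print(solve_safety(succ, owner, bad))   # {'s0', 's1'}: avoid s2, stay safe
```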
6

Inference of gene networks from time series expression data and application to type 1 Diabetes

Lopes, Miguel 04 September 2015 (has links)
The inference of gene regulatory networks (GRN) is of great importance to medical research, as causal mechanisms responsible for phenotypes are unravelled and potential therapeutic targets identified. In type 1 diabetes, insulin-producing pancreatic beta cells are the target of an auto-immune attack leading to apoptosis (cell suicide). Although key genes and regulations have been identified, a precise characterization of the process leading to beta-cell apoptosis has not yet been achieved. The inference of relevant molecular pathways in type 1 diabetes is therefore a crucial research topic. GRN inference from gene expression data (obtained with microarray and RNA-seq technology) is a causal inference problem which may be tackled with well-established statistical and machine learning concepts. In particular, the use of time series facilitates the identification of the causal direction in cause-effect gene pairs. However, inference from gene expression data is a very challenging problem due to the large number of existing genes (in human, over twenty thousand) and the typically low number of samples in gene expression datasets. In this context, it is important to correctly assess the accuracy of network inference methods. The contributions of this thesis are on three distinct aspects. The first is on inference assessment using precision-recall curves, in particular using the area under the curve (AUPRC). The typical approach to assessing AUPRC significance is Monte Carlo simulation, and a parametric alternative is proposed. It consists in deriving the mean and variance of the null AUPRC and then using these parameters to fit a beta distribution approximating the true distribution. The second contribution is an investigation of network inference from time series. Several state-of-the-art strategies are experimentally assessed and novel heuristics are proposed. One is a fast approximation of first-order Granger causality scores, suited for GRN inference in the large-variable case. Another identifies co-regulated genes (i.e. genes regulated by the same genes). Both are experimentally validated using microarray and simulated time series. The third contribution of this thesis is in the context of type 1 diabetes and is a study of beta-cell gene expression after exposure to cytokines, emulating the mechanisms leading to apoptosis. Eight datasets of beta-cell gene expression were used to identify genes differentially expressed before and after 24h, which were functionally characterized using bioinformatics tools. The two most differentially expressed genes, previously unknown in the type 1 diabetes literature (RIPK2 and ELF3), were found to modulate cytokine-induced apoptosis. A regulatory network was then inferred using a dynamic adaptation of a state-of-the-art network inference method. Three out of four predicted regulations (involving RIPK2 and ELF3) were experimentally confirmed, providing a proof of concept for the adopted approach. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
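The parametric AUPRC assessment can be sketched directly: given the mean and variance of the null AUPRC, a beta distribution is fitted by the method of moments and the p-value read from its tail. The moments below are hard-coded hypothetical values (the thesis derives them analytically), and SciPy is assumed for the beta tail probability.

```python
from scipy import stats

def beta_from_moments(mean, var):
    """Method-of-moments beta parameters for a distribution on [0, 1]."""
    common = mean * (1 - mean) / var - 1        # requires var < mean * (1 - mean)
    return mean * common, (1 - mean) * common   # (alpha, beta)

null_mean, null_var = 0.02, 0.0004   # hypothetical null AUPRC moments
observed_auprc = 0.15                # hypothetical score of an inferred network

a, b = beta_from_moments(null_mean, null_var)
p_value = stats.beta.sf(observed_auprc, a, b)   # P(null AUPRC >= observed)
print(f"alpha={a:.2f}, beta={b:.2f}, p={p_value:.2e}")
```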
7

Advances in secure remote electronic voting

Dossogne, Jérôme 30 October 2015 (has links)
In this document, the reader is introduced to the challenges that electronic voting poses to a designer, an implementer, and a user. Some of these challenges receive an answer in the second part of the document, where we introduce and describe several distinct scientific results obtained during our years as a PhD student, covering essentially the years 2009 to 2011. All these results aim either at better understanding the issues of electronic voting or at solving them. A reader might nonetheless be interested in picking just one of these contributions for his own electronic voting system while leaving the rest. That is, the chapters of the second part of the document can mostly stand on their own and be used independently of the others, which leads us to introduce each of them separately. After concluding in the third part, we provide a number of appendices that are not thoroughly discussed in the second part of the document but that might be of interest to the reader. These appendices consist of various researches, collaborations, and analyses related to electronic voting that we performed during those same years. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
8

Requirement-driven Design and Optimization of Data-Intensive Flows

Jovanovic, Petar 26 September 2016 (has links)
Data have become the number one asset of today's business world. Thus, their exploitation and analysis have attracted the attention of people from different fields and with different technical backgrounds. Data-intensive flows are central processes in today's business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. However, designing and optimizing such data flows, to satisfy both users' information needs and agreed quality standards, is known to be a burdensome task, typically left to the manual efforts of a BI system designer. These tasks have become even more challenging for next-generation BI systems, where data flows typically need to combine data from in-house transactional storages with data coming from external sources in a variety of formats (e.g. social media, governmental data, news feeds). Moreover, to make an impact on business outcomes, data flows are expected to answer unanticipated analytical needs of a broader set of business users and deliver valuable information in near real-time (i.e. at the right time). These challenges largely indicate a need for boosting the automation of the design and optimization of data-intensive flows. This PhD thesis aims at providing automatable means for managing the lifecycle of data-intensive flows. The study first analyzes the remaining challenges to be solved in the field, by surveying the current literature and envisioning an architecture for managing the lifecycle of data-intensive flows. Following the proposed architecture, we focus on providing automatic techniques for the different phases of the data-intensive flows' lifecycle. In particular, the thesis first proposes an approach (CoAl) for the incremental design of data-intensive flows by means of multi-flow consolidation. CoAl not only facilitates the maintenance of data flow designs in the face of changing information needs, but also supports multi-flow optimization by maximizing reuse among data-intensive flows. Next, in the data warehousing (DW) context, we propose a complementary method (ORE) for the incremental design of the target DW schema, along with systematic tracing of evolution metadata, which can further facilitate the design of back-end data-intensive flows (i.e. ETL processes). The thesis then studies the problem of implementing data-intensive flows in the deployable formats of different execution engines, and proposes the BabbleFlow system for translating logical data-intensive flows into executable formats spanning single or multiple execution engines. Lastly, the thesis focuses on managing the execution of data-intensive flows on distributed data processing platforms, and to this end proposes an algorithm (H-WorD) for supporting the scheduling of data-intensive flows by workload-driven redistribution of data in computing clusters. The overall outcome of this thesis is an end-to-end platform for managing the lifecycle of data-intensive flows, called Quarry. The techniques proposed in this thesis, plugged into the Quarry platform, greatly reduce manual effort and assist users of different technical skills in their analytical tasks. Finally, the results of this thesis contribute to the field of data-intensive flows in today's BI systems, and advocate for further attention by both academia and industry to the problems of design and optimization of data-intensive flows.
/ Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished
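The reuse idea behind multi-flow consolidation can be illustrated with a small sketch: flows that share a common prefix of operations are merged so the shared work appears once. This is a hypothetical Python simplification with made-up operator names; CoAl's actual operator matching and cost-based search are far richer.

```python
flows = {
    "report_daily":  ["extract_sales", "clean_nulls", "aggregate_by_day"],
    "report_region": ["extract_sales", "clean_nulls", "join_regions",
                      "aggregate_by_region"],
}

def consolidate(flows):
    """Build a prefix-sharing tree: shared operations appear exactly once."""
    root = {}
    for name, ops in flows.items():
        node = root
        for op in ops:
            node = node.setdefault(op, {})
        node["__sink__"] = name            # mark which flow ends here
    return root

def show(node, depth=0):
    for op, child in node.items():
        if op == "__sink__":
            print("  " * depth + f"-> {child}")
        else:
            print("  " * depth + op)
            show(child, depth + 1)

show(consolidate(flows))
# extract_sales and clean_nulls are executed once and shared by both flows.
```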
9

A modular approach to the automatic design of control software for robot swarms: From a novel perspective on the reality gap to AutoMoDe

Francesca, Gianpiero 21 April 2017 (has links)
The main issue in swarm robotics is to design the behavior of the individual robots so that a desired collective behavior is achieved. A promising alternative to the classical trial-and-error design approach is to rely on automatic design methods. In an automatic design method, the problem of designing the control software for a robot swarm is cast into an optimization problem: the different design choices define a search space that is explored using an optimization algorithm. Most of the automatic design methods proposed so far belong to the framework of evolutionary robotics. Traditionally, in evolutionary robotics the control software is based on artificial neural networks and is optimized automatically via an evolutionary algorithm, following a process inspired by natural evolution. Evolutionary robotics has been successfully adopted to design robot swarms that perform various tasks. The results achieved show that automatic design is a viable and promising approach to designing the control software of robot swarms. Despite these successes, a widely recognized problem of evolutionary robotics is the difficulty of overcoming the reality gap, that is, of achieving a seamless transition from simulation to the real world. In this thesis, we aim at conceiving an effective automatic design approach that is able to deliver robot swarms that perform well once deployed in the real world. To this end, we consider the major problem in the automatic design of robot swarms: the reality gap problem. We analyze the reality gap problem from a machine learning perspective and show that it bears a strong resemblance to the generalization problem encountered in supervised learning. By casting the reality gap problem into the bias-variance tradeoff, we show that the inability to overcome the reality gap experienced in evolutionary robotics could be explained by the excessive representational power of the control architectures adopted. Consequently, we propose AutoMoDe, a novel automatic design approach that adopts a control architecture with low representational power. AutoMoDe designs control software in the form of a probabilistic finite state machine that is composed automatically from a number of pre-existing parametric modules. In the experimental analysis presented in this thesis, we show that adopting a control architecture with low representational power is beneficial: AutoMoDe performs better than an evolutionary approach. Moreover, AutoMoDe is able to design robot swarms that perform better than those designed by human designers. AutoMoDe is the first automatic design approach shown to outperform human designers in a controlled experiment. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished
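The control architecture AutoMoDe produces can be pictured with a small sketch: a probabilistic finite state machine whose states run behavior modules and whose transitions fire stochastically on sensor conditions. The modules, conditions, and probabilities below are hypothetical Python illustrations, not AutoMoDe's actual module set; in AutoMoDe the machine's structure and parameters are found by the optimization process.

```python
import random

def explore(robot):  robot["wheels"] = (1.0, 1.0)   # drive forward
def stop(robot):     robot["wheels"] = (0.0, 0.0)   # halt

# state -> (behavior module, [(condition, firing probability, target state)])
pfsm = {
    "explore": (explore, [("sees_food", 0.9, "stop")]),
    "stop":    (stop,    [("timeout",   0.3, "explore")]),
}

def step(state, robot, sensors):
    """Run one control tick: execute the module, then try transitions."""
    behavior, transitions = pfsm[state]
    behavior(robot)
    for condition, prob, target in transitions:
        if sensors.get(condition) and random.random() < prob:
            return target                  # stochastic transition fires
    return state

robot, state = {"wheels": (0.0, 0.0)}, "explore"
for tick in range(5):
    state = step(state, robot, {"sees_food": tick >= 2, "timeout": False})
    print(tick, state, robot["wheels"])
```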
10

On the design and implementation of an accurate, efficient, and flexible simulator for heterogeneous swarm robotics systems

Pinciroli, Carlo 28 April 2014 (has links)
Swarm robotics is a young multidisciplinary research field at the intersection of disciplines such as distributed systems, robotics, artificial intelligence, and complex systems. Considerable research effort has been dedicated to the study of algorithms targeted to specific problems. Nonetheless, implementation and comparison remain difficult due to the lack of shared tools and benchmarks. Among the tools necessary to enable experimentation, the most fundamental is a simulator that offers an adequate level of accuracy and flexibility to suit the diverse needs of the swarm robotics community. The very nature of swarm robotics, in which systems may comprise large numbers of robots, forces the design to provide runtimes that increase gracefully with increasing swarm sizes.
In this thesis, I argue that none of the existing simulators offers satisfactory levels of accuracy, flexibility, and efficiency, due to fundamental limitations of their design. To overcome these limitations, I present ARGoS, a general multi-robot simulator that currently benchmarks as the fastest in the literature.
In the design of ARGoS, I faced a number of unsolved issues. First, in existing simulators, accuracy is an intrinsic feature of the design. For single-robot applications this choice is reasonable, but for the large number of robots typically involved in a swarm, it results in an unacceptable trade-off between accuracy and efficiency. Second, the prospect of swarm robotics spans diverse potential applications, such as space exploration, ocean restoration, deep-underground mining, and construction of large structures. These applications differ in terms of physics (e.g. motion dynamics) and available communication means. The existing general-purpose simulators are not suitable to simulate such diverse environments accurately and efficiently.
To design ARGoS I introduced novel concepts. First, in ARGoS accuracy is framed as a property of the experimental setup, and is tunable to the requirements of the experiment. To achieve this, I designed the architecture of ARGoS to offer unprecedented levels of modularity. The user can provide customized versions of individual modules, thus assigning computational resources to the relevant aspects. This feature enhances efficiency, since the user can lower the computational cost of unnecessary aspects of a simulation.
To further decrease runtimes, the architecture of ARGoS exploits the computational resources of modern multi-core systems. In contrast to existing designs with comparable features, ARGoS allows the user to define both the granularity and the scheduling strategy of the parallel tasks, attaining unmatched levels of scalability and efficiency in resource usage.
A further unique feature of ARGoS is the possibility to partition the simulated space into regions managed by dedicated physics engines running in parallel. This feature, besides enhancing parallelism, enables experiments in which multiple regions with different features are simulated. For instance, ARGoS can perform accurate and efficient simulations of scenarios in which amphibian robots act both underwater and on sandy shores.
ARGoS is listed among the major results of the Swarmanoid project. It is currently the official simulator of 4 European projects (ASCENS, H2SWARM, E-SWARM, Swarmix) and is used by 15 universities worldwide. While the core architecture of ARGoS is complete, extensions are continually added by a community of contributors. In particular, ARGoS was the first robot simulator to be integrated with the ns3 network simulator, yielding software able to simulate both the physics and the network aspects of a swarm. Further extensions under development include support for large-scale modular robots, construction of 3D structures with deformable material, and integration with advanced statistical analysis tools such as MultiVeStA. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
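The region-partitioning idea can be pictured with a small sketch: the simulated space is split into regions, each owned by a dedicated "physics engine", and every robot is dispatched to the engine of the region it currently occupies. The two engines below are trivial hypothetical stand-ins written in Python for illustration, not ARGoS's real 2D/3D dynamics engines, and the region test is a simple x-coordinate threshold.

```python
def water_engine(robot, dt):
    """Hypothetical underwater dynamics: strong drag slows the robot."""
    robot["vx"] *= 0.5
    robot["x"] += robot["vx"] * dt

def shore_engine(robot, dt):
    """Hypothetical on-land dynamics: motion without drag."""
    robot["x"] += robot["vx"] * dt

def engine_for(robot):
    """Dispatch by region: x < 0 is water, x >= 0 is the sandy shore."""
    return water_engine if robot["x"] < 0.0 else shore_engine

robots = [{"x": -2.0, "vx": 1.0}, {"x": 3.0, "vx": 1.0}]
for _ in range(3):                 # simulation loop: each robot is updated
    for r in robots:               # by the engine owning its region
        engine_for(r)(r, dt=1.0)
print(robots)
```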
