About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A Resource-Oriented Architecture for Integration and Exploitation of Linked Data / Conception d'une architecture orientée services pour l'intégration et l'exploitation de données liées

De Vettor, Pierre 29 September 2016 (has links)
In this thesis, we focus on the integration of raw data coming from heterogeneous, multi-origin data sources on the Web. The global objective is to provide a generic and adaptive architecture able to analyze and combine this heterogeneous, informal, and sometimes meaningless data into a coherent smart data set. We define smart data as significant, semantically explicit data, ready to be used to fulfill the stakeholders' objectives. This work is motivated by a live scenario from the French company Audience Labs. In this report, we propose new models and techniques to adapt the combination and integration process to the diversity of data sources. We focus on transparency and dynamicity in data source management; scalability and responsiveness with respect to the number of data sources; adaptability to data source characteristics; and, finally, consistency of the produced data (coherent data, without errors or duplicates).
In order to address these challenges, we first propose a meta-model to represent the variety of data source characteristics, related to access (URI, authentication), extraction (request format), or physical capabilities (volume, latency). Relying on this formalization of data sources, we define different data access strategies in order to adapt access and processing to data source capabilities. With the help of these models and strategies, we propose a distributed resource-oriented software architecture in which each component is freely accessible over HTTP (REST) via its URI. The orchestration of the different tasks of the integration process can then be done in an optimized way with respect to data source and data characteristics: this information allows us to generate an adapted execution workflow in which tasks are prioritized so as to speed up the process and limit the quantity of data transferred. In order to improve the quality of the data produced by our approach, we then focus on the uncertainty that can appear in data on the Web and propose a model to represent it. We introduce the concept of an uncertain Web resource, based on a probabilistic model in which each resource can have several possible representations, each with a probability. This approach is the basis of a further optimization of the architecture, allowing uncertainty to be taken into account during the combination process.
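The notion of an uncertain Web resource described above can be pictured with a short sketch. The following Python fragment is not the thesis's implementation; it is a minimal illustration, under an independence assumption, of a resource that admits several candidate representations weighted by probabilities, and of how combining two such resources multiplies the weights of compatible pairs. All names (UncertainResource, combine, the sample URIs and fields) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class UncertainResource:
    """A Web resource with several possible representations, each weighted by a probability."""
    uri: str
    representations: list[tuple[dict, float]]  # (candidate representation, probability)

    def combine(self, other: "UncertainResource", compatible) -> "UncertainResource":
        """Merge two uncertain resources: keep pairs judged compatible, multiply
        their probabilities (independence assumption), then renormalize."""
        merged = []
        for (rep_a, p_a), (rep_b, p_b) in product(self.representations, other.representations):
            if compatible(rep_a, rep_b):
                merged.append(({**rep_a, **rep_b}, p_a * p_b))
        total = sum(p for _, p in merged) or 1.0
        return UncertainResource(
            uri=f"{self.uri}+{other.uri}",
            representations=[(rep, p / total) for rep, p in merged],
        )

# Example: two sources disagree on a person's affiliation.
a = UncertainResource("http://example.org/p/1", [({"name": "P. De Vettor", "org": "LIRIS"}, 0.7),
                                                 ({"name": "P. De Vettor", "org": "Audience Labs"}, 0.3)])
b = UncertainResource("http://example.org/p/1b", [({"org": "LIRIS"}, 0.6), ({"org": "CNRS"}, 0.4)])
merged = a.combine(b, compatible=lambda x, y: x.get("org") == y.get("org"))
print(merged.representations)  # the LIRIS representation survives with probability 1.0
```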
2

Optimising routing and trustworthiness of ad hoc networks using swarm intelligence

Amin, Saman Hameed January 2014 (has links)
This thesis proposes different approaches to address the routing and security of MANETs using swarm intelligence. The mobility and lack of infrastructure of a MANET, as well as node misbehaviour, pose great challenges to the routing and security protocols of such a network. The first approach addresses the problem of channel assignment in multichannel ad hoc networks with a limited number of interfaces, where stable routes are preferred. Channel selection is based on the link quality between nodes. Geographical information is used with a mapping algorithm to estimate and predict link quality and route lifetime, and is combined with an Ant Colony Optimization (ACO) algorithm to find the most stable route with a high data rate. As a result, the channels are utilized better and throughput increases by up to 74% over the ASAR protocol. A new smart data packet routing protocol is then developed based on the River Formation Dynamics (RFD) algorithm. The RFD algorithm is a form of swarm intelligence that mimics how rivers are formed in nature. The protocol is a distributed swarm learning approach in which data packets are smart enough to guide themselves through the best available route in the network. The learning information is distributed throughout the nodes of the network and can be used and updated by successive data packets in order to maintain existing routes and find better ones. Data packets act like swarm agents (drops): they carry their path information and update routing information, based on different network metrics, without the need for backward agents. As a result, data packets can guide themselves through better routes. In the second approach, a hybrid ACO and RFD smart data packet routing protocol is developed that tries to find the shortest, least congested path to the destination. Simulation results show a throughput improvement of 30% over the AODV protocol and 13% over AntHocNet, while both delay and jitter improve by more than 96% over AODV. To overcome the source-routing limitation introduced by the use of the ACO algorithm, a purely RFD-based distance vector protocol is developed as a third approach. Moreover, this protocol separates reactively learned information from proactively learned information to add more reliability to data routing. To reduce the power consumption introduced by the hybrid nature of the RFD routing protocol, a fourth approach is developed that tackles power consumption and adds packet-delivery power minimization to the RFD-based protocol. Finally, a security model based on reputation and trust is added to the smart data packet protocol in order to detect misbehaving nodes. The trust system is built on the property, provided by the RFD algorithm, that drops always move from a higher altitude to a lower one. Moreover, the distributed, open nature of the ad hoc network obliges nodes to behave cooperatively in order not to be exposed. This system can easily and quickly detect misbehaving nodes from the altitude difference between active intermediate nodes.
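As a rough illustration of the River Formation Dynamics idea the abstract builds on (not the protocol developed in the thesis), the sketch below assigns every node an altitude, lets a packet repeatedly hop to its lowest-altitude neighbour, and erodes the altitude of the nodes it visits, so that later packets are drawn toward routes that have already proved useful. The topology, altitudes and erosion rate are invented illustrative values.

```python
import random

# Hypothetical topology: node -> neighbours. The destination 'D' sits at altitude 0,
# so packets "flow downhill" toward it; other altitudes start flat.
neighbours = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
altitude = {"A": 10.0, "B": 10.0, "C": 10.0, "D": 0.0}
EROSION = 0.2  # fraction of the downhill gradient removed at each visited node

def route_packet(src: str, dst: str, max_hops: int = 10) -> list[str]:
    """Send one 'drop' from src to dst, preferring the lowest-altitude neighbour
    and eroding visited nodes so that subsequent drops favour the same valley."""
    path, node = [src], src
    for _ in range(max_hops):
        if node == dst:
            return path
        # Greedy downhill step; ties are broken at random, as a wandering drop would.
        nxt = min(neighbours[node], key=lambda n: (altitude[n], random.random()))
        # Erode the current node toward the neighbour it drains into.
        gradient = altitude[node] - altitude[nxt]
        if gradient > 0:
            altitude[node] -= EROSION * gradient
        path.append(nxt)
        node = nxt
    return path

for _ in range(5):  # successive packets deepen the "valley" leading to D
    print(route_packet("A", "D"), {k: round(v, 2) for k, v in altitude.items()})
```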
3

An exercise in database customized programming to compare the Smart Data Manager and dBaseIII

Fitzgerald, Amy Lynn January 2010 (has links)
Typescript (photocopy). / Digitized by Kansas Correctional Industries
4

SILE: A Method for the Efficient Management of Smart Genomic Information

León Palacio, Ana 25 November 2019 (has links)
In the last two decades, the data generated by Next Generation Sequencing technologies have revolutionized our understanding of human biology. Furthermore, they have allowed us to develop and improve our knowledge of how changes (variants) in the DNA can be related to the risk of developing certain diseases.
Currently, a large amount of genomic data is publicly available and frequently used by the research community in order to extract meaningful and reliable associations between risk genes and the mechanisms of disease. However, managing this exponentially growing volume of data has become a challenge, and researchers are forced to delve into a lake of complex data spread over more than a thousand heterogeneous repositories, represented in multiple formats and with different levels of quality. Moreover, when these data are used to solve a concrete problem, only a small part of them is really significant. This is what we call "smart" data. The main goal of this thesis is to provide a systematic approach to efficiently manage smart genomic data by using conceptual modeling techniques and the principles of data quality assessment. The aim of this approach is to populate an Information System with data that are accessible, informative and actionable enough to extract valuable knowledge. / This thesis was supported by the Research and Development Aid Program (PAID-01-16) under FPI grant 2137. / León Palacio, A. (2019). SILE: A Method for the Efficient Management of Smart Genomic Information [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/131698 / TESIS / Premios Extraordinarios de tesis doctorales
5

A Personalized Smart Cube for Faster and Reliable Access to Data

Antwi, Daniel K. 02 December 2013 (has links)
Organizations own data sources that contain millions, billions or even trillions of rows, and these data are usually highly dimensional in nature. Typically, these raw repositories are comprised of numerous independent data sources that are too big to be copied or joined, with the consequence that aggregations become highly problematic. Data cubes play an essential role in facilitating fast Online Analytical Processing (OLAP) in many multi-dimensional data warehouses. Current data cube computation techniques have had some success in addressing the above-mentioned aggregation problem. However, the combined problem of reducing data cube size for very large and highly dimensional databases, while guaranteeing fast query response times, has received less attention. Another issue is that most OLAP tools often cause users to become lost in an ocean of data while performing data analysis. Often, users are interested in only a subset of the data. Consider, for example, a business manager who wants to answer a crucial location-related business question: "Why are my sales declining at location X?" This manager wants fast, unambiguous, location-aware answers to his queries. He requires access to only the relevant filtered information, as found from the attributes that are directly correlated with his current needs. Therefore, it is important to determine and extract only the small data subset that is highly relevant from a particular user's location and perspective. In this thesis, we present the Personalized Smart Cube approach to address the above-mentioned scenario. Our approach consists of two main parts. Firstly, we combine vertical partitioning, partial materialization and dynamic computation to drastically reduce the size of the computed data cube while guaranteeing fast query response times. Secondly, our personalization algorithm dynamically monitors user query patterns and creates a personalized data cube for each user. This ensures that users utilize only the small subset of data that is most relevant to them. Our experimental evaluation of the Personalized Smart Cube approach showed that our work compares favorably with other state-of-the-art methods. We evaluated our work on three main criteria, namely the storage space used, the query response time, and the cost-savings ratio of using a personalized cube. The results showed that our algorithm materializes a relatively smaller number of views than other techniques and also compares favorably in terms of query response time. Further, our personalization algorithm is superior to the state-of-the-art Virtual Cube algorithm when evaluated in terms of the number of user queries that were successfully answered using a personalized cube instead of the base cube.
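The personalization step can be sketched informally as follows (this is not the thesis's algorithm): monitor which group-by attribute sets a user actually queries and mark the most frequent ones as candidate views for that user's personalized, partially materialized cube. Names such as QueryMonitor and the top-k threshold are illustrative assumptions.

```python
from collections import Counter

class QueryMonitor:
    """Track the group-by attribute sets each user queries and pick, per user,
    the most frequent ones as views to materialize in a personalized cube."""
    def __init__(self, top_k: int = 3):
        self.top_k = top_k
        self.usage: dict[str, Counter] = {}

    def record(self, user: str, group_by: tuple[str, ...]) -> None:
        self.usage.setdefault(user, Counter())[tuple(sorted(group_by))] += 1

    def views_to_materialize(self, user: str) -> list[tuple[str, ...]]:
        counts = self.usage.get(user, Counter())
        return [view for view, _ in counts.most_common(self.top_k)]

monitor = QueryMonitor(top_k=2)
monitor.record("manager_x", ("location", "month"))
monitor.record("manager_x", ("location", "month"))
monitor.record("manager_x", ("product",))
print(monitor.views_to_materialize("manager_x"))
# [('location', 'month'), ('product',)] -> only these aggregates are kept for this user
```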
6

A tropical geometry and discrete convexity approach to bilevel programming : application to smart data pricing in mobile telecommunication networks / Une approche par la géométrie tropicale et la convexité discrète de la programmation bi-niveau : application à la tarification des données dans les réseaux mobiles de télécommunications

Eytard, Jean-Bernard 12 November 2018 (has links)
Bilevel programming deals with nested optimization problems involving two players. A leader announces a decision to a follower, who responds by selecting a solution of an optimization problem whose data depend on this decision (the low-level problem). The optimal decision of the leader is the solution of another optimization problem whose data depend on the follower's response (the high-level problem). When the follower's response is not unique, one distinguishes between optimistic and pessimistic bilevel problems, in which the leader takes into account the best or the worst possible response of the follower. Bilevel problems are often used to model pricing problems. We are interested in applications in which the leader is a seller who announces a price, and the follower models the behavior of a large number of customers who determine their consumption depending on this price. Hence, the dimension of the low-level problem is large. However, most bilevel problems are NP-hard, and in practice there is no general method to solve large-scale bilevel problems efficiently.
In this thesis, we introduce a new approach to tackle bilevel programming. We assume that the low-level problem is a linear program, in continuous or discrete variables, whose cost function is determined by the leader. Then, the follower's responses correspond to the cells of a special polyhedral complex associated with a tropical hypersurface. This is motivated by recent applications of tropical geometry to modeling the behavior of economic agents. We use the duality between this polyhedral complex and a regular subdivision of an associated Newton polytope to introduce a decomposition method, in which one solves a series of subproblems associated with the different cells of the complex. Using results about the combinatorics of subdivisions, we show that this leads to an algorithm that solves a wide class of bilevel problems in time polynomial in the dimension of the low-level problem when the dimension of the high-level problem is fixed. Then, we identify special structures of bilevel problems for which this complexity bound can be improved. This is the case when the leader's cost function depends only on the follower's response; we then show that the optimistic bilevel problem can be solved in polynomial time. This applies in particular to high-dimensional instances in which the data satisfy certain discrete convexity properties. We also show that the solutions of such bilevel problems are limits of competitive equilibria.
In the second part of this thesis, we apply this approach to a price-incentive problem in mobile telecommunication networks. The aim for Internet service providers is to use pricing schemes to encourage users to shift their data consumption in time (and so also in space, owing to their mobility) in order to reduce congestion peaks. This can be modeled by a large-scale bilevel problem. We show that a simplified case can be solved in polynomial time by applying the previous decomposition approach together with graph theory and discrete convexity results. We use these ideas to develop a heuristic method that applies to the general case. We implemented and validated this method on real data provided by Orange.
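As a toy illustration of the decomposition idea (not the algorithm of the thesis), the sketch below takes a follower whose payoff for each choice is an affine function of the leader's price, so that the best response is the argmax of a max-plus (tropical) expression; it sweeps a one-dimensional price grid, groups prices into the cells on which the follower's response is constant, and lets the optimistic leader optimize over each cell separately. The bundles, utilities, demands and price grid are invented for the example.

```python
from itertools import groupby

# Follower chooses one of finitely many consumption bundles; its payoff for bundle b
# at leader price p is utility[b] - p * demand[b] (an affine function of p), so the
# best response is the argmax of a max-plus (tropical) expression in p.
bundles = {"low": {"utility": 2.0, "demand": 1.0},
           "mid": {"utility": 5.0, "demand": 3.0},
           "high": {"utility": 9.0, "demand": 8.0}}

def follower_response(p: float) -> str:
    # Optimistic tie-breaking: among maximizers, pick the largest demand,
    # i.e. the response that yields the most revenue for the leader.
    return max(bundles, key=lambda b: (bundles[b]["utility"] - p * bundles[b]["demand"],
                                       bundles[b]["demand"]))

def leader_revenue(p: float) -> float:
    return p * bundles[follower_response(p)]["demand"]

prices = [i / 100 for i in range(0, 201)]  # sweep p in [0, 2]
cells = [(resp, [p for _, p in grp]) for resp, grp in
         groupby(((follower_response(p), p) for p in prices), key=lambda t: t[0])]

best = max(prices, key=leader_revenue)
for resp, ps in cells:  # one cell = one interval with a constant follower response
    print(f"response '{resp}' on [{ps[0]:.2f}, {ps[-1]:.2f}]")
print(f"optimistic leader price ~ {best:.2f}, revenue {leader_revenue(best):.2f}")
```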
