About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
651

Préservation de la confidentialité des données externalisées dans le traitement des requêtes top-k / Privacy preserving top-k query processing over outsourced data

Mahboubi, Sakina 21 November 2018 (has links)
L'externalisation de données d'entreprise ou individuelles chez un fournisseur de cloud, par exemple avec l'approche Database-as-a-Service, est pratique et rentable. Mais elle introduit un problème majeur : comment préserver la confidentialité des données externalisées, tout en prenant en charge les requêtes expressives des utilisateurs. Une solution simple consiste à crypter les données avant leur externalisation. Ensuite, pour répondre à une requête, le client utilisateur peut récupérer les données cryptées du cloud, les décrypter et évaluer la requête sur des données en texte clair (non cryptées). Cette solution n'est pas pratique, car elle ne tire pas parti de la puissance de calcul fournie par le cloud pour évaluer les requêtes. Dans cette thèse, nous considérons un type important de requêtes, les requêtes top-k, et le problème du traitement des requêtes top-k sur des données cryptées dans le cloud, tout en préservant la vie privée. Une requête top-k permet à l'utilisateur de spécifier un nombre k de tuples les plus pertinents pour répondre à la requête. Le degré de pertinence des tuples par rapport à la requête est déterminé par une fonction de notation. Nous proposons d'abord un système complet, appelé BuckTop, qui est capable d'évaluer efficacement les requêtes top-k sur des données cryptées, sans avoir à les décrypter dans le cloud. BuckTop inclut un algorithme de traitement des requêtes top-k qui fonctionne sur les données cryptées, stockées dans un nœud du cloud, et retourne un ensemble qui contient les données cryptées correspondant aux résultats top-k. Il est aidé par un algorithme de filtrage efficace qui est exécuté dans le cloud sur les données chiffrées et supprime la plupart des faux positifs inclus dans l'ensemble renvoyé. Lorsque les données externalisées sont volumineuses, elles sont généralement partitionnées sur plusieurs nœuds dans un système distribué. Pour ce cas, nous proposons deux nouveaux systèmes, appelés SDB-TOPK et SD-TOPK, qui permettent d'évaluer les requêtes top-k sur des données distribuées cryptées sans avoir à les décrypter sur les nœuds où elles sont stockées. De plus, SDB-TOPK et SD-TOPK ont un puissant algorithme de filtrage qui élimine autant de faux positifs que possible dans les nœuds et renvoie un petit ensemble de données cryptées qui seront décryptées du côté utilisateur. Nous analysons la sécurité de notre système et proposons des stratégies efficaces pour la mettre en œuvre. Nous avons validé nos solutions par l'implémentation de BuckTop, SDB-TOPK et SD-TOPK, et les avons comparés à des approches de base sur des données synthétiques et réelles. Les résultats montrent un excellent temps de réponse par rapport aux approches de base. Ils montrent également l'efficacité de notre algorithme de filtrage, qui élimine presque tous les faux positifs. De plus, nos systèmes permettent d'obtenir une réduction significative des coûts de communication entre les nœuds du système distribué lors du calcul du résultat de la requête. / Outsourcing corporate or individual data at a cloud provider, e.g. using Database-as-a-Service, is practical and cost-effective. But it introduces a major problem: how to preserve the privacy of the outsourced data, while supporting powerful user queries. A simple solution is to encrypt the data before it is outsourced. Then, to answer a query, the user client can retrieve the encrypted data from the cloud, decrypt it, and evaluate the query over plaintext (non-encrypted) data.
This solution is not practical, as it does not take advantage of the computing power provided by the cloud for evaluating queries. In this thesis, we consider an important kind of query, the top-k query, and address the problem of privacy-preserving top-k query processing over encrypted data in the cloud. A top-k query allows the user to specify a number k, and the system returns the k tuples which are most relevant to the query. The relevance degree of tuples to the query is determined by a scoring function. We first propose a complete system, called BuckTop, that is able to efficiently evaluate top-k queries over encrypted data, without having to decrypt it in the cloud. BuckTop includes a top-k query processing algorithm that works on the encrypted data, stored at one cloud node, and returns a set that is proved to contain the encrypted data corresponding to the top-k results. It also comes with an efficient filtering algorithm that is executed in the cloud on encrypted data and removes most of the false positives included in the returned set. When the outsourced data is big, it is typically partitioned over multiple nodes in a distributed system. For this case, we propose two new systems, called SDB-TOPK and SD-TOPK, that can evaluate top-k queries over encrypted distributed data without having to decrypt it at the nodes where it is stored. In addition, SDB-TOPK and SD-TOPK have a powerful filtering algorithm that filters out false positives as much as possible in the nodes, and returns a small set of encrypted data that will be decrypted on the user side. We analyze the security of our system, and propose efficient strategies to enforce it. We validated our solutions through implementations of BuckTop, SDB-TOPK and SD-TOPK, and compared them to baseline approaches over synthetic and real databases. The results show excellent response time compared to the baseline approaches. They also show the efficiency of our filtering algorithm, which eliminates almost all false positives. Furthermore, our systems yield a significant reduction in communication cost between the distributed system nodes when computing the query result.
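To make the bucketization idea behind this kind of system concrete, here is a minimal single-node Python sketch — an illustration of the general technique, not BuckTop's actual protocol. The cloud sees only public bucket boundaries and per-attribute bucket ids, prunes on bucket-derived score bounds, and the client decrypts a small candidate set to finish exactly. The boundaries, the toy `encrypt` wrapper, and the two-attribute sum score are all illustrative assumptions.

```python
import heapq

BOUNDS = [0, 25, 50, 75, 100]           # public bucket boundaries (metadata)

def bucket_of(v):                        # assumes 0 <= v < 100
    for i in range(len(BOUNDS) - 1):
        if v < BOUNDS[i + 1]:
            return i

def encrypt(t): return ("enc", t)        # stand-in for a real cipher
def decrypt(c): return c[1]

def outsource(tuples):
    """Client: ship ciphertext plus one bucket id per scoring attribute."""
    return [(encrypt(t), bucket_of(t[1]), bucket_of(t[2])) for t in tuples]

def cloud_candidates(store, k):
    """Cloud: prune on bucket bounds only, never decrypting anything."""
    lowers = sorted((BOUNDS[b1] + BOUNDS[b2] for _, b1, b2 in store),
                    reverse=True)
    threshold = lowers[k - 1]            # k-th best guaranteed lower bound
    return [c for c, b1, b2 in store     # keep tuples that might still win
            if BOUNDS[b1 + 1] + BOUNDS[b2 + 1] >= threshold]

def client_topk(candidates, k):
    """Client: decrypt the small candidate set and finish exactly."""
    return heapq.nlargest(k, (decrypt(c) for c in candidates),
                          key=lambda t: t[1] + t[2])

data = [("t%d" % i, (7 * i) % 100, (13 * i) % 100) for i in range(50)]
print(client_topk(cloud_candidates(outsource(data), 3), 3))
```

Because every tuple's true score lies between its bucket lower and upper bounds, no true top-k result can be pruned; the extras the client decrypts are exactly the false positives that a filtering step like BuckTop's is meant to minimize.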
652

A Declarative Approach to Modeling and Solving the View Selection Problem / Une approche déclarative pour la modélisation et la résolution du problème de la sélection de vues à matérialiser

Mami, Imene 15 November 2012 (has links)
La matérialisation de vues est une technique très utilisée dans les systèmes de gestion de bases de données ainsi que dans les entrepôts de données pour améliorer les performances des requêtes. Elle permet de réduire de manière considérable le temps de réponse des requêtes en pré-calculant des requêtes coûteuses et en stockant leurs résultats. De ce fait, l'exécution de certaines requêtes nécessite seulement un accès aux vues matérialisées au lieu des données sources. En contrepartie, la matérialisation entraîne un surcoût de maintenance des vues. En effet, les vues matérialisées doivent être mises à jour lorsque les données sources changent afin de conserver la cohérence et l'intégrité des données. De plus, chaque vue matérialisée nécessite également un espace de stockage supplémentaire qui doit être pris en compte au moment de la sélection. Le problème de choisir quelles vues matérialiser de manière à réduire les coûts de traitement des requêtes, étant donné certaines contraintes telles que l'espace de stockage et le coût de maintenance, est connu dans la littérature sous le nom du problème de la sélection de vues. Trouver la solution optimale satisfaisant toutes les contraintes est un problème NP-complet. Dans un contexte distribué constitué d'un ensemble de nœuds ayant des contraintes de ressources différentes (CPU, IO, capacité de l'espace de stockage, bande passante réseau, etc.), le problème de la sélection des vues est celui de choisir un ensemble de vues à matérialiser ainsi que les nœuds du réseau sur lesquels celles-ci doivent être matérialisées de manière à optimiser les coûts de maintenance et de traitement des requêtes. Notre étude traite le problème de la sélection de vues dans un environnement centralisé ainsi que dans un contexte distribué. Notre objectif est de fournir une approche efficace dans ces contextes. Ainsi, nous proposons une solution basée sur la programmation par contraintes, connue pour être efficace dans la résolution des problèmes NP-complets et comme méthode puissante pour la modélisation et la résolution des problèmes d'optimisation combinatoire. L'originalité de notre approche est qu'elle permet une séparation claire entre la formulation et la résolution du problème. A cet effet, le problème de la sélection de vues est modélisé comme un problème de satisfaction de contraintes de manière simple et déclarative. Puis, sa résolution est effectuée automatiquement par le solveur de contraintes. De plus, notre approche est flexible et extensible, en ce sens que nous pouvons facilement modéliser et gérer de nouvelles contraintes et mettre au point des heuristiques pour un objectif d'optimisation. Les principales contributions de cette thèse sont les suivantes. Tout d'abord, nous définissons un cadre qui permet d'avoir une meilleure compréhension des problèmes que nous abordons dans cette thèse. Nous analysons également l'état de l'art des méthodes de sélection des vues à matérialiser en identifiant leurs points forts ainsi que leurs limites. Ensuite, nous proposons une solution utilisant la programmation par contraintes pour résoudre le problème de la sélection de vues dans un contexte centralisé. Nos résultats expérimentaux montrent que notre approche fournit de bonnes performances : elle permet en effet d'obtenir le meilleur compromis entre le temps de calcul nécessaire pour la sélection des vues à matérialiser et le gain de temps de traitement des requêtes obtenu en matérialisant ces vues.
Enfin, nous étendons notre approche pour résoudre le problème de la sélection de vues à matérialiser lorsque celui-ci est étudié sous contraintes de ressources multiples dans un contexte distribué. A l'aide d'une évaluation de performances extensive, nous montrons que notre approche fournit des résultats de qualité et fiables. / View selection is important in many data-intensive systems, e.g., commercial database and data warehousing systems, to improve query performance. View selection can be defined as the process of selecting a set of views to be materialized in order to optimize query evaluation. To support this process, different related issues have to be considered. Whenever a data source is changed, the materialized views built on it have to be maintained in order to compute up-to-date query results. Besides the view maintenance issue, each materialized view also requires additional storage space, which must be taken into account when deciding which and how many views to materialize. The problem of choosing which views to materialize to speed up incoming queries, constrained by an additional storage overhead and/or maintenance costs, is known as the view selection problem. This is one of the most challenging problems in data warehousing, and it is known to be an NP-complete problem. In a distributed environment, the view selection problem becomes more challenging. Indeed, it includes another issue, which is to decide on which computer nodes the selected views should be materialized. The view selection problem in a distributed context is additionally constrained by storage space capacities per computer node, maximum global maintenance costs, and the communication cost between the computer nodes of the network. In this work, we deal with the view selection problem in a centralized context as well as in a distributed setting. Our goal is to provide a novel and efficient approach in these contexts. For this purpose, we designed a solution using constraint programming, which is known to be efficient for the resolution of NP-complete problems and a powerful method for modeling and solving combinatorial optimization problems. The originality of our approach is that it provides a clear separation between the formulation and the resolution of the problem. Indeed, the view selection problem is modeled as a constraint satisfaction problem in an easy and declarative way; its resolution is then performed automatically by the constraint solver. Furthermore, our approach is flexible and extensible, in that it can easily model and handle new constraints and new heuristic search strategies for optimization purposes. The main contributions of this thesis are as follows. First, we define a framework that enables a better understanding of the problems we address in this thesis. We also analyze the state of the art in materialized view selection, reviewing the existing methods and identifying their respective potentials and limits. We then design a solution using constraint programming to address the view selection problem in a centralized context. Our experimental results show that our approach provides the best balance between the computing time required for finding the materialized views and the gain realized in query processing by materializing these views. Our approach also guarantees to pick the optimal set of materialized views when no time limit is imposed.
Finally, we extend our approach to provide a solution to the view selection problem when the latter is studied under multiple resource constraints in a distributed context. Based on our extensive performance evaluation, we show that our approach outperforms the genetic algorithm that has been designed for a distributed setting.
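To illustrate the declarative formulation this thesis advocates, here is a toy view-selection instance in Python. A real constraint solver is swapped out for brute-force enumeration, which is only viable at this tiny scale; the views, costs, and budget are invented for illustration.

```python
from itertools import combinations

views = {                       # view -> (storage size, maintenance cost)
    "v1": (40, 5), "v2": (30, 4), "v3": (20, 2), "v4": (10, 1),
}
queries = {                     # query -> {view that answers it: cost};
    "q1": {"v1": 3, "v2": 6, None: 20},       # None = fall back to base tables
    "q2": {"v2": 2, "v3": 5, None: 15},
    "q3": {"v3": 4, "v4": 4, None: 25},
}
BUDGET = 60                     # storage constraint

def total_cost(selected):
    maintenance = sum(views[v][1] for v in selected)
    processing = sum(min(c for v, c in alts.items()
                         if v is None or v in selected)
                     for alts in queries.values())
    return maintenance + processing

feasible = (set(s) for r in range(len(views) + 1)
            for s in combinations(views, r)
            if sum(views[v][0] for v in s) <= BUDGET)
best = min(feasible, key=total_cost)
print(sorted(best), total_cost(best))
```

The declarative shape — decision variables (which views to keep), a storage constraint, and an objective over maintenance plus processing cost — is exactly what a constraint solver consumes; only the search strategy differs.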
653

Operação de busca exata aos K-vizinhos mais próximos reversos em espaços métricos / Answering exact reverse k-nearest neighbor queries in metric spaces

Oliveira, Willian Dener de 19 March 2010 (has links)
A complexidade dos dados armazenados em grandes bases de dados aumenta cada vez mais, criando a necessidade de novas operações de consulta. Uma classe de operações que tem apresentado interesse crescente são as chamadas Consultas por Similaridade, sendo as mais conhecidas as consultas por Abrangência (R_q) e por k-Vizinhos mais Próximos (kNN), sendo que esta última obtém quais são os k elementos armazenados mais similares a um dado elemento de referência. Outra consulta que é interessante tanto para consultas diretas quanto como parte de operações de análise mais complexas é a operação de consulta aos k-Vizinhos mais Próximos Reversos (RkNN). Seu objetivo é obter todos os elementos armazenados que têm um dado elemento de referência como um dos seus k elementos mais similares. Devido à complexidade de execução da operação de RkNN, a grande maioria das soluções existentes restringe-se a dados representados em espaços multidimensionais euclidianos (nos quais estão definidas também operações cardinais e topológicas, além de se considerar a similaridade como sendo a distância euclidiana entre dois elementos), ou então obtém apenas respostas aproximadas, sujeitas à existência de falsos negativos. Várias aplicações de análise de dados científicos, médicos, de engenharia, financeiros etc. requerem soluções eficientes para o problema da operação de RkNN sobre dados representados em espaços métricos, onde os elementos não podem ser considerados como estando em um espaço euclidiano nem multidimensional. Num espaço métrico, além dos próprios elementos armazenados, existe apenas uma função de comparação métrica entre pares de objetos. Neste trabalho, são propostas novas podas de espaço de busca e o algoritmo RkNN-MG, que utiliza essas novas podas para solucionar o problema de consultas RkNN exatas em espaços métricos sem limitações. Toda a proposta supõe que o conjunto de dados está em um espaço métrico imerso isometricamente em um espaço euclidiano e utiliza propriedades da geometria métrica válidas neste espaço para realizar podas eficientes pela lei dos cossenos, combinadas com as podas tradicionais por desigualdade triangular. Os experimentos demonstram comparativamente que as novas podas são mais eficientes que as tradicionais podas por desigualdade triangular, tendo desempenho equivalente quando comparadas em conjuntos de alta dimensionalidade ou com dimensão fractal alta. Assim, os resultados confirmam as novas podas propostas como soluções alternativas eficientes para o problema de consultas RkNN. / Data stored in large databases present an ever-increasing complexity, pressing for the development of new classes of query operators. One such class, which is attracting increasing interest, is the so-called Similarity Queries, where the most common are the similarity range queries (R_q) and the k-nearest neighbor queries (kNN). A k-nearest neighbor query aims at retrieving the k stored elements nearer (or more similar) to a given reference element. Another important similarity query is the reverse k-nearest neighbor (RkNN) query, useful both for queries posed directly by the analyst and for queries that are part of more complex analysis processes. The objective of a reverse k-nearest neighbor query is to obtain the stored elements that have the query reference element as one of their k nearest neighbors.
As the RkNN operation is rather expensive from the computational standpoint, most existing solutions only solve the query when applied over Euclidean multidimensional spaces (as these spaces also define cardinal and topological operations besides the Euclidean distance between pairs of elements) or retrieve only approximate answers, where false negatives can occur. Several applications, like the analysis of scientific, medical, engineering or financial data, require efficient and exact answers for RkNN queries over data which is frequently represented in metric spaces, that is, where no property besides the similarity measure exists. Therefore, for applications handling metric data, the assumption of a Euclidean metric, or even of multidimensional data, cannot be used. In this work, we propose new pruning rules based on the law of cosines, and the RkNN-MG algorithm, which uses them to solve RkNN queries in a way that is exact, faster than the existing approaches, not limited to any value of k, and applicable both over static and over dynamic datasets. The new pruning rules assume that the data set is in a metric space that can be embedded into a Euclidean space, and use metric geometry properties valid in this space to perform effective pruning based on the law of cosines, combined with the traditional pruning based on the triangle inequality property. The experiments show that the new pruning rules are always more efficient than the traditional pruning rules based solely on the triangle inequality. They also show that for high-dimensionality datasets, or for metric datasets with high fractal dimensionality, the performance improvement is smaller than for lower-dimensionality datasets, but it is never worse. Thus, the results confirm the new pruning rules as an efficient alternative to solve RkNN queries in general.
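As a worked illustration of the RkNN definition and of triangle-inequality pruning — the baseline this thesis improves on with its law-of-cosines rules — here is a small exact Python sketch. The data points and the tie-handling convention are assumptions for the example.

```python
import math

def rknn(q, data, k, dist=math.dist):
    """v is an answer iff fewer than k other objects are strictly closer
    to v than q is (ties counted in q's favour)."""
    d_q = [dist(q, v) for v in data]          # one distance per object
    answers = []
    for i, v in enumerate(data):
        dvq, closer = d_q[i], 0
        for j, u in enumerate(data):
            if i == j:
                continue
            # triangle inequality: d(v, u) >= |d(q, u) - d(q, v)|, so a
            # large lower bound proves u is no closer than q without
            # computing the exact distance d(v, u)
            if abs(d_q[j] - dvq) >= dvq:
                continue
            if dist(v, u) < dvq:
                closer += 1
                if closer >= k:               # v already disqualified
                    break
        if closer < k:
            answers.append(v)
    return answers

pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (10, 0)]
print(rknn((0.4, 0.4), pts, k=2))             # -> [(0, 0), (1, 0), (0, 1)]
```

The law-of-cosines rules described in the abstract tighten this lower bound further when the metric space embeds isometrically into a Euclidean space, which is why they prune more than the triangle inequality alone.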
654

A pattern-driven corpus to predictive analytics in mitigating SQL injection attack

Uwagbole, Solomon January 2018 (has links)
The back-end database provides accessible, structured storage for the big-data web traffic that modern applications exchange, from cloud-hosted web applications to Internet of Things (IoT) smart devices in emerging computing. Structured Query Language Injection Attack (SQLIA) remains an intruder's exploit of choice to steal confidential information from the database of vulnerable front-end web applications, with potentially damaging security ramifications. Existing SQLIA solutions still follow the on-premise web application hosting model; they were largely developed before the recent challenges of big data mining and as such lack the ability to cope with new attack signatures concealed in large volumes of web requests. Also, most organisations' database and service infrastructure no longer resides on-premise, as internet cloud-hosted applications and services are increasingly used, which limits existing Structured Query Language Injection (SQLI) detection and prevention approaches that rely on source code scanning. A bio-inspired approach such as Machine Learning (ML) predictive analytics provides functional and scalable big data mining for detecting and preventing SQLI while intercepting large volumes of web requests. Unfortunately, the lack of robust, ready-made data sets with patterns and historical data items to train a classifier is a well-known issue in SQLIA research applying ML in the field of Artificial Intelligence (AI). The purpose-built, competition-driven test case data sets are antiquated and not pattern-driven, so they cannot train a classifier for real-world application. Also, web application types are so diverse that no all-purpose generic data set exists for ML SQLIA mitigation. This thesis addresses the lack of a pattern-driven data set by deriving one to predict SQLIA of any size, and by proposing a technique to obtain a data set on the fly, breaking the cycle of relying on the few outdated competition-driven data sets, which were never meant to benchmark real-world SQLIA mitigation. As its contribution, the thesis derives a pattern-driven data set of related member strings that is used to train a supervised learning model, validated through the Receiver Operating Characteristic (ROC) curve and a Confusion Matrix (CM), with results showing low false positives and false negatives. We further the evaluation with cross-validation, obtaining a low variance in accuracy, which indicates a successfully trained model — built on the derived pattern-driven data set — capable of generalising to unknown real-world data with reduced bias. We also demonstrate a proof of concept by implementing ML predictive analytics for SQLIA detection and prevention using this pattern-driven data set in a test web application. In the experiments carried out in the course of this thesis, we observed that a data set of related member strings can be generated from a web application's expected input data and SQL tokens, including known SQLI signatures. The data set extraction ontology proposed in this thesis for applied ML in SQLIA mitigation, in the context of the emerging computing of the big data internet and cloud-hosted services, sets our proposal apart from existing approaches, which mostly rely on on-premise source code scanning and some form of query structure comparison.
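As a hedged illustration of the overall pipeline — a pattern-driven corpus feeding a supervised classifier validated with a confusion matrix — here is a tiny scikit-learn sketch. The handful of benign and attack strings is invented and far smaller than the derived corpus the thesis describes; it only shows the shape of the approach.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy "pattern-driven" corpus: expected web inputs vs. SQLi-flavoured strings.
benign = ["john.doe@example.com", "42", "O'Reilly", "search shoes size 9",
          "2018-01-31", "alice", "plain text comment", "New York"]
attacks = ["' OR '1'='1", "1; DROP TABLE users--", "admin'--",
           "' UNION SELECT password FROM users--", "1' AND SLEEP(5)--",
           "\" OR \"\"=\"", "'; EXEC xp_cmdshell('dir')--", "1 OR 1=1"]
X, y = benign + attacks, [0] * len(benign) + [1] * len(attacks)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25,
                                      stratify=y, random_state=0)

# Character n-grams capture injection tokens (quotes, comments, keywords)
# regardless of where they appear in the input string.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(Xtr, ytr)
print(confusion_matrix(yte, model.predict(Xte)))
print(model.predict(["shoes", "x' OR 'a'='a"]))   # classify unseen inputs
```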
655

Approches hybrides pour la recherche sémantique de l'information : intégration des bases de connaissances et des ressources semi-structurées / Hybrid Approaches for Semantic Information Retrieval : Towards the Integration of Knowledge Bases and Semistructured Resources

Mrabet, Yassine 12 July 2012 (has links)
La recherche sémantique de l'information a connu un nouvel essor avec les nouvelles technologies du Web sémantique. Des langages standards permettent aujourd'hui aux logiciels de communiquer par le biais de données écrites dans le vocabulaire d'ontologies de domaine décrivant une sémantique explicite. Cet accès « sémantique » à l'information requiert la disponibilité de bases de connaissances décrivant les instances des ontologies de domaine. Cependant, ces bases de connaissances, bien que de plus en plus riches, contiennent relativement peu d'informations par comparaison au volume des informations contenues dans les documents du Web. La recherche sémantique de l'information atteint ainsi certaines limites par comparaison à la recherche classique de l'information, qui exploite plus largement ces documents. Ces limites se traduisent explicitement par l'absence d'instances de concepts et de relations dans les bases de connaissances construites à partir des documents du Web. Dans cette thèse, nous étudions deux directions de recherche différentes afin de permettre de répondre à des requêtes sémantiques dans de tels cas. Notre première étude porte sur la reformulation des requêtes sémantiques des utilisateurs afin d'atteindre des parties de documents pertinentes à la place des faits recherchés et manquants dans les bases de connaissances. La deuxième problématique que nous étudions est celle de l'enrichissement des bases de connaissances par des instances de relations. Nous proposons deux solutions pour ces problématiques en exploitant des documents semi-structurés annotés par des concepts ou des instances de concepts. Un des points clés de ces solutions est qu'elles permettent de découvrir des instances de relations sémantiques sans s'appuyer sur des régularités lexico-syntaxiques ou structurelles dans les documents. Nous situons ces deux approches dans la littérature et nous les évaluons avec plusieurs corpus réels extraits du Web. Les résultats obtenus sur des corpus de citations bibliographiques, des corpus d'appels à communication et des corpus géographiques montrent que ces solutions permettent effectivement de retrouver de nouvelles instances de relations à partir de documents hétérogènes, tout en contrôlant efficacement leur précision. / Semantic information retrieval has seen rapid development with the new Semantic Web technologies. With these technologies, software can exchange and use data written according to domain ontologies that describe explicit semantics. This "semantic" information access requires the availability of knowledge bases describing both domain ontologies and their instances. Most often, these knowledge bases are constructed automatically by annotating document corpora. However, while these knowledge bases are getting bigger, they still contain much less information than the HTML documents available on the surface Web. Thus, semantic information retrieval reaches some limits with respect to "classic" information retrieval, which exploits these documents at a much larger scale. In practice, these limits consist in the lack of concept and relation instances in the knowledge bases constructed from the same Web documents. In this thesis, we study two research directions in order to answer semantic queries in such cases. The first direction consists in reformulating semantic user queries in order to reach relevant document parts instead of the required (and missing) facts.
The second direction that we study is the automatic enrichment of knowledge bases with relation instances. We propose a novel solution for each of these research directions by exploiting semi-structured documents annotated with concept instances. A key point of these solutions is that they do not require lexico-syntactic or structural regularities in the documents. We position these approaches with respect to the state of the art and experiment with them on several real corpora extracted from the Web. The results obtained on bibliographic citation, call-for-papers and geographic corpora show that these solutions retrieve new answers and relation instances from heterogeneous documents and rank them efficiently according to their precision.
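A minimal sketch of the query-reformulation idea follows, with data structures invented purely for illustration (the thesis works over RDF-style annotations and real corpora): when the knowledge base has no instance of the requested relation, the query falls back to annotated document fragments instead of returning an empty answer.

```python
kb_relations = {            # (subject, relation, object) facts in the KB
    ("KDD-2011", "hasLocation", "San Diego"),
}
annotations = {             # document fragment -> concept instances found in it
    "doc1#p3": {"KDD-2011", "San Diego"},
    "doc2#p1": {"VLDB-2012", "Istanbul"},
    "doc2#p7": {"VLDB-2012"},
}

def answer(subject, relation, object_type_instances):
    # 1. Try the knowledge base directly.
    facts = [o for s, r, o in kb_relations
             if s == subject and r == relation]
    if facts:
        return ("kb", facts)
    # 2. Reformulate: return fragments that mention the subject together
    #    with some instance of the expected object type.
    hits = [frag for frag, inst in annotations.items()
            if subject in inst and inst & object_type_instances]
    return ("documents", hits)

cities = {"San Diego", "Istanbul"}
print(answer("KDD-2011", "hasLocation", cities))   # answered from the KB
print(answer("VLDB-2012", "hasLocation", cities))  # falls back to doc2#p1
```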
656

Heurísticas para aprimorar o método BMW e suas variantes / Heuristics to improve the BMW method and its variants

Carvalho, Lídia Lizziane Serejo de 11 March 2015 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Several research efforts have been conducted in the literature to develop methods that reduce the cost of query processing in search engines. This research proposes modifications to improve the performance of the Block-Max WAND (BMW) algorithm, one of the most efficient algorithms proposed previously. The BMW algorithm uses heuristics to discard document entries during query processing, which makes it extremely fast. In this dissertation, we propose and evaluate additional heuristics to improve the performance of BMW and its variant BMW-CS, in an attempt to further reduce both query processing times and the amount of memory required for processing queries. / Nos últimos anos, pesquisas relacionadas ao processamento de consultas em máquinas de busca têm sido realizadas com o objetivo de desenvolver métodos que reduzam o seu custo. Este trabalho visa propor modificações para melhorar o desempenho do algoritmo Block-Max WAND (BMW), um dos algoritmos mais eficientes propostos na literatura. O algoritmo BMW utiliza heurísticas para descartar documentos da resposta durante o processamento de consultas, o que torna sua execução extremamente veloz. Nesta dissertação, serão propostas e experimentadas modificações nas heurísticas de descarte de documentos e redução na quantidade de memória utilizada para processar consultas pelo algoritmo BMW e suas variantes, buscando-se assim ganhos de desempenho.
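For readers unfamiliar with Block-Max WAND, the following Python sketch isolates the core block-max idea: per-block score maxima give a cheap upper bound that lets query processing skip fully scoring most documents. It deliberately omits WAND's pointer-movement machinery and the heuristics this dissertation proposes; the posting lists and scores are invented.

```python
import heapq
from bisect import bisect_right

BLOCK = 4                                      # postings per block

class PostingList:
    def __init__(self, postings):              # [(docid, score)], docid-sorted
        self.lookup = dict(postings)
        self.starts = [postings[i][0] for i in range(0, len(postings), BLOCK)]
        self.block_max = [max(s for _, s in postings[i:i + BLOCK])
                          for i in range(0, len(postings), BLOCK)]

    def upper_bound(self, docid):
        # max score of the block whose docid range covers this docid;
        # always >= the true contribution (0.0 if the doc is absent)
        b = bisect_right(self.starts, docid) - 1
        return self.block_max[b] if b >= 0 else 0.0

def topk(lists, k):
    docids = sorted({d for pl in lists for d in pl.lookup})
    heap = []                                  # min-heap of the k best scores
    for d in docids:
        if len(heap) == k and sum(pl.upper_bound(d) for pl in lists) <= heap[0]:
            continue                           # block maxima rule d out
        s = sum(pl.lookup.get(d, 0.0) for pl in lists)
        if len(heap) < k:
            heapq.heappush(heap, s)
        elif s > heap[0]:
            heapq.heapreplace(heap, s)
    return sorted(heap, reverse=True)

l1 = PostingList([(1, .2), (3, .5), (7, .1), (9, .9), (12, .3)])
l2 = PostingList([(3, .4), (9, .1), (12, .8), (15, .6)])
print(topk([l1, l2], 2))                       # -> [1.1, 1.0]
```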
657

Optimizing similarity queries in metric spaces meeting user's expectation / Otimização de operações de busca por similaridade em espaços métricos

Mônica Ribeiro Porto Ferreira 22 October 2012 (has links)
The complexity of data stored in large databases has increased at a very fast pace. Hence, operations more elaborate than traditional queries are essential in order to extract all required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range (R_q) and the k-Nearest Neighbor (kNN_q) queries, which, like any of the traditional ones, can be sped up by the indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about the data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMSs has recently begun to be studied in depth, query optimization has so far been developed and employed only to answer traditional queries. The execution of similarity queries, even using efficient indexing structures, tends to present higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and thus they are worth employing to answer similarity queries as well. The first strategy is query rewriting based on algebraic properties and cost functions. The second applies factors external to the query, such as the semantics expected by the user, to prune the answer space. This thesis contributes to the development of novel techniques to improve similarity-based query optimization, exploiting both algebraic properties and semantic restrictions as query refinements. / A complexidade dos dados armazenados em grandes bases de dados tem aumentado sempre, criando a necessidade de novas operações de consulta. Uma classe de operações de crescente interesse são as consultas por similaridade, das quais as mais conhecidas são as consultas por abrangência (R_q) e por k-vizinhos mais próximos (kNN_q). Qualquer consulta é agilizada pelas estruturas de indexação dos Sistemas de Gerenciamento de Bases de Dados (SGBDs). Outro modo de agilizar as operações de busca é a manutenção de métricas sobre os dados, que são utilizadas para ajustar parâmetros dos algoritmos de busca em cada consulta, num processo conhecido como otimização de consultas. Como as buscas por similaridade começaram a ser estudadas seriamente para integração em SGBDs muito mais recentemente do que as buscas tradicionais, a otimização de consultas, por enquanto, é um recurso que tem sido utilizado para responder apenas a consultas tradicionais. Mesmo utilizando as melhores estruturas existentes, a execução de consultas por similaridade tende a ser mais custosa do que as operações tradicionais. Assim, duas estratégias podem ser utilizadas para agilizar a execução de qualquer consulta e, portanto, podem ser empregadas também para responder às consultas por similaridade. A primeira estratégia é a reescrita de consultas baseada em propriedades algébricas e em funções de custo. A segunda técnica faz uso de fatores externos à consulta, tais como a semântica esperada pelo usuário, para restringir o espaço das respostas. Esta tese pretende contribuir para o desenvolvimento de técnicas que melhorem o processo de otimização de consultas por similaridade, explorando propriedades algébricas e restrições semânticas como refinamento de consultas.
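To make the query-rewriting strategy concrete, here is a small Python sketch of two safe algebraic properties for range predicates: same-centre ranges merge into one, and different-centre ranges commute, so a cost model is free to run the most selective one first. The uniformity-based selectivity heuristic is an assumption for the sketch, not this thesis's cost model.

```python
import math

def conjunctive_ranges(data, preds):
    """preds: [(centre, radius)]; keep objects satisfying all of them."""
    # Range selections commute, so any evaluation order is correct; under
    # a uniformity assumption the smallest radius is the most selective,
    # and running it first shrinks the intermediate result earliest.
    for q, r in sorted(preds, key=lambda p: p[1]):
        data = [v for v in data if math.dist(q, v) <= r]
    return data

def merged_radius(r1, r2):
    # same-centre ranges compose into one selection:
    # sigma(q, r1)(sigma(q, r2)(T)) = sigma(q, min(r1, r2))(T)
    return min(r1, r2)

pts = [(x, y) for x in range(10) for y in range(10)]
print(len(conjunctive_ranges(pts, [((0, 0), 12.0), ((5, 5), 2.0)])))  # 13
```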
658

Service recommendation for individual and process use / Recommandation de services pour un usage individuel et la conception de procédés métiers

Nguyen, Ngoc Chan 13 December 2012 (has links)
Les services Web proposent un paradigme intéressant pour la publication, la découverte et la consommation de services. Ce sont des applications faiblement couplées qui peuvent être exécutées seules ou être composées pour créer de nouveaux services à valeur ajoutée. Ils peuvent être consommés comme des services individuels qui fournissent une interface unique qui reçoit des inputs et retourne des outputs (cas 1), ou bien ils peuvent être consommés en tant que composants à intégrer dans des procédés métier (cas 2). Nous appelons le premier cas de consommation « utilisation individuelle » et le second cas de consommation « utilisation en procédé métier ». La nécessité d'avoir des outils dédiés pour aider les consommateurs dans les deux cas de consommation a suscité de nombreux travaux de recherche dans les milieux académiques et industriels. D'une part, beaucoup de portails et de moteurs de recherche de services ont été développés pour aider les utilisateurs à rechercher et invoquer les services Web pour une utilisation individuelle. Cependant, les approches actuelles prennent principalement en compte les connaissances explicites présentées par les descriptions de services. Elles font des recommandations sans tenir compte des données qui reflètent l'intérêt des utilisateurs et peuvent demander des informations supplémentaires aux utilisateurs. D'autre part, plusieurs techniques et mécanismes associés aux procédés métier ont été élaborés pour rechercher des modèles de procédés métier similaires, ou utiliser des modèles de référence. Ces mécanismes sont utilisés pour assister les analystes métier dans la conception de procédés métier. Cependant, ils sont lents, sources d'erreurs, grands consommateurs de ressources humaines, et peuvent induire en erreur les analystes métier. Dans notre travail, nous cherchons à faciliter la consommation de services Web, pour une utilisation individuelle ou en procédé métier, en proposant des techniques de recommandation. Notre objectif est de recommander aux utilisateurs des services qui sont proches de leurs intérêts et de recommander aux analystes métier des services qui sont pertinents pour un procédé métier en cours de conception. Pour recommander des services pour une utilisation individuelle, nous prenons en compte l'historique des données d'utilisation de l'utilisateur, qui reflète ses intérêts. Nous appliquons des techniques de filtrage collaboratif bien connues pour faire des recommandations. Nous avons proposé cinq algorithmes et développé une application Web qui permet aux utilisateurs d'utiliser des services recommandés. Pour recommander des services pour une utilisation en procédé métier, nous prenons en compte les relations entre les services du procédé métier. Nous proposons de recommander les services en fonction de leur localisation dans le procédé métier. Nous avons défini le contexte de voisinage d'un service et présenté des recommandations basées sur l'appariement de contextes de voisinage. Par ailleurs, nous avons développé un langage de requête pour permettre aux analystes métier d'exprimer formellement des contraintes de filtrage. Nous avons également proposé une approche pour extraire le contexte de voisinage à partir de traces d'exécution de procédés métier. Enfin, nous avons développé trois applications afin de valider notre approche. Nous avons effectué des expérimentations sur des données recueillies par nos applications et sur deux grands ensembles de données publiques.
Les résultats expérimentaux montrent que notre approche est faisable, précise et performante dans des cas d'utilisation réels. / Web services have been developed as an attractive paradigm for publishing, discovering and consuming services. They are loosely coupled applications that can run alone or be composed to create new value-added services. They can be consumed as individual services, which provide a unique interface to receive inputs and return outputs, or they can be consumed as components to be integrated into business processes. We call the first consumption case individual use and the second case business process use. The need for specific tools to assist consumers in these two service consumption cases has motivated much research in both academia and industry. On the one hand, many service portals and service crawlers have been developed as specific tools to help users search for and invoke Web services for individual use. However, current approaches mainly take into account the explicit knowledge presented by service descriptions. They make recommendations without considering data that reflect user interest, and they may require additional information from users. On the other hand, some business process mechanisms have been developed to search for similar business process models or to use reference models. These mechanisms are used to help process analysts design business processes. However, they are labor-intensive, error-prone and time-consuming, and may confuse business analysts. In our work, we aim at facilitating service consumption for individual use and business process use through recommendation techniques. We aim to recommend to users services that are close to their interests, and to recommend to business analysts services that are relevant to an ongoing business process design. To recommend services for individual use, we take into account the user's usage data, which reflect the user's interest. We apply well-known collaborative filtering techniques developed for making recommendations. We propose five algorithms and develop a web-based application that allows users to use the recommended services. To recommend services for business process use, we take into account the relations between services in business processes. We aim to recommend relevant services for selected positions in a business process. We define the neighborhood context of a service and make recommendations based on neighborhood context matching. Besides, we develop a query language to allow business analysts to formally express constraints that filter services. We also propose an approach to extract a service's neighborhood context from business process logs. Finally, we develop three applications to validate our approach. We perform experiments on the data collected by our applications and on two large public datasets. Experimental results show that our approach is feasible, accurate and performs well in real use cases.
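As an illustration of the collaborative-filtering side (the individual-use case), here is a minimal user-based scheme over a service-usage matrix. The users, services and cosine-over-sets similarity are illustrative choices, not the five algorithms proposed in the thesis.

```python
import math

usage = {                       # user -> set of services they have invoked
    "alice": {"weather", "maps", "translate"},
    "bob":   {"weather", "maps", "stocks"},
    "carol": {"maps", "stocks", "news"},
}

def cosine(a, b):
    # cosine similarity of binary usage vectors = overlap / geometric mean
    inter = len(usage[a] & usage[b])
    return inter / math.sqrt(len(usage[a]) * len(usage[b]))

def recommend(user, n=2):
    scores = {}
    for other in usage:
        if other == user:
            continue
        w = cosine(user, other)
        for svc in usage[other] - usage[user]:
            scores[svc] = scores.get(svc, 0.0) + w   # similarity-weighted vote
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend("alice"))       # services favoured by similar users
```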
659

User-Centric Privacy Preservation in Mobile and Location-Aware Applications

Guo, Mingming 10 April 2018 (has links)
The mobile and wireless community has brought significant growth of location-aware devices, including smartphones, connected vehicles and IoT devices. The combination of location-aware sensing, data processing and wireless communication in these devices has led to the rapid development of mobile and location-aware applications. Meanwhile, user privacy is becoming an indispensable concern. These mobile and location-aware applications, which collect data from mobile sensors carried by users or vehicles, return valuable data collection services (e.g., health condition monitoring, traffic monitoring, and natural disaster forecasting) in real time. The sequential spatial-temporal data queries sent by users reveal their location trajectory information. This information not only contains users' movement patterns, but also reveals sensitive attributes such as users' personal habits and preferences, as well as home and work addresses. By exploiting this type of information, attackers can extract and sell user profile data, degrade subscribed data services, and even jeopardize personal safety. This research stems from the realization that user privacy is lost along with the popular usage of emerging location-aware applications, and its outcome seeks to relieve user location and trajectory privacy problems. First, we develop a pseudonym-based anonymity zone generation scheme against a strong adversary model in continuous location-based services. Based on a geometric transformation algorithm, this scheme generates distributed anonymity zones with personalized privacy parameters to conceal users' real location trajectories. Second, based on historical query data analysis, we introduce a query-feature-based probabilistic inference attack, and propose query-aware randomized algorithms to preserve user privacy by distorting the probabilistic inference conducted by attackers. Finally, we develop a privacy-aware mobile sensing mechanism to help vehicular users reduce the number of queries sent to adversarial servers. In this mechanism, mobile vehicular users can selectively query nearby nodes in a peer-to-peer way for privacy protection in vehicular networks.
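A minimal sketch of the pseudonym-plus-anonymity-zone idea follows, assuming a per-user privacy radius and a uniformly random displacement of the zone centre; the dissertation's geometric transformation and distributed-zone scheme are more elaborate, and the parameter names here are illustrative.

```python
import math
import random
import secrets

def anonymity_zone(true_xy, privacy_radius):
    """Report a zone instead of the true position."""
    pseudonym = secrets.token_hex(8)             # unlinkable per-session id
    angle = random.uniform(0.0, 2.0 * math.pi)
    shift = random.uniform(0.0, privacy_radius)  # displace the zone centre
    cx = true_xy[0] + shift * math.cos(angle)
    cy = true_xy[1] + shift * math.sin(angle)
    # the true location is somewhere inside the reported zone, but so is
    # every other point of it, which is what provides the anonymity
    return pseudonym, (cx, cy), privacy_radius

print(anonymity_zone((48.858, 2.294), privacy_radius=0.01))
```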
660

地理資訊系統在不動產查詢與分析上之應用 / Applying Geographic Information Systems to Real Estate Query and Analysis

王琬宜, Wang, Wan-I Unknown Date (has links)
This study integrates real-estate information with related spatial data and, from the user's point of view, uses a geographic information system to help build a real-estate query and analysis system that combines the attributes of the properties themselves with their spatial relationships. The system's geographic database was built with the ARC/INFO software, and the system incorporates both the basic attribute factors and the spatial-environment factors considered when buying a home, so as to cover both the "point" and "area" aspects of real-estate information. The system itself was developed with Visual Basic (VB) 6.0 Chinese Enterprise Edition. The developed system does not rely on any commercial GIS software at run time; it was implemented entirely by writing Visual Basic code. Besides the low cost of the development platform and the easy availability of the tools, the greatest benefit of this choice is the portability of the system and the ease of deploying it more widely. The system's most distinctive feature is that users can select the property criteria they need and assign their own weight to each factor according to how much importance they attach to it.
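The user-defined weighting described above amounts to a weighted-sum ranking over normalised property attributes. A minimal Python sketch, with invented listings, attributes and weights, is:

```python
listings = {   # property -> attribute scores normalised to [0, 1]
    "A": {"price": 0.9, "transit": 0.4, "schools": 0.7},
    "B": {"price": 0.6, "transit": 0.9, "schools": 0.5},
    "C": {"price": 0.7, "transit": 0.7, "schools": 0.9},
}

def rank(weights):
    """Order listings by the user's weighted sum of attribute scores."""
    total = sum(weights.values())
    score = lambda attrs: sum(weights[a] * attrs[a] for a in weights) / total
    return sorted(listings, key=lambda p: score(listings[p]), reverse=True)

# a buyer who cares most about transit, then schools, then price
print(rank({"price": 0.2, "transit": 0.5, "schools": 0.3}))  # -> C, B, A
```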
