1 |
Scaling Continuous Query Services for Future Computing Platforms and Applications. Gedik, Bugra, 13 June 2006
The ever-increasing rate of digital information available from online sources drives the need for information monitoring applications that assist users in tracking relevant changes in these sources and accessing information of interest to them in a timely manner. Continuous queries (CQs) are standing queries that are continuously evaluated over dynamic sources to track information changes that meet user-specified thresholds and to notify users of new results in real time. CQ systems can be considered powerful middleware for supporting information monitoring applications. A significant challenge in building CQ systems is scalability, caused by the large number of users and queries, and by the large and growing number of information sources with high update rates.
In this thesis we use CQs as a vehicle to study and address the challenges involved in supporting information monitoring applications on future computing platforms. The focus is on P2P web monitoring in Internet systems, location monitoring in mobile systems, and environmental monitoring in sensor systems. Although different computing platforms require different software architectures for building scalable CQ services, there is a common design philosophy that this thesis advocates for making CQ services scalable and efficient, which can be summarized as "move computation close to the places where the data is produced." A common challenge in scaling CQ systems is the resource-intensive nature of query evaluation, which involves continuously checking updates in a large number of data sources and evaluating the trigger conditions of a large number of queries over these updates, consuming both CPU and network bandwidth resources. If some part of the query evaluation can be pushed close to the sources where the data is produced, the resulting early filtering of updates will save both bandwidth and CPU resources.
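As an illustration of this design philosophy, the minimal sketch below (hypothetical names; a simplification, not the thesis architecture) shows a per-source filter that evaluates the trigger conditions of registered CQs locally and forwards only matching updates:

```python
from typing import Callable, Dict, List

Update = Dict[str, float]

class SourceSideFilter:
    """Runs next to a data source; drops updates no standing query cares about."""

    def __init__(self) -> None:
        self.triggers: List[Callable[[Update], bool]] = []

    def register(self, trigger: Callable[[Update], bool]) -> None:
        self.triggers.append(trigger)

    def should_forward(self, update: Update) -> bool:
        # Forward only if at least one CQ's trigger condition matches,
        # saving both bandwidth and server-side CPU for the rest.
        return any(trigger(update) for trigger in self.triggers)

f = SourceSideFilter()
f.register(lambda u: u.get("temperature", 0.0) > 30.0)   # CQ 1: heat alert
f.register(lambda u: u.get("price", float("inf")) < 10)  # CQ 2: price drop

print(f.should_forward({"temperature": 25.0}))  # False: dropped at the source
print(f.should_forward({"temperature": 31.5}))  # True: forwarded for evaluation
```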
In summary, in this thesis we show that distributed CQ architectures that are designed to take advantage of the opportunities provided by ubiquitous computing platforms and pervasive networks, while at the same time recognizing and resolving the challenges posed by these platforms, lead to building scalable and effective CQ systems to better support the demanding information monitoring applications of the future.
|
2 |
Skyline queries for multi-criteria decision support systems. Gudala, Satyaveer Goud, January 1900
Master of Science / Department of Computing and Information Sciences / William H. Hsu
In decision-making applications, the skyline query is used to find the set of non-dominated data points (called skyline points) in a multi-dimensional dataset. A data point dominates another data point if it is at least as good as the other in all dimensions and better in at least one dimension. The skyline consists of the data points not dominated by any other data point. Computing the skyline points of a dataset is essential for applications that involve multi-criteria decision making. Skyline queries extract the interesting tuples from a potentially large dataset: no matter how we weigh our preferences along the attributes, only tuples that score best under some monotone scoring function are part of the skyline. In other words, the skyline does not contain tuples that are nobody's favorite. With a growing number of real-world applications involving multi-criteria decision making over multiple dimensions, skyline queries can answer such problems accurately and efficiently.
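To make the dominance definition concrete, here is a minimal sketch (assuming smaller values are better in every dimension) that computes a skyline with a naive quadratic scan:

```python
def dominates(p, q):
    # p dominates q: at least as good everywhere, strictly better somewhere
    # (here "better" means smaller, e.g. price and distance to minimize).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    # Naive O(n^2) scan; algorithms like BBS reach the same result faster
    # by pruning dominated regions of an R-tree.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

hotels = [(50, 8), (60, 2), (40, 9), (55, 3), (45, 7)]  # (price, distance)
print(skyline(hotels))  # [(60, 2), (40, 9), (55, 3), (45, 7)]
```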
This report mainly focuses on skyline computation algorithms that are efficient for online processing and suitable for multi-criteria decision-making scenarios. I implemented the Branch-and-Bound Skyline (BBS) algorithm on two datasets, one synthetic and one real. My aim is to explore various subspaces of a given dataset and compute skylines over them, especially those subspace skylines that contain the fewest skyline points.
|
3 |
A Natural Language Search Interface for Accommodation Queries. Chavanne, Erin, 01 January 2015
Services that once required human interaction are now completed with the click of a few buttons. In general, this allows for a more streamlined process for activities such as sending messages (email or text messages), filing taxes, or even shopping for groceries. When searching for hotels and travel accommodations, however, this process has not proven as effective: speed and efficiency are hindered by the interface through which the information is available. Choosing a travel-specific site, filling in the required fields, combing through results for the desired specifications, and then possibly repeating the process elsewhere does not allow the user to express the entirety of their preferences for the accommodation, and is therefore not an effective method of searching.
Natural language search provides a more accessible and intuitive interface for accommodation searching. Instead of specifying fields that may not encompass the entirety of the desired search, the user is able to express all of the aspects in a single natural-language search.
In this project, we propose a natural language search interface for accommodations such as hotels, hostels, or apartments. Data acquired through Amazon Mechanical Turk is used to create a system for extracting various accommodation fields. The Zilyo and Expedia APIs are then queried for real-time accommodation listings, and the results are adjusted based on the specifics of the search that were not included in the original query. A natural language search of this kind is not only more accessible on the consumer end, but also provides data that pertains directly to the entirety of the intended search.
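As a toy illustration of field extraction (the hand-written patterns and field names here are hypothetical; the actual system is built from Mechanical Turk annotations), a free-form query might be reduced to structured search fields like this:

```python
import re

def extract_fields(query: str) -> dict:
    """Pull a few accommodation fields out of a natural-language query."""
    fields = {}
    m = re.search(r"\bin ([A-Z][a-zA-Z ]+?)(?:,|\.| for| with| under|$)", query)
    if m:
        fields["location"] = m.group(1).strip()
    m = re.search(r"under \$?(\d+)", query, re.IGNORECASE)
    if m:
        fields["max_price"] = int(m.group(1))
    m = re.search(r"(\d+) (?:guests|people|adults)", query, re.IGNORECASE)
    if m:
        fields["guests"] = int(m.group(1))
    return fields

print(extract_fields("Hotel in San Francisco for 2 guests under $150 with free wifi"))
# {'location': 'San Francisco', 'max_price': 150, 'guests': 2}
```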
|
4 |
Spatial Range Querying for Gaussian-Based Imprecise Query Objects. Ishikawa, Yoshiharu; Iijima, Yuichi; Yu, Jeffrey Xu, 03 1900
No description available.
|
5 |
Using Resampling to Optimize Continuous Queries in Wireless Sensor Networks. Liu, Pin-yu, 17 July 2007
Advances in communication and computing techniques have enabled the development of low-cost, low-power, multifunctional sensor nodes that are small in size and capable of communicating over short distances. A sensor network is composed of a large number of sensor nodes that are densely deployed either inside the phenomenon to be observed or very close to it. Sensor networks open up new opportunities to observe and interact with the physical world around us.
Despite the recent advances in sensor network applications and technology, sensor networks still suffer from the major problem of limited energy. This is because most sensor nodes use batteries as their energy source, and these are inconvenient and sometimes difficult to replace when they run out. Understanding the events, measures, and tasks required by certain applications has the potential to provide efficient communication techniques for the sensor network.
Our focus in this work is on the efficient processing of continuous queries, in which query results have to be generated according to a user-specified sampling rate for an extended period of time. In this thesis, we deal with two types of continuous queries: the first requires data from all sensor nodes, while the other is only interested in the data returned by some selected nodes. To answer these queries, data have to be sent to the base station at some designated rate, which may consume much energy. Previous works have developed two methods to reduce the energy consumption. Both rely on an error range the user can tolerate to decide whether the current sensed data should be transmitted: the first uses a simple caching method, while the second uses a complex multi-dimensional model. However, these methods require the user to specify the error range, which may not be easy to do. In addition, the sensed data reported by the sensors were assumed to be accurate, which is by no means true in the real world. This thesis uses a Kalman filter to correct and predict sensed data. As a result, the sampling frequency of each sensor is dynamically adjusted, a process referred to as resampling, which systematically determines the data sensing/transferring rate of the sensors. We evaluate our proposed methods using empirical data collected from a real sensor network.
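A minimal one-dimensional Kalman filter sketch (illustrative noise parameters; not the thesis implementation) shows the correct-and-predict loop that drives such a resampling decision:

```python
class Kalman1D:
    def __init__(self, q=1e-3, r=0.5):
        self.q, self.r = q, r      # process and measurement noise variances (assumed)
        self.x, self.p = 0.0, 1.0  # state estimate and its variance

    def step(self, z):
        # Predict (constant-value model), then correct with measurement z.
        self.p += self.q
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1 - k)
        return self.x

kf = Kalman1D()
for z in [20.1, 20.3, 19.8, 20.0, 25.0, 20.2]:
    est = kf.step(z)
    # When the corrected estimate tracks predictions closely, the node can
    # lower its sensing/transfer rate; a large innovation suggests raising it.
    print(f"raw={z:5.1f}  corrected={est:5.2f}")
```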
|
6 |
WBG (Whois Based Geolocation): a strategy for the geographic location of Internet hosts. Endo, Patricia Takako, 31 January 2008
Previous issue date: 2008 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Based on the geographic location of a given Internet host, one can offer specialized services such as: a) web pages with regional preferences (e.g., online users can receive targeted advertisements or have the content language selected automatically); b) control of data availability according to the user's location (e.g., access to certain data can be restricted through regional policies, and transactions can be authorized only from pre-established localities); and c) the study and analysis of geolocated traffic, to understand who communicates with whom at the level of users, regions, and countries, and to identify routing anomalies. What these applications have in common is their dependence on strategies known as geolocation.
However, some of these mechanisms exhibit low accuracy or produce geographic location estimates that are unacceptable for certain applications. Studies that improve the precision, as well as the completeness, of the strategies used to infer the geolocation of Internet hosts are therefore of great importance.
The main goals of this work are: to study existing geolocation strategies; to propose a strategy that improves both the precision of geographic location inferences for Internet hosts and the completeness of the results; and to study geolocated traffic from a database of the academic network of the State of Pernambuco. The proposed strategy, named WBG (Whois Based Geolocation), is based on online whois lookups and includes a heuristic based on the traceroute tool.
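As a rough sketch of the whois-based idea (an assumed workflow, not the WBG implementation, and assuming a whois client is installed), one can query a whois server for an IP address and scan the record for location-bearing fields; WBG additionally applies its traceroute-based heuristic:

```python
import re
import subprocess

def whois_location_hints(ip: str) -> dict:
    """Grep a whois record for fields that hint at geographic location."""
    record = subprocess.run(["whois", ip], capture_output=True,
                            text=True, timeout=30).stdout
    hints = {}
    for field in ("country", "city", "address"):
        m = re.search(rf"^{field}:\s*(.+)$", record,
                      re.IGNORECASE | re.MULTILINE)
        if m:
            hints[field] = m.group(1).strip()
    return hints

print(whois_location_hints("8.8.8.8"))  # e.g. {'country': 'US', ...}
```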
|
7 |
Secure Geometric Search on Encrypted Spatial Data. Wang, Boyang, January 2017
Spatial data (e.g., points) have extensive applications in practice, such as spatial databases, Location-Based Services, spatial computing, social analyses, computational geometry, graph design, medical imaging, etc. Geometric queries, such as geometric range queries (i.e., finding points inside a geometric range) and nearest neighbor queries (i.e., finding the closest point to a given point), are fundamental primitives to analyze and retrieve information over spatial data. For example, a medical researcher can query a spatial dataset to collect information about patients in a certain geometric area to predict whether there will be a dangerous outbreak of a particular disease (e.g., Ebola or Zika).
With the dramatic increase in the scale and size of data, many companies and organizations are outsourcing significant amounts of data, including spatial data, to public cloud data services in order to minimize data storage and query processing costs. For instance, major companies and organizations, such as Yelp, Foursquare and NASA, use Amazon Web Services as their public cloud data service, which can save those companies and organizations billions of dollars per year. However, due to the existence of attackers (e.g., a curious administrator or a hacker) on remote servers, users are worried about the leakage of their private data while storing and querying those data on public clouds.
Searchable Encryption (SE) is an innovative technique to protect the data privacy of users on public clouds without losing search functionalities on the server side. Specifically, a user can encrypt its data with SE before outsourcing data to a public server, and this public server is able to search encrypted data without decryption. Many SE schemes have been proposed to support simple queries, such as keyword search. Unfortunately, how to efficiently and securely support geometric queries over encrypted spatial data remains open.
In this dissertation, to protect the privacy of spatial data in public clouds while still maintaining search functions without decryption, we propose a set of new SE solutions to support geometric queries, including geometric range queries and nearest neighbor queries, over encrypted spatial data. The major contributions of this dissertation focus on two aspects. First, we enrich search functionalities by designing new solutions to carry out secure fundamental geometric search queries, which were not supported in previous works. Second, we minimize the performance gap between theory and practice by building novel schemes to perform geometric queries with highly efficient search time and updates over large-scale encrypted spatial data.
Specifically, we first design a scheme supporting circular range queries (i.e., retrieving points inside a circle) over encrypted spatial data. Instead of directly evaluating compute-then-compare operations, which are inefficient over encrypted data, we use a set of concentric circles to represent a circular range query, and then verify whether a data point is on any of those concentric circles by securely evaluating inner products over encrypted data.
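In plaintext form, the point-on-circle test reduces to an inner product, as the sketch below shows (the actual scheme evaluates this product over encrypted vectors; integer coordinates are assumed here so that a radius-R query is covered by concentric circles of radius 0..R):

```python
# A point (x, y) lies on the circle centered at (a, b) with radius r
# exactly when <u, v> = 0, where
#   u = (x^2 + y^2, x, y, 1)            -- derived from the data point
#   v = (1, -2a, -2b, a^2 + b^2 - r^2)  -- derived from the query circle
def on_circle(point, center, r):
    (x, y), (a, b) = point, center
    u = (x * x + y * y, x, y, 1)
    v = (1, -2 * a, -2 * b, a * a + b * b - r * r)
    return sum(ui * vi for ui, vi in zip(u, v)) == 0

# A circular range query of radius 2 becomes concentric circles of radius
# 0, 1, 2; a data point matches if it lies on any of them.
print(any(on_circle((3, 4), (0, 0), r) for r in range(3)))  # False
print(any(on_circle((0, 2), (0, 0), r) for r in range(3)))  # True
```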
Next, to enrich search functionalities, we propose a new scheme, which can support arbitrary geometric range queries, such as circles, triangles and polygons in general, over encrypted spatial data. By leveraging the properties of Bloom filters, we convert a geometric range search problem to a membership testing problem, which can be securely evaluated with inner products. Moving a step forward, we also build another new scheme, which not only supports arbitrary geometric range queries and sub-linear search time but also enables highly efficient updates.
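The Bloom-filter conversion can be sketched in plaintext as follows (an assumed simplification of the construction: the query range is enumerated into grid cells inserted into a Bloom filter, and membership testing reduces to an inner product that the scheme would evaluate under encryption):

```python
import hashlib

M, K = 64, 3  # filter size and number of hash functions (assumed)

def positions(item: str):
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def bloom(items):
    bits = [0] * M
    for it in items:
        for p in positions(it):
            bits[p] = 1
    return bits

def member(bits, item) -> bool:
    indicator = [0] * M
    for p in positions(item):
        indicator[p] = 1
    # Inner product equals the number of the item's hash positions
    # exactly when every one of those positions is set in the filter.
    return sum(b * i for b, i in zip(bits, indicator)) == sum(indicator)

query_cells = bloom(["cell(1,1)", "cell(1,2)", "cell(2,1)"])  # toy query range
print(member(query_cells, "cell(1,2)"))  # True
print(member(query_cells, "cell(9,9)"))  # False (up to false positives)
```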
Finally, we address the problem of secure nearest neighbor search on encrypted large-scale datasets. Specifically, we modify the algorithm of nearest neighbor search in advanced tree structures (e.g., R-trees) by simplifying operations, where evaluating comparisons alone on encrypted data is sufficient to efficiently and correctly find nearest neighbors over datasets with millions of tuples.
|
8 |
An Automated Conversion of Temporal Databases into XML with Fuzziness Option. Isikman, Omer Ozgun, 01 September 2010
The importance of incorporating time in databases has been well recognized by the community, and time-varying databases have been extensively studied by researchers. The main idea is to model up-to-date changes to data since it became available. Time information is mostly overlaid on traditional databases, and an extensional time dimension helps in querying past data; this all becomes possible only once the idea is realized and favored by commercial database management systems. Unfortunately, one disadvantage of temporal database management systems is that they have not been commercialized. First, XML (eXtensible Markup Language) is a de facto standard for data interchange, and hence XML was adopted as the data model. The motivation for the work described in this thesis is two-fold: transferring databases into XML while changing crisp values into fuzzy variables describing fuzzy sets, and the fact that bitemporal databases form one interesting type of temporal database. Thus, the purpose is to propose a completely automated system that converts any bitemporal database to its fuzzy XML schema definition. The implemented temporal database operators are, however, independent of the database content. Fuzzy elements can have different membership functions and a varying number of linguistic variables, and a scheme for determining membership function parameters is proposed.
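As an assumed example of one such membership function (the system supports different shapes and varying numbers of linguistic variables; the values below are illustrative), a triangular function maps a crisp value to a membership degree:

```python
def triangular(a, b, c):
    """Return mu(x) for a triangle with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Linguistic variable "salary" with three terms; a crisp value is replaced
# by the terms it belongs to, each with a membership degree.
low, medium, high = triangular(0, 20, 40), triangular(30, 50, 70), triangular(60, 80, 100)
x = 45
print({name: round(mu(x), 2) for name, mu in
       [("low", low), ("medium", medium), ("high", high)]})
# {'low': 0.0, 'medium': 0.75, 'high': 0.0}
```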
Finally, fuzzy queries have also been implemented as part of the system.
|
9 |
REVISING HORN FORMULAS. Doshi, Jignesh Umesh, 01 January 2003
Boolean formulas can be used to model real-world facts. In some situations we may have a Boolean formula that closely approximates a real-world fact, but we need to fine-tune it so that it models the real-world fact exactly. This is a problem of theory revision where the theory is in the form of a Boolean formula. An algorithm is presented for revising a class of Boolean formulas that are expressible as conjunctions of Horn clauses. Each of the clauses in the formulas considered here has a unique unnegated variable that does not appear in any other clause, and is not 'F'. The revision algorithm uses equivalence and membership queries to revise a given formula into a formula that is equivalent to an unknown target formula having the same set of unnegated variables. The amount of time required by the algorithm to perform this revision is logarithmic in the number of variables, and polynomial in the number of clauses in the unknown formula. An early version of this work was presented at the 2003 Midwest Artificial Intelligence and Cognitive Science Conference [4].
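To illustrate the objects being revised, the sketch below (an assumed encoding, not the paper's algorithm) represents a Horn formula as (body, head) clauses and evaluates a membership query, i.e. whether a truth assignment satisfies the formula:

```python
# Clause (body, head) encodes: body1 AND body2 AND ... -> head.
def satisfies(assignment: dict, clauses) -> bool:
    for body, head in clauses:
        if all(assignment.get(v, False) for v in body) and not assignment.get(head, False):
            return False  # body true but head false: clause violated
    return True

# (x AND y -> z), (z -> w)
horn = [({"x", "y"}, "z"), ({"z"}, "w")]
print(satisfies({"x": True, "y": True, "z": True, "w": True}, horn))  # True
print(satisfies({"x": True, "y": True, "z": False}, horn))            # False
```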
|
10 |
Avaliação da qualidade de funções de similaridade no contexto de consultas por abrangência / Quality evaluation of similarity functions for range queries. Stasiu, Raquel Kolitski, January 2007
In real systems, stored data typically have inconsistencies caused by typing errors, abbreviations, transposed characters, amongst others. For this reason, different representations of the same real-world object are stored as distinct elements, causing problems during query processing. In this sense, this thesis investigates range queries, which find objects that represent the same real-world object being queried. This type of query cannot be processed by exact matching, thus requiring support for querying by similarity. For each query submitted to a given collection, the similarity function produces a ranked list of all elements in the collection, sorted decreasingly by similarity score with the query object. Only the variations of the query object should be part of the result, as only those items are relevant.
For this reason, it is necessary to apply a threshold value to properly split the ranking. The first challenge of range queries is the definition of a proper threshold. Usually, a human specialist makes the estimation manually through the identification of relevant and irrelevant elements for each query, and then uses measures such as recall and precision (R&P). The high dependency on the human specialist is the main difficulty related to the use of range queries in real situations, especially for large collections. In this sense, the method presented in this thesis has the objective of estimating R&P at several thresholds with low human intervention. As a by-product of this method, it is possible to select the optimal threshold for a similarity function in a given collection. Considering the fact that similarity functions are imperfect and vary in quality, it is necessary to evaluate the similarity function for each collection, as the result is domain dependent. A threshold value for one collection could be totally inappropriate for another, even if the same similarity function is applied. As a measure of the quality of similarity functions for range queries, this thesis introduces discernability: a measure that quantifies the ability of the similarity function to separate relevant and irrelevant elements. Comparing discernability and mean average precision, the former can capture variations that are not noticed by precision-based measures. This property shows that discernability gives better results for evaluating similarity functions for range queries. An extended experimental evaluation using real data shows the viability of both the estimation method and the discernability measure, applied to range queries.
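As a small illustration of the quantities being estimated (the relevance labels here would come from a specialist; the thesis's contribution is estimating them with far less labeling), precision and recall at a threshold over a similarity ranking can be computed as:

```python
def precision_recall_at(ranking, threshold):
    """ranking: list of (similarity score, relevance label in {0, 1})."""
    retrieved = [rel for score, rel in ranking if score >= threshold]
    relevant_total = sum(rel for _, rel in ranking)
    if not retrieved or relevant_total == 0:
        return 0.0, 0.0
    tp = sum(retrieved)  # relevant items above the threshold
    return tp / len(retrieved), tp / relevant_total

# (similarity to the query object, is a true variation of it?)
ranking = [(0.95, 1), (0.90, 1), (0.70, 0), (0.65, 1), (0.30, 0)]
for t in (0.9, 0.6):
    p, r = precision_recall_at(ranking, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.9: precision=1.00 recall=0.67
# threshold=0.6: precision=0.75 recall=1.00
```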
|