1 |
Analýza a problémy Top-K dotazu nad relační databází / Top-k querying over a relational database: analysis and problems. Čech, Martin. January 2011.
Due to the increasing capacity of storage devices and the speed of computer networks in recent years, the need to sort and search data effectively keeps growing. A query result containing thousands of rows is usually useless and unreadable. In that situation, users may prefer to define constraints and sorting priorities in the query and see only the several top rows of the result. This thesis deals with the problems of top-k queries, the extension of relational algebra with new operators, and their implementation in a database system. It focuses on optimizing the join and sort operations. The thesis includes the implementation and comparison of several algorithms in the standalone .NET library NRank.
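The core operation this entry studies, retrieving only the best k rows under a preference function, can be sketched with a bounded min-heap so that the whole relation never needs to be fully sorted. A minimal illustration (the sample rows and the preference function are invented for this sketch and are not taken from NRank):

```python
import heapq

def top_k(rows, k, score):
    """Return the k highest-scoring rows without sorting the whole relation."""
    heap = []  # min-heap of (score, tie-breaker, row), bounded at k entries
    for i, row in enumerate(rows):
        s = score(row)
        if len(heap) < k:
            heapq.heappush(heap, (s, i, row))
        elif s > heap[0][0]:
            # new row beats the current k-th best: replace the heap minimum
            heapq.heapreplace(heap, (s, i, row))
    return [row for s, _, row in sorted(heap, reverse=True)]  # best first

# hypothetical product rows: (name, price, rating)
products = [("a", 10, 4.5), ("b", 25, 3.9), ("c", 15, 4.8), ("d", 30, 4.2)]
# preference: well rated, with a penalty for price
best = top_k(products, 2, lambda r: r[2] - 0.05 * r[1])
```

This keeps memory at O(k) and runs in O(n log k), which is the same intuition behind pushing a top-k operator below a full sort in a query plan.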
2 |
Top-k ranking with uncertain data. Wang, Chonghai. 06 1900.
The goal of top-k ranking is to rank individuals so that the best k of them can be determined. Depending on the application domain, an individual can be a person, a product, an event, or just a collection of data or information for which an ordering makes sense.
In the context of databases, top-k ranking has been studied in two distinct directions, depending on whether the stored information is certain or uncertain. In the former case, past research has focused on efficient query processing. In the latter case, a number of semantics based on possible worlds have been proposed and computational mechanisms investigated for what are called uncertain databases or probabilistic databases, where a tuple is associated with a membership probability indicating the level of confidence in the stored information.
In this thesis, we study top-k ranking with uncertain data in two general areas. The first is pruning for the computation of top-k tuples in a probabilistic database. We investigate the theoretical basis and practical means of pruning for the recently proposed unifying framework based on parameterized ranking functions; as such, our results are applicable to a wide range of ranking functions. We show experimentally that pruning can generate orders-of-magnitude performance gains. In the second area of our investigation, we study the problem of top-k ranking for objects with multiple attributes whose values are modeled by probability distributions and constraints. We formulate a theory of top-k ranking for objects by characterizing what constitutes the strength of an object, and show that a number of previous proposals for top-k ranking are special cases of our theory. We carry out a limited study on the computation of top-k objects under our theory. We reveal a close connection between top-k ranking in this context and high-dimensional spaces studied in mathematics: in particular, the problem of computing the volumes of high-dimensional polyhedra defined by linear inequalities is a special case of top-k ranking of objects, and as such, algorithms formulated for the former can be employed for the latter under the same conditions.
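Both themes, ranking uncertain tuples and pruning, can be illustrated with expected-score ranking, one simple member of the parameterized-ranking-function family. Since a tuple's expected score never exceeds its raw score, scanning tuples in raw-score order gives an upper bound that lets the scan stop early. The data and field names below are invented for this sketch:

```python
import heapq

def pruned_expected_topk(tuples, k):
    """Expected-score top-k with pruning: tuples are scanned in descending
    raw-score order, so the next raw score upper-bounds every remaining
    expected score; once the k-th expected score exceeds it, stop early."""
    heap = []      # min-heap of (expected_score, tuple id)
    scanned = 0
    for t in sorted(tuples, key=lambda t: t["score"], reverse=True):
        if len(heap) == k and heap[0][0] >= t["score"]:
            break  # no remaining tuple can enter the top-k
        scanned += 1
        e = t["score"] * t["prob"]  # expected contribution of the tuple
        if len(heap) < k:
            heapq.heappush(heap, (e, t["id"]))
        elif e > heap[0][0]:
            heapq.heapreplace(heap, (e, t["id"]))
    return sorted(heap, reverse=True), scanned

data = [
    {"id": "a", "score": 100, "prob": 0.9},
    {"id": "b", "score": 95, "prob": 0.9},
    {"id": "c", "score": 80, "prob": 0.1},
    {"id": "d", "score": 70, "prob": 0.2},
    {"id": "e", "score": 60, "prob": 0.5},
]
top, scanned = pruned_expected_topk(data, 2)
```

Here the scan stops after two tuples, never touching the last three, which is the kind of gain pruning delivers at scale.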
3 |
Top-k ranking with uncertain data. Wang, Chonghai. Unknown Date.
No description available.
4 |
Lightweight Top-K Analysis in DBMSs Using Data Stream Analysis Techniques. Huang, Jing. 03 September 2009.
Problem determination is the identification of problems and performance issues that occur in an observed system and the discovery of solutions to resolve them. Top-k analysis is a common task in problem determination in database management systems. It involves identifying the set of most frequently occurring objects according to some criterion, such as the top-k most frequently used tables or most frequent queries, or the top-k queries with respect to CPU usage or amount of I/O.
Effective problem determination requires sufficient monitoring and rapid analysis of the collected monitoring statistics. System monitoring often incurs a great deal of overhead and can interfere with the performance of the observed system. Processing vast amounts of data may require several passes through the analysis system and thus be very time consuming.
In this thesis, we present our lightweight top-k analysis framework, in which lightweight monitoring tools continuously poll system statistics, producing several continuous data streams that are then processed with stream-mining techniques. The results produced by our tool are the "top-k" values for the observed statistics. This information can be valuable to an administrator in determining the source of a problem.
We implement the framework as a prototype system called Tempo. Tempo uses IBM DB2’s snapshot API and a lightweight monitoring tool called DB2PD to generate the data streams. The system reports the top-k executed SQL statements and the top-k most frequently accessed tables in an on-line fashion. Several experiments are conducted to verify the feasibility and effectiveness of our approach. The experimental results show that our approach achieves low system overhead. / Thesis (Master, Computing) -- Queen's University, 2009-08-31 12:42:48.944
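The stream-mining side of such a framework can be illustrated with the Space-Saving algorithm, a standard technique for tracking approximate top-k frequent items in one pass under bounded memory. This is a generic sketch; the thesis does not necessarily use this exact algorithm, and the query names are invented:

```python
from collections import Counter

class SpaceSaving:
    """Space-Saving sketch: tracks the (approximately) most frequent items
    in a stream using at most `capacity` counters."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def offer(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # evict the item with the minimum counter; the newcomer
            # inherits that count, so counts may overestimate slightly
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def top(self, k):
        return Counter(self.counts).most_common(k)

# hypothetical stream of executed SQL statement identifiers
stream = ["q1", "q2", "q1", "q3", "q1", "q2", "q1"]
ss = SpaceSaving(capacity=3)
for q in stream:
    ss.offer(q)
```

Because memory is fixed regardless of stream length, monitoring overhead stays low, which is exactly the property a lightweight framework like this needs.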
5 |
MTopS: Multi-Query Optimization for Continuous Top-K Query Workloads. Shastri, Avani. 05 May 2011.
A continuous top-k query retrieves the k most preferred objects from a data stream according to a given preference function. These queries are important for a broad spectrum of applications, ranging from web-based advertising and network traffic monitoring to financial analysis. Given the nature of such applications, a data stream may at any given time be subjected to multiple top-k queries with varying parameter settings, requested simultaneously by different users. This workload of simultaneous top-k queries must be executed efficiently to assure real-time responsiveness. However, existing methods in the literature focus on optimizing single top-k query processing and would thus handle each query independently; they are not suitable for handling large numbers of simultaneous top-k queries due to their unsustainable resource demands. In this thesis, we present a comprehensive framework, called MTopS (Multiple Top-k Optimized Processing System). MTopS achieves resource sharing at the query level by analyzing the parameter settings of all queries in the workload, including window-specific parameters and top-k parameters. We further optimize the shared processing by identifying the minimal object set from the data stream that is both necessary and sufficient for top-k monitoring of all queries in the workload. Within this framework, we design the MTopBand algorithm, which maintains the up-to-date top-k result set in O(k) space, where k is the required result size, eliminating the need for any recomputation. To overcome MTopBand's overhead of maintaining replicas of the top-k result set across sliding windows, we optimize the algorithm further by integrating these views into one structure, called MTopList. Our associated top-k maintenance algorithm, also called MTopList, maintains this linear integrated structure and can thus efficiently answer all queries in the workload.
MTopList is shown to be memory-optimal because it maintains only the distinct objects that are part of the top-k results of at least one query. Our experimental study, using real data streams from the domains of stock trading and moving-object monitoring, demonstrates that our proposed technique is superior to state-of-the-art solutions in both efficiency and scalability with the query workload.
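The sharing idea can be sketched as follows: one candidate list sized for the largest k in the workload can answer every query in it. This toy version recomputes the candidates per slide rather than maintaining them incrementally as MTopList does, and all names and parameters are illustrative:

```python
from collections import deque
import heapq

def shared_topk(stream, window, slide, ks):
    """Answer several top-k queries over one sliding window by keeping a
    single candidate list sized for the largest requested k."""
    kmax = max(ks)
    win = deque()
    results = []  # one dict {k: answer} per window evaluation
    for i, (obj, score) in enumerate(stream, 1):
        win.append((obj, score))
        if len(win) > window:
            win.popleft()  # expire the oldest object
        if i >= window and (i - window) % slide == 0:
            # one shared candidate computation serves every k in the workload
            cand = heapq.nlargest(kmax, win, key=lambda x: x[1])
            results.append({k: cand[:k] for k in ks})
    return results

stream = [("a", 5), ("b", 9), ("c", 7), ("d", 1)]
answers = shared_topk(stream, window=3, slide=1, ks=[1, 2])
```

Each query's answer is a prefix of the shared candidate list, so adding more queries with smaller k costs essentially nothing, which is the heart of the multi-query optimization.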
6 |
Conception d'un framework pour la relaxation des requêtes SPARQL / Design of a Framework for Cooperative Answering of SPARQL Query in RDF Database. Fokou Pelap, Géraud. 21 November 2016.
An ontology (or knowledge base) is a formal representation of knowledge as entities and facts about these entities. In recent years, many ontologies have been developed in academic and industrial contexts. They are generally defined with the RDF language and queried with the SPARQL language. Partial knowledge of the instances and schema of an ontology may lead users to execute queries that return empty answers, which they consider unsatisfactory. Among the cooperative querying techniques developed to solve the problem of empty answers, query relaxation is the best known and most used. It weakens the conditions expressed in the original query in order to return alternative answers to the user. Existing work on the relaxation of SPARQL queries suffers from several drawbacks: (1) it does not allow the relaxation to be defined precisely while retaining control over the relaxation process, (2) it does not identify the actual causes of failure of the query expressed by the user, and (3) it does not include interactive tools to better exploit the proposed relaxation techniques.
To address these limitations, this thesis proposes a framework for SPARQL query relaxation. First, the framework includes a set of relaxation operators dedicated to SPARQL queries, which incrementally relax specific parts of the user query while controlling the relevance of the returned alternative answers with respect to the needs the user expressed in the query. The framework also provides several algorithms that identify both the causes of failure of the user query and the succeeding queries (i.e., queries with non-empty results) that retain a maximal number of conditions of the initially expressed query. This information allows the user to better understand why the query fails and to execute queries that return alternative results. Finally, the framework offers relaxation strategies that rely on the identified causes of query failure. Such strategies reduce the execution time of the relaxation process compared to the traditional approach, which executes relaxed queries, ordered by their similarity to the user query, until a satisfactory number of alternative results is obtained. All contributions of this framework were implemented and validated through experiments and scenarios based on the LUBM benchmark. They show the interest of our contributions with respect to the state of the art.
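One classic relaxation operator, generalizing a constant in a triple pattern to a fresh variable, can be sketched over triple patterns represented as tuples. This is a toy illustration; the thesis defines a richer, controllable set of operators:

```python
def relaxations(patterns):
    """Enumerate one-step relaxations of a conjunctive triple-pattern query:
    each constant object is generalized to a fresh variable, broadening the
    query so it can return alternative answers."""
    out = []
    for i, (s, p, o) in enumerate(patterns):
        if not o.startswith("?"):          # object is a constant
            relaxed = list(patterns)
            relaxed[i] = (s, p, f"?v{i}")  # fresh variable per position
            out.append(relaxed)
    return out

# hypothetical failing query: professors working for MIT
query = [("?x", "type", "Professor"), ("?x", "worksFor", "MIT")]
one_step = relaxations(query)
```

Applying the operator repeatedly yields a lattice of ever-weaker queries; the strategies described above decide which of these to execute, and in what order, guided by the causes of failure.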
7 |
Recherche top-k pour le contenu du Web / Top-k search over rich web content. Bonaque, Raphaël. 30 September 2016.
Social networks are increasingly present in our everyday life and are fast becoming our primary means of information and communication. As they contain more and more data about our surroundings and ourselves, it becomes vital to be able to access and analyze this data. Currently, the primary way to query this data is top-k keyword search: the user enters a few words and the social network service returns a fixed number of relevant documents. In current top-k search in a social context, the relevance of a document is evaluated on two factors: the overlap of the query keywords with the words of the document, and the social proximity between the document and the user issuing the query. We argue that this is limited and propose to also take into account the complex interactions between the users linked to the document, as well as its structure and the meaning of the words it contains rather than their exact phrasing. To this end, we identify the requirements for a model that fully integrates structured, semantic, and social data, and propose a new model, called S3, satisfying these requirements. We introduce querying capabilities for S3 and develop an algorithm, S3k, for customizable top-k keyword search over S3. We prove the correctness of our algorithm and propose an implementation for it. We compare this implementation with another top-k keyword search approach in a social context, using datasets created from real-world data, and show the fundamental differences between the approaches and the benefits of ours.
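The two-factor relevance score that this entry starts from can be written down directly; the sketch below mixes keyword overlap with social proximity via a tunable weight. It is a deliberate simplification (S3k additionally exploits document structure and word semantics), and the field names and weighting are invented:

```python
def social_score(doc, query_words, social_dist, alpha=0.5):
    """Toy relevance score: a weighted mix of keyword overlap and social
    proximity (closer users yield higher proximity)."""
    words = set(doc["words"])
    overlap = len(words & set(query_words)) / max(len(query_words), 1)
    proximity = 1.0 / (1.0 + social_dist)  # distance 0 -> proximity 1.0
    return alpha * overlap + (1 - alpha) * proximity

doc = {"words": ["databases", "ranking", "social"]}
s = social_score(doc, ["social", "ranking"], social_dist=1)
```

With alpha the user can shift a query between "about these words" and "from my social neighborhood", which is the customization knob a personalizable top-k algorithm exposes.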
8 |
Causal modeling and prediction over event streams. Acharya, Saurav. 01 January 2014.
In recent years, there has been a growing need for causal analysis in many modern stream applications such as web page click monitoring, patient health care monitoring, stock market prediction, electric grid monitoring, and network intrusion detection systems. The detection and prediction of causal relationships help in monitoring, planning, decision making, and prevention of unwanted consequences.
An event stream is a continuous unbounded sequence of event instances. The availability of a large amount of continuous data along with high data throughput poses new challenges related to causal modeling over event streams, such as (1) the need for incremental causal inference for the unbounded data, (2) the need for fast causal inference for the high throughput data, and (3) the need for real-time prediction of effects from the events seen so far in the continuous event streams.
This dissertation research addresses these three problems by utilizing the temporal precedence information that is readily available in event streams: (1) an incremental causal model that updates the causal network with each arriving batch of events, instead of storing the complete set of events seen so far and rebuilding the causal network from scratch; (2) a fast causal model that speeds up causal network inference; and (3) a real-time top-k predictive query processing mechanism that finds the k most probable effects, using a run-time causal inference mechanism that handles cyclic causal relationships.
For each of the three problems, the dissertation presents the motivation, related work, proposed approaches, and results.
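The incremental flavor of contributions (1) and (3) can be illustrated with a toy precedence counter: each batch of time-ordered events updates running counts instead of re-scanning the stream, and the counts support a simple top-k effect query. This is a drastic simplification of the dissertation's causal network, with invented event names:

```python
from collections import defaultdict

class PrecedenceCounts:
    """Incremental bookkeeping of temporal precedence between event types:
    each new batch updates running counts, so the model never re-reads
    previously seen events."""
    def __init__(self):
        self.precede = defaultdict(int)  # (a, b) -> times a preceded b
        self.seen = defaultdict(int)     # a -> occurrences of a

    def update(self, batch):
        """batch: a time-ordered list of event types."""
        for i, a in enumerate(batch):
            self.seen[a] += 1
            for b in batch[i + 1:]:
                self.precede[(a, b)] += 1

    def top_effects(self, cause, k):
        """k most frequent followers of `cause`, scored by follow rate."""
        scores = {b: c / self.seen[cause]
                  for (a, b), c in self.precede.items() if a == cause}
        return sorted(scores.items(), key=lambda x: -x[1])[:k]

pc = PrecedenceCounts()
pc.update(["login", "click", "buy"])   # first batch arrives
pc.update(["login", "buy"])            # later batch: counts updated in place
```

Real causal inference must separate precedence from causation, but the batch-update pattern above is what keeps the model tractable on unbounded, high-throughput streams.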
9 |
A Framework for Top-K Queries over Weighted RDF Graphs. January 2010.
The Resource Description Framework (RDF) is a specification that aims to support the conceptual modeling of metadata, or information about resources, in the form of a directed graph composed of triples of knowledge (facts). RDF also provides mechanisms to encode meta-information (such as source, trust, and certainty) about facts already existing in a knowledge base through a process called reification. In this thesis, an extension to the current RDF specification is proposed in order to enhance RDF triples with an application-specific weight (cost). Unlike reification, this extension treats these additional weights as first-class knowledge attributes in the RDF model, which can be leveraged by the underlying query engine. Additionally, current RDF query languages, such as SPARQL, have limited expressive power, which restricts the capabilities of applications that use them. Moreover, even in the presence of language extensions, current RDF stores do not provide methods and tools to process extended queries efficiently and effectively. To overcome these limitations, a set of novel primitives for the SPARQL language is proposed to express top-k queries using traditional query patterns as well as novel predicates inspired by those of the XPath language. An extended query processor engine is also developed to support efficient ranked path search, join, and indexing. In addition, several query optimization strategies are proposed, which employ heuristics, advanced indexing tools, and two graph metrics: proximity and sub-result inter-arrival time. These strategies aim to find join orders that reduce the total query execution time while avoiding worst-case pattern combinations. Finally, extensive experimental evaluation shows that using these two metrics in query optimization has a significant impact on the performance and efficiency of top-k queries.
Further experiments also show that proximity and inter-arrival have an even greater, although sometimes undesirable, impact when combined through aggregation functions. Based on these results, a hybrid algorithm is proposed which acknowledges that proximity is more important than inter-arrival time, due to its more complete nature, and performs a fine-grained combination of both metrics by analyzing the differences between their individual scores and performing the aggregation only if these differences are negligible. / Dissertation/Thesis / M.S. Computer Science 2010
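Treating each weighted triple (s, p, o, w) as a graph edge, ranked path search can be sketched as a best-first expansion of partial paths, emitting the k cheapest simple paths first. This is a small sketch over an invented graph; the thesis's engine adds indexing, joins, and the optimization strategies described above:

```python
import heapq

def k_cheapest_paths(edges, src, dst, k):
    """Return up to k lowest-cost simple paths in a weighted RDF-style
    graph, where an edge (s, p, o, w) is a triple carrying an
    application-specific weight. Best-first search over partial paths."""
    graph = {}
    for s, p, o, w in edges:
        graph.setdefault(s, []).append((o, p, w))
    heap = [(0, src, [src])]  # (cost so far, current node, path)
    found = []
    while heap and len(found) < k:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            found.append((cost, path))  # cheapest remaining path to dst
            continue
        for nxt, _, w in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no revisits)
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return found

edges = [("a", "p", "b", 1), ("b", "p", "c", 1), ("a", "q", "c", 3),
         ("a", "p", "d", 1), ("d", "p", "c", 2)]
paths = k_cheapest_paths(edges, "a", "c", 2)
```

Because paths leave the heap in cost order, the k-th answer is final the moment it is popped, so the search can stop well before exploring the whole graph.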
10 |
CPR: Complex Pattern Ranking for Evaluating Top-k Pattern Queries over Event Streams. January 2011.
Most existing approaches to complex event processing over streaming data rely on the assumption that matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, however, such as stock market analysis and the monitoring of user credit card purchase patterns, matches to the user queries are in fact plentiful, and the system has to efficiently sift through these many matches to locate only the few most preferable ones. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The developed algorithms identify top-k matching results satisfying both the patterns and additional criteria. To support real-time processing of the data streams, instead of computing top-k results from scratch for each time window, we maintain the top-k results dynamically as new events arrive and old ones expire. We also develop new top-k join execution strategies that are able to adapt to changing situations (e.g., sorted and random access costs, join rates) without assuming the a priori presence of data statistics. Experiments show significant improvements over existing approaches. / Dissertation/Thesis / M.S. Computer Science 2011
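The dynamic-maintenance idea, updating the top-k as matches arrive and expire rather than recomputing per window, can be sketched with a sorted list of live matches. A toy sketch with invented scores and expiry times, not the CPR algorithms themselves:

```python
import bisect

class DynamicTopK:
    """Maintain top-k matches as events arrive and expire: all live matches
    are kept sorted by score, so arrivals and expirations are incremental
    updates instead of per-window recomputations."""
    def __init__(self, k):
        self.k = k
        self.live = []  # sorted by (-score, expiry): best match first

    def arrive(self, score, expiry):
        bisect.insort(self.live, (-score, expiry))

    def expire(self, now):
        # drop matches whose window has passed
        self.live = [(s, e) for s, e in self.live if e > now]

    def topk(self):
        return [(-s, e) for s, e in self.live[:self.k]]

d = DynamicTopK(2)
d.arrive(5, 10)
d.arrive(9, 3)
d.arrive(7, 8)
```

When a top-k match expires, the next-best live match is already in place, so the answer never has to be rebuilt from scratch, which is what makes real-time maintenance feasible.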