Global ETD Search

11	Preferenční dotazováni, indexy, optimalizace / Preference querying, indexing, optimisation Horničák, Erik January 2009 In this thesis we discuss the problem of finding the best k objects from a multi-user point of view. Every user has their own preferences, which are represented by fuzzy functions and an aggregation function. The thesis designs and implements several solutions for finding the best k objects when attribute data are stored on remote servers. Existing algorithms had to be modified for this way of obtaining data. The thesis uses several variants of Fagin's algorithm, indexing methods based on B+ trees, and communication via web services.
12	Preferenční dotazováni, indexy, optimalizace / Preference querying, indexing, optimisation Horničák, Erik January 2011 In this thesis we discuss the problem of finding the best k objects from a multi-user point of view. Every user has their own preferences, which are represented by fuzzy functions and an aggregation function. The thesis designs and implements several solutions for finding the best k objects when attribute data are stored on remote servers. Existing algorithms had to be modified for this way of obtaining data. The thesis uses several variants of Fagin's algorithm, indexing methods based on B+ trees, and communication via web services.
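The top-k search described in entries 11 and 12 can be illustrated with the threshold algorithm, one well-known member of Fagin's family of algorithms. This is a minimal sketch and not necessarily one of the variants used in the thesis. The function name `threshold_top_k`, the sample attribute lists, and the choice of `min` as the aggregation function are assumptions made for illustration. The early-stopping rule is only valid for a monotone aggregation function.

```python
import heapq


def threshold_top_k(lists, k, aggregate=min):
    """Return the k best (object, score) pairs.

    lists: one dict per attribute, mapping object id -> score in [0, 1].
    aggregate: monotone function that takes an iterable of scores (assumed: min).
    """
    # Sorted access: each attribute list ordered by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}  # object id -> aggregated score
    max_depth = max(len(s) for s in sorted_lists)

    for depth in range(max_depth):
        last_seen = []
        for s in sorted_lists:
            if depth < len(s):
                obj, score = s[depth]
                last_seen.append(score)
                if obj not in seen:
                    # Random access: look up the object's other attributes.
                    seen[obj] = aggregate(l.get(obj, 0.0) for l in lists)
            else:
                last_seen.append(0.0)  # this list is exhausted
        # No unseen object can score higher than this threshold.
        threshold = aggregate(last_seen)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early stop: the current top k cannot be beaten
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical preference scores for two attributes.
price = {"a": 0.9, "b": 0.8, "c": 0.3}
distance = {"a": 0.5, "b": 0.7, "c": 0.95}
best = threshold_top_k([price, distance], k=1)
```

In a setting with remote servers, each sorted access and each random access would correspond to a network request, which is why stopping early matters.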
13	Smart Clustering System for Filtering and Cleaning User Generated Content : Creating a profanity filter for Truecaller / System för filtrering och sanering av oönskad text i användarskapat innehåll Moradi, Arvin January 2013 This thesis investigates and builds an application for filtering user-generated content. The method was to examine how profanity and racist expressions are used and manipulated to evade filtering in similar systems. A further focus was the study of different algorithms to make the process fast and efficient, i.e., to process as many names as possible in the shortest possible time, because the client receives millions of new uploads every day that must be filtered before use. The results show that the application detects profanity, including manipulated forms. Data from the customer's database was also used for testing, and the results showed that the application works in practice. The performance test shows a fast execution time, which can be approximated by a linear function of the number of names entered. The conclusion was that the filter works and discovers profanity not previously detected in the customer's database. Future updates could strengthen the decision process by introducing a third-party service or a web interface for manually controlling decisions. About 10 million names can be processed in roughly 6 hours. In the future, queries to the database could be parallelized so that multiple names are processed simultaneously.
Java REST Jersey filter linear function MongoDB Maven String matching algorithm B-Tree Hashmap Aho-Corasick Engineering and Technology
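The throughput figure in entry 13 can be turned into a per-name cost with a little arithmetic. The sketch below assumes the linear model passes through the origin (no fixed start-up cost), which the abstract does not state; the variable and function names are illustrative.

```python
# Figures quoted in the abstract: about 10 million names in about 6 hours.
NAMES = 10_000_000
HOURS = 6

seconds = HOURS * 3600
names_per_second = NAMES / seconds      # about 463 names per second
ms_per_name = 1000 * seconds / NAMES    # 2.16 ms per name


def estimated_hours(n_names):
    """Estimate run time in hours, assuming time grows linearly from zero."""
    return n_names * (seconds / NAMES) / 3600


one_million = estimated_hours(1_000_000)  # about 0.6 hours under this model
```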
14	EFFICIENT LSM SECONDARY INDEXING FOR UPDATE-INTENSIVE WORKLOADS Jaewoo Shin (17069089) 29 September 2023 In recent years, massive amounts of data have been generated by various types of devices and services. For these data, update-intensive workloads, in which the data update their status periodically and continuously, are common. The Log-Structured Merge (LSM, for short) is a widely used indexing technique in various systems, where index structures buffer insert operations in the memory layer and flush them to disk when the data size in memory exceeds a threshold. Despite its notable ability to handle write-intensive (i.e., insert-intensive) workloads, LSM suffers from degraded query performance because maintaining secondary-key indexes is inefficient under update-intensive workloads.
This dissertation focuses on the efficient support of update-intensive workloads for LSM-based indexes. First, the focus is on the optimization of LSM secondary-key indexes and their support for update-intensive workloads. A mechanism that enables the LSM R-tree to handle update-intensive workloads efficiently is introduced. The new LSM indexing structure is termed the LSM RUM-tree, an LSM R-tree with Update Memo. The key insights are to reduce the maintenance cost of the LSM R-tree by leveraging an additional in-memory memo structure, and to control the size of the memo so that it fits in memory. In the experiments, the LSM RUM-tree achieves up to 9.6x speedup on update operations and up to 2400x speedup on query operations.
Second, the focus is on several advancements in the context of the LSM RUM-tree. We provide an extended examination of LSM-aware Update Memo (UM) cleaning strategies, showing how effectively each strategy reduces UM size and contributes to performance. Moreover, to support concurrent activity within the LSM RUM-tree, particularly in multi-threaded/multi-core environments, we introduce concurrency control for the update memo. A novel atomic operation, Compare and If Less than Swap (CILS), is introduced to enable concurrent operations on the Update Memo. Experimental results show a 4.5x improvement in the speed of concurrent update operations compared to existing and baseline implementations.
Finally, we present a novel technique designed to improve query processing performance and optimize storage management in any secondary LSM tree. The proposed approach introduces a new framework and mechanisms that address the specific challenges of secondary indexing in the structure of the LSM tree, especially in the context of the secondary LSM B+-tree (LSM BUM-tree). Experimental results show that the LSM BUM-tree achieves up to 5.1x speedup on update-intensive workloads and 107x speedup on mixed update and query workloads over existing LSM B+-tree implementations. Data models, storage and indexing Database systems LSM-based index Secondary index Query Processing R-trees B-Tree spatial data processing
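The general idea of an update memo, as mentioned in entry 14, can be sketched as follows. This is a deliberately simplified illustration under stated assumptions, not the dissertation's design: the class `MemoIndex`, its fields, and the single `clean` pass are hypothetical, a plain list stands in for the LSM index, and the abstract does not describe the actual memo layout, cleaning strategies, or the CILS operation in enough detail to reproduce them.

```python
class MemoIndex:
    """Append-only index plus an in-memory update memo (illustrative only)."""

    def __init__(self):
        # (object_id, stamp, value); old versions are NOT removed on update.
        self.entries = []
        # object_id -> stamp of the newest version of that object.
        self.memo = {}
        self.clock = 0

    def update(self, object_id, value):
        """Write a new version without searching for the old one."""
        self.clock += 1
        self.entries.append((object_id, self.clock, value))
        self.memo[object_id] = self.clock

    def query(self, predicate):
        """Return matching objects, skipping entries the memo marks as stale."""
        return {
            oid: value
            for oid, stamp, value in self.entries
            if self.memo[oid] == stamp and predicate(value)
        }

    def clean(self):
        """Physically drop stale entries (a stand-in for memo cleaning)."""
        self.entries = [e for e in self.entries if self.memo[e[0]] == e[1]]


index = MemoIndex()
index.update("car1", (0, 0))
index.update("car2", (5, 5))
index.update("car1", (9, 9))          # the old position of car1 becomes stale
current = index.query(lambda v: True)
```

The point of the design is that an update becomes a cheap append, while the cost of identifying obsolete entries is deferred to queries and to background cleaning.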
15	Hybride Indexstrukturen / Hybrid Index Structures Kropf, Carsten 10 October 2014 This text describes a doctoral project on the implementation and optimization of hybrid index structures. Hybrid index structures achieve their improved search performance through a higher precomputation effort during insert operations. As a result, in contrast to approaches that combine several index structures or execute separate search queries, reorganizing hybrid index structures is so inefficient that it is prohibitive for use in most applications. The doctoral project aims to optimize these structures so that they can be used in realistic scenarios. Hybrid index structures B-Tree R-Tree business information systems algorithm optimization ddc:378 rvk:AL 59872 rvk:AL 59878 rvk:NZ 14420
16	Preferenčné vyhľadávanie založené na viacrozmernom B-strome / Preference Top-k Search Based on Multidimensional B-tree Ondreička, Matúš January 2013 Title: Preference Top-k Search Based on Multidimensional B-Tree Author: RNDr. Matúš Ondreička Department: Department of Software Engineering, Faculty of Mathematics and Physics, Charles University in Prague Supervisor: Prof. RNDr. Jaroslav Pokorný, CSc. Author's e-mail address: ondreicka@ksi.mff.cuni.cz Supervisor's e-mail address: pokorny@ksi.mff.cuni.cz Abstract: In this thesis, we focus on top-k search according to user preferences using B+-trees and the multidimensional B-tree (MDB-tree). We use a model of user preferences based on fuzzy functions, which enables us to search according to a non-monotone ranking function. We propose a model of a sorted list based on the B+-tree, which enables Fagin's algorithms to search for the top-k objects according to a non-monotone ranking function. We apply this model in the Internet environment with data on different remote servers. Furthermore, we designed novel dynamic tree-based data structures, namely the MDB-tree composed of B+-trees, the MDB-tree with lists, the MDB-tree with groups of B+-trees, and the multiple-ordered MDB-tree. Concurrently, we have developed novel top-k algorithms, namely the MD algorithm, the MXT algorithm, and their variants, which are able to search for the top-k objects according to a non-monotone ranking function. These top-k algorithms are efficient...
17	Hybride Indexstrukturen / Hybrid Index Structures Kropf, Carsten 10 October 2014 This text describes a doctoral project on the implementation and optimization of hybrid index structures. Hybrid index structures achieve their improved search performance through a higher precomputation effort during insert operations. As a result, in contrast to approaches that combine several index structures or execute separate search queries, reorganizing hybrid index structures is so inefficient that it is prohibitive for use in most applications. The doctoral project aims to optimize these structures so that they can be used in realistic scenarios. info:eu-repo/classification/ddc/378 ddc:378
18	Indexování dat pohybujících se objektů / Moving Objects Indexing Křížová, Martina January 2010 This thesis deals with the indexing of spatio-temporal data. It describes existing approaches to indexing such data and the indexing support in Oracle Database 11g. The aim of the work is to design database structures for storing spatio-temporal data in Oracle Database 11g and to propose experiments on these databases. Based on these experiments, the storage approaches are evaluated in terms of query time and the suitability of the available indexing structures and spatial operators.
19	Hotlinks and dictionaries Douieb, Karim 29 September 2008 Knowledge has always been a decisive factor in humankind's social evolution. Collecting the world's knowledge is one of the greatest challenges of our civilization. Knowledge involves the use of information, but information is not knowledge. It is a way of acquiring and understanding information. Improving the visibility and accessibility of information requires organizing it efficiently. This thesis focuses on this general purpose.
A fundamental objective of computer science is to store and retrieve information efficiently. This is known as the dictionary problem. A dictionary is a data structure that essentially supports the search operation. In general, information that is important and popular at a given time has to be accessed faster than less relevant information. This can be achieved by dynamically reorganizing the data structure so that relevant information is located closer to the search starting point. The second part of this thesis is devoted to the development and understanding of self-adjusting dictionaries in various models of computation. In particular, we focus on dictionaries that have no knowledge of future accesses. Those dictionaries have to adapt themselves to be competitive with dictionaries specifically tuned for a given access sequence.
This approach, which transforms the information structure, is not always feasible. One reason can be that the structure is based on the semantics of the information, such as a categorization. In this context, the search procedure is linked to the structure itself, and modifying the structure affects how a search is performed. A solution developed to improve search in a static structure is hotlink assignment. It is a way to enhance a structure without altering its original design. This approach speeds up the search by creating shortcuts in the structure. The first part of this thesis is devoted to this approach. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished General computer science Exact and natural sciences Information organization Data structures (Computer science) Indexing -- Data processing Hypertext systems Information systems Hypertext Skiplist B-tree Hotlink Assignment Dynamic Optimality Web Self-Adjusting Data Structures Approximation
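The notion of a self-adjusting dictionary in entry 19 can be illustrated with the classic move-to-front heuristic on a linear list. This is a textbook example chosen for brevity and is not one of the structures studied in the thesis; the class name `MoveToFrontList` and the comparison-count return value are assumptions made for illustration.

```python
class MoveToFrontList:
    """Linear-search dictionary that moves each accessed key to the front."""

    def __init__(self, keys):
        self.keys = list(keys)

    def search(self, key):
        """Return the number of comparisons used, then move the key to the front."""
        for position, candidate in enumerate(self.keys):
            if candidate == key:
                # Self-adjustment: recently used keys end up near the start.
                self.keys.insert(0, self.keys.pop(position))
                return position + 1
        raise KeyError(key)


dictionary = MoveToFrontList("abcde")
first_cost = dictionary.search("e")    # 5 comparisons: "e" was last
second_cost = dictionary.search("e")   # 1 comparison: "e" is now first
```

The structure needs no knowledge of future accesses: it only reacts to the access sequence seen so far, which is the setting the abstract describes.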
20	Vyhledávání ve videu / Video Retrieval Černý, Petr January 2012 This thesis summarizes information retrieval theory and the basics of the relational model, and focuses on data indexing in relational database systems, in particular on searching multimedia data. It describes automatic content extraction from multimedia data and multimedia data indexing. The practical part discusses the design and implementation of a solution for improving the efficiency of similarity queries over the multidimensional vectors that describe multimedia data. The final part of the thesis discusses experiments with this solution.

Search results