1 |
Multicore Scalability Through Asynchronous WorkMathew, Ajit 13 January 2020 (has links)
With the end of Moore's Law, computer architects have turned to multicore architecture to provide high performance. Unfortunately, to achieve higher performance, multicores require programs to be parallelized which is an untamed problem. Amdahl's law tells that the maximum theoretical speedup of a program is dictated by the size of the non-parallelizable section of a program. Hence to achieve higher performance, programmers need to reduce the size of sequential code in the program. This thesis explores asynchronous work as a means to reduce sequential portions of program. Using asynchronous work, a programmer can remove tasks which do not affect data consistency from the critical path and can be performed using background thread. Using this idea, the thesis introduces two systems. First, a synchronization mechanism, Multi-Version Read-Log-Update(MV-RLU), which extends Read-Log-Update (RLU) through multi-versioning. At the core of MV-RLU design is a concurrent garbage collection algorithm which reclaims obsolete versions asynchronously reducing blocking of threads. Second, a concurrent and highly scalable index-structure called Hydralist for multi-core. The key idea behind design of Hydralist is that an index-structure can be divided into two component (search layer and data layer) and updates to data layer can be done synchronously while updates to search layer can be propagated asynchronously using background threads. / Master of Science / Up until mid-2000s, Moore's law predicted that performance CPU doubled every two years. This is because improvement in transistor technology allowed smaller transistor which can switch at higher frequency leading to faster CPU clocks. But faster clock leads to higher heat dissipation and as chips reached their thermal limits, computer architects could no longer increase clock speeds. Hence they moved to multicore architecture, wherein a single die contains multiple CPUs, to allow higher performance. Now programmers are required to parallelize their code to take advangtage of all the CPUs in a chip which is a non trivial problem. The theoretical speedup achieved by a program on multicore architecture is dictated by Amdahl's law which describes the non parallelizable code in a program as the limiting factor for speedup. For example, a program with 99% parallelizable code can achieve speedup of 20 whereas a program with 50% parallelizable code can only achieve speedup of 2. Therefore to achieve high speedup, programmers need to reduce size of serial section in their program. One way to reduce sequential section in a program is to remove non-critical task from the sequential section and perform the tasks asynchronously using background thread. This thesis explores this technique in two systems. First, a synchronization mechanism which is used co-ordinate access to shared resource called Multi-Version Read-Log-Update (MV-RLU). MV-RLU achieves high performance by removing garbage collection from critical path and performing it asynchronously using background thread. Second, an index structure, Hydralist, which based on the insight that an index structure can be decomposed into two components, search layer and data layer, and decouples updates to both the layer which allows higher performance. Updates to search layer is done synchronously while updates to data layer is done asynchronously using background threads. Evaluation shows that both the systems perform better than state-of-the-art competitors in a variety of workloads.
|
2 |
Femtosecond laser irradiation of Poly (methyl methacrylate) for refractive index modification and photochemical analysisTaranu, Anca January 2013 (has links)
This thesis explores a new technique for investigating the photochemical mechanisms of femtosecond laser inscription of permanent photonic structures in Poly(methylmethacrylate) (PMMA). The refractive index (RI) structures were fabricated with a direct writing method without ablation, and analysed using a non-invasive method - namely: Raman mapping spectrometry. The writing conditions for the photonic structures under investigation are mainly represented by 800nm and 400nm wavelength with 44fs and 100fs pulse length and a low repetition rate in the kHz domain. The mass percentage of the induced monomer and end groups modification (MMA) as a measure of the modification of the ratio of C=C and C=O Raman transition varies linearly with the total fluence (total). The mass percentage of the induced monomer and end groups change is defined by the modification of normalised ratio of the Raman intensity of C=C bond (I(C=C)) and the Raman intensity of C=O bond (I(C=O)) which is denoted by I(C=C/C=O)n. The modification of this ratio is denoted by I (C=C/C=O)n and also by MMA. MMA varies linearly with total with a positive slope for both writing conditions due to the induced main chain scission and unzipping. If total increases by 1J/cm2, it is predicted an increase in MMA, by (1.550±0.11)x10-2 (cntsxcm2)/J, for the near infrared (NIR) irradiated samples that is higher than the increase of MMA for the ultraviolet (UV) irradiated sample that show a value of (1.9200.274)x10-3 ( (cntsxcm2)/J). The same trend was found for the variation of MMA with diffraction efficiency () for NIR irradiated structures and also for UV irradiated structures. If increases by 1cnt, it is predicted that there will be an increase in MMA, by (4.233±0.383) cnts for NIR irradiated samples that is lower than the increase of MMA for the UV irradiated sample that shows a value of (14.3922.477) cnts. The variation of MMA with is higher for UV irradiated samples than for NIR irradiated samples, and this indicates that the nonlinear absorption of two photons produces a larger percentage of the monomer and end groups than the nonlinear absorption of three photons. Gel Permeation Chromatography (GPC), which is a destructive analytical method, was applied only for the investigation of the time dependent behaviour of the molecular weight of the photonics structures which were written with the parallel writing technique using 775nm wavelength and 160fs pulse length that shows an increase of 66 in after seven days from the laser irradiation. Twenty-four hours after laser irradiation, the GPC results show that the weighted average molecular weight (Mw) of the exposed sample of 28,610,000 Daltons is about thirty times higher than the MW of the unexposed sample of 963,425 Daltons. This is an indication of the photo-cross-linking reaction. As a result of this reaction, the polymer chains link together through intermolecular forces to form a 3D network which produces an increase of molecular weight. It was also observed that there was a further decrease of molecular weight after three days to 437,441 Daltons due to main chain scission and unzipping. The main chain scission is actually the breaking of C-C bonds between structural units and the formation of radicals which further produce the monomer and end groups (MMA) through the unzipping reaction which leads to a decrease of the molecular weight. The main chain scission occurred with the greatest efficiency after three days following the end of irradiation, when the number of the main chain scissions (Ns) reached the maximum value of 1.193. An increase of molecular weight signifies an increase of the refractive index since the optical density has increased. The mechanical properties of PMMA optical fibres (e.g., Young's modulus) and of bulk PMMA (e.g., glass transition temperature) were investigated using Dynamical Mechanical Analysis (DMA) tests (e.g., stress-strain test and temperature ramp/frequency sweep test). These measurements were performed to study the effect of the manufacturing process that involves stretching and heating or cooling on the mechanical properties of PMMA optical fibres and unmodified PMMA material. T he ultimate aim of this section was to see the effect of the laser irradiation on the strain properties of an optical fibre sensor with gratings. The stress strain results show an increase of Young's modulus of the PMMA optical fibre of 5%, and this is an indication of decreased elasticity which is induced during the fabrication process. For a femtosecond laser irradiated region with UV wavelength, it is expected that there will be an increase of Young's modulus to 65%. This variation was obtained inthe research group from The Photon Science Institute by measuring Young's modulus for a diffraction grating which was written in PMMA with 180fs pulse length and 387nm wavelength and which was subjected to a strain. The elasticity was measured using the displacement of the first order diffracted beams as a result of a modification due to the applied strain [ ]. The temperature ramp/frequency sweep test shows an increase of glass transition temperature of the bulk PMMA of 54.12% which is also an indication of decreased elasticity induced during the fabrication process. A further increase in this temperature is expected for UV irradiated samples.
|
3 |
Main-Memory Query Processing Utilizing External IndexesTruong, Thanh January 2016 (has links)
Many applications require storage and indexing of new kinds of data in main-memory, e.g. color histograms, textures, shape features, gene sequences, sensor readings, or financial time series. Even though, many domain index structures were developed, very a few of them are implemented in any database management system (DBMS), usually only B-trees and hash indexes. A major reason is that the manual effort to include a new index implementation in a regular DBMS is very costly and time-consuming because it requires integration with all components of the DBMS kernel. To alleviate this, there are some extensible indexing frameworks. However, they all require re-engineering the index implementations, which is a problem when the index has third-party ownership, when only binary code is available, or simply when the index implementation is complex to re-engineer. Therefore, the DBMS should allow including new index implementations without code changes and performance degradation. Furthermore, for high performance the query processor needs knowledge of how to process queries to utilize plugged-in index. Moreover, it is important that all functionalities of a plugged-in index implementation are correct. The extensible main memory database system (MMDB) Mexima (Main-memory External Index Manager) addresses these challenges. It enables transparent plugging in main-memory index implementations without code changes. Index specific rewrite rules transform complex queries to utilize the indexes. Automatic test procedures validate the correctness of them based on user provided index meta-data. Moreover, the same optimization framework can also optimize complex queries sent to a back-end DBMS by exposing hidden indexes for its query optimizer. Altogether, Mexima is a complete and extensible platform for transparently index integration, utilization, and evaluation.
|
4 |
Indexing and Querying Natural Language TextChubak, Pirooz Unknown Date
No description available.
|
5 |
Effective and efficient similarity search in databasesLange, Dustin January 2013 (has links)
Given a large set of records in a database and a query record, similarity search aims to find all records sufficiently similar to the query record. To solve this problem, two main aspects need to be considered: First, to perform effective search, the set of relevant records is defined using a similarity measure. Second, an efficient access method is to be found that performs only few database accesses and comparisons using the similarity measure. This thesis solves both aspects with an emphasis on the latter.
In the first part of this thesis, a frequency-aware similarity measure is introduced. Compared record pairs are partitioned according to frequencies of attribute values. For each partition, a different similarity measure is created: machine learning techniques combine a set of base similarity measures into an overall similarity measure. After that, a similarity index for string attributes is proposed, the State Set Index (SSI), which is based on a trie (prefix tree) that is interpreted as a nondeterministic finite automaton. For processing range queries, the notion of query plans is introduced in this thesis to describe which similarity indexes to access and which thresholds to apply. The query result should be as complete as possible under some cost threshold. Two query planning variants are introduced: (1) Static planning selects a plan at compile time that is used for all queries. (2) Query-specific planning selects a different plan for each query. For answering top-k queries, the Bulk Sorted Access Algorithm (BSA) is introduced, which retrieves large chunks of records from the similarity indexes using fixed thresholds, and which focuses its efforts on records that are ranked high in more than one attribute and thus promising candidates.
The described components form a complete similarity search system. Based on prototypical implementations, this thesis shows comparative evaluation results for all proposed approaches on different real-world data sets, one of which is a large person data set from a German credit rating agency. / Ziel von Ähnlichkeitssuche ist es, in einer Menge von Tupeln in einer Datenbank zu einem gegebenen Anfragetupel all diejenigen Tupel zu finden, die ausreichend ähnlich zum Anfragetupel sind.
Um dieses Problem zu lösen, müssen zwei zentrale Aspekte betrachtet werden: Erstens, um eine effektive Suche durchzuführen, muss die Menge der relevanten Tupel mithilfe eines Ähnlichkeitsmaßes definiert werden. Zweitens muss eine effiziente Zugriffsmethode gefunden werden, die nur wenige Datenbankzugriffe und Vergleiche mithilfe des Ähnlichkeitsmaßes durchführt. Diese Arbeit beschäftigt sich mit beiden Aspekten und legt den Fokus auf Effizienz.
Im ersten Teil dieser Arbeit wird ein häufigkeitsbasiertes Ähnlichkeitsmaß eingeführt. Verglichene Tupelpaare werden entsprechend der Häufigkeiten ihrer Attributwerte partitioniert. Für jede Partition wird ein unterschiedliches Ähnlichkeitsmaß erstellt: Mithilfe von Verfahren des Maschinellen Lernens werden Basisähnlichkeitsmaßes zu einem Gesamtähnlichkeitsmaß verbunden. Danach wird ein Ähnlichkeitsindex für String-Attribute vorgeschlagen, der State Set Index (SSI), welcher auf einem Trie (Präfixbaum) basiert, der als nichtdeterministischer endlicher Automat interpretiert wird. Zur Verarbeitung von Bereichsanfragen wird in dieser Arbeit die Notation der Anfragepläne eingeführt, um zu beschreiben welche Ähnlichkeitsindexe angefragt und welche Schwellwerte dabei verwendet werden sollen. Das Anfrageergebnis sollte dabei so vollständig wie möglich sein und die Kosten sollten einen gegebenen Schwellwert nicht überschreiten. Es werden zwei Verfahren zur Anfrageplanung vorgeschlagen: (1) Beim statischen Planen wird zur Übersetzungszeit ein Plan ausgewählt, der dann für alle Anfragen verwendet wird. (2) Beim anfragespezifischen Planen wird für jede Anfrage ein unterschiedlicher Plan ausgewählt. Zur Beantwortung von Top-k-Anfragen stellt diese Arbeit den Bulk Sorted Access-Algorithmus (BSA) vor, der große Mengen von Tupeln mithilfe fixer Schwellwerte von den Ähnlichkeitsindexen abfragt und der Tupel bevorzugt, die hohe Ähnlichkeitswerte in mehr als einem Attribut haben und damit vielversprechende Kandidaten sind.
Die vorgestellten Komponenten bilden ein vollständiges Ähnlichkeitssuchsystem. Basierend auf einer prototypischen Implementierung zeigt diese Arbeit vergleichende Evaluierungsergebnisse für alle vorgestellten Ansätze auf verschiedenen Realwelt-Datensätzen; einer davon ist ein großer Personendatensatz einer deutschen Wirtschaftsauskunftei.
|
6 |
Efficient Techniques For Relevance Feedback Processing In Content-based Image RetrievalLiu, Danzhou 01 January 2009 (has links)
In content-based image retrieval (CBIR) systems, there are two general types of search: target search and category search. Unlike queries in traditional database systems, users in most cases cannot specify an ideal query to retrieve the desired results for either target search or category search in multimedia database systems, and have to rely on iterative feedback to refine their query. Efficient evaluation of such iterative queries can be a challenge, especially when the multimedia database contains a large number of entries, and the search needs many iterations, and when the underlying distance measure is computationally expensive. The overall processing costs, including CPU and disk I/O, are further emphasized if there are numerous concurrent accesses. To address these limitations involved in relevance feedback processing, we propose a generic framework, including a query model, index structures, and query optimization techniques. Specifically, this thesis has five main contributions as follows. The first contribution is an efficient target search technique. We propose four target search methods: naive random scan (NRS), local neighboring movement (LNM), neighboring divide-and-conquer (NDC), and global divide-and-conquer (GDC) methods. All these methods are built around a common strategy: they do not retrieve checked images (i.e., shrink the search space). Furthermore, NDC and GDC exploit Voronoi diagrams to aggressively prune the search space and move towards target images. We theoretically and experimentally prove that the convergence speeds of GDC and NDC are much faster than those of NRS and recent methods. The second contribution is a method to reduce the number of expensive distance computation when answering k-NN queries with non-metric distance measures. We propose an efficient distance mapping function that transfers non-metric measures into metric, and still preserves the original distance orderings. Then existing metric index structures (e.g., M-tree) can be used to reduce the computational cost by exploiting the triangular inequality property. The third contribution is an incremental query processing technique for Support Vector Machines (SVMs). SVMs have been widely used in multimedia retrieval to learn a concept in order to find the best matches. SVMs, however, suffer from the scalability problem associated with larger database sizes. To address this limitation, we propose an efficient query evaluation technique by employing incremental update. The proposed technique also takes advantage of a tuned index structure to efficiently prune irrelevant data. As a result, only a small portion of the data set needs to be accessed for query processing. This index structure also provides an inexpensive means to process the set of candidates to evaluate the final query result. This technique can work with different kernel functions and kernel parameters. The fourth contribution is a method to avoid local optimum traps. Existing CBIR systems, designed around query refinement based on relevance feedback, suffer from local optimum traps that may severely impair the overall retrieval performance. We therefore propose a simulated annealing-based approach to address this important issue. When a stuck-at-a-local-optimum occurs, we employ a neighborhood search technique (i.e., simulated annealing) to continue the search for additional matching images, thus escaping from the local optimum. We also propose an index structure to speed up such neighborhood search. Finally, the fifth contribution is a generic framework to support concurrent accesses. We develop new storage and query processing techniques to exploit sequential access and leverage inter-query concurrency to share computation. Our experimental results, based on the Corel dataset, indicate that the proposed optimization can significantly reduce average response time while achieving better precision and recall, and is scalable to support a large user community. This latter performance characteristic is largely neglected in existing systems making them less suitable for large-scale deployment. With the growing interest in Internet-scale image search applications, our framework offers an effective solution to the scalability problem.
|
7 |
Efficient Reorganisation of Hybrid Index Structures Supporting Multimedia Search CriteriaKropf, Carsten 11 February 2017 (has links) (PDF)
This thesis describes the development and setup of hybrid index structures. They are access methods for retrieval techniques in hybrid data spaces which are formed by one or more relational or normalised columns in conjunction with one non-relational or non-normalised column. Examples for these hybrid data spaces are, among others, textual data combined with geographical ones or data from enterprise content management systems. However, all non-relational data types may be stored as well as image feature vectors or comparable types.
Hybrid index structures are known to function efficiently regarding retrieval operations. Unfortunately, little information is available about reorganisation operations which insert or update the row tuples. The fundamental research is mainly executed in simulation based environments. This work is written ensuing from a previous thesis that implements hybrid access structures in realistic database surroundings. During this implementation it has become obvious that retrieval works efficiently. Yet, the restructuring approaches require too much effort to be set up, e.g., in web search engine environments where several thousands of documents are inserted or modified every day. These search engines rely on relational database systems as storage backends. Hence, the setup of these access methods for hybrid data spaces is required in real world database management systems.
This thesis tries to apply a systematic approach for the optimisation of the rearrangement algorithms inside realistic scenarios. Thus, a measurement and evaluation scheme is created which is repeatedly deployed to an evolving state and a model of hybrid index structures in order to optimise the regrouping algorithms to make a setup of hybrid index structures in real world information systems possible. Thus, a set of input corpora is selected which is applied to the test suite as well as an evaluation scheme.
To sum up, it can be said that this thesis describes input sets, a test suite including an evaluation scheme as well as optimisation iterations on reorganisation algorithms reflecting a theoretical model framework to provide efficient reorganisations of hybrid index structures supporting multimedia search criteria.
|
8 |
KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern ArchitecturesKissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 30 May 2022 (has links)
Growing main memory capacities and an increasing number of hardware threads in modern server systems led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite of a high main memory scan performance, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary memory and involve comprehensive structural maintenance for updates.
In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms but keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and has a low memory footprint.
|
9 |
Efficient Reorganisation of Hybrid Index Structures Supporting Multimedia Search CriteriaKropf, Carsten 21 November 2016 (has links)
This thesis describes the development and setup of hybrid index structures. They are access methods for retrieval techniques in hybrid data spaces which are formed by one or more relational or normalised columns in conjunction with one non-relational or non-normalised column. Examples for these hybrid data spaces are, among others, textual data combined with geographical ones or data from enterprise content management systems. However, all non-relational data types may be stored as well as image feature vectors or comparable types.
Hybrid index structures are known to function efficiently regarding retrieval operations. Unfortunately, little information is available about reorganisation operations which insert or update the row tuples. The fundamental research is mainly executed in simulation based environments. This work is written ensuing from a previous thesis that implements hybrid access structures in realistic database surroundings. During this implementation it has become obvious that retrieval works efficiently. Yet, the restructuring approaches require too much effort to be set up, e.g., in web search engine environments where several thousands of documents are inserted or modified every day. These search engines rely on relational database systems as storage backends. Hence, the setup of these access methods for hybrid data spaces is required in real world database management systems.
This thesis tries to apply a systematic approach for the optimisation of the rearrangement algorithms inside realistic scenarios. Thus, a measurement and evaluation scheme is created which is repeatedly deployed to an evolving state and a model of hybrid index structures in order to optimise the regrouping algorithms to make a setup of hybrid index structures in real world information systems possible. Thus, a set of input corpora is selected which is applied to the test suite as well as an evaluation scheme.
To sum up, it can be said that this thesis describes input sets, a test suite including an evaluation scheme as well as optimisation iterations on reorganisation algorithms reflecting a theoretical model framework to provide efficient reorganisations of hybrid index structures supporting multimedia search criteria.
|
10 |
Systém řízení báze dat v operační paměti / In-Memory Database Management SystemPehal, Petr January 2013 (has links)
The focus of this thesis is a proprietary database interface for management tables in memory. At the beginning, there is given a short introduction to the databases. Then the concept of in-memory database systems is presented. Also the main advantages and disadvantages of this solution are discussed. The theoretical introduction is ended by brief overview of existing systems. After that the basic information about energetic management system RIS are presented together with system's in-memory database interface. Further the work aims at the specification and design of required modifications and extensions of the interface. Then the implementation details and tests results are presented. In conclusion the results are summarized and future development is discussed.
|
Page generated in 0.0891 seconds