1. Shortest Path Queries in Very Large Spatial Databases (Zhang, Ning, January 2001)
Finding shortest paths in a graph has been studied for a long time, and there are many main-memory algorithms for this problem. Among these, Dijkstra's algorithm is one of the most commonly used efficient algorithms for graphs with non-negative edge weights. Even more efficient algorithms have been developed recently for graphs with particular properties, such as edge weights that fall within a bounded integer range. All of these algorithms require the graph to reside entirely in main memory. However, for very large graphs, such as the digital maps managed by Geographic Information Systems (GIS), this requirement cannot be satisfied in most cases, so the algorithms above are not appropriate. My objective in this thesis is to design and evaluate the performance of external-memory (disk-based) shortest path algorithms and data structures for the shortest path problem in very large digital maps. In particular, the following questions are studied: What have other researchers done on shortest path queries in very large digital maps? What could be improved on in previous work? How efficient are our new shortest path algorithms on digital maps, and what factors affect their efficiency? What can be built on top of the algorithm? In this thesis, we give a disk-based Dijkstra-like algorithm that answers shortest path queries using pre-processing information. Experiments based on our Java implementation show which factors affect the running time of our algorithms.
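As a concrete reference for the in-memory baseline this thesis builds on, here is a minimal sketch of classical Dijkstra with a binary heap; it is not the thesis's disk-based variant, and the graph encoding and names are illustrative assumptions.

    import heapq

    def dijkstra(graph, source):
        """Single-source shortest paths for non-negative edge weights.

        graph: dict mapping node -> list of (neighbor, weight) pairs.
        Returns dict mapping node -> shortest distance from source.
        """
        dist = {source: 0}
        heap = [(0, source)]                   # (distance, node) priority queue
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):  # stale queue entry; skip
                continue
            for v, w in graph[u]:
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    # Example: a toy road network as an adjacency dict
    roads = {
        "A": [("B", 4), ("C", 1)],
        "B": [("D", 1)],
        "C": [("B", 2), ("D", 5)],
        "D": [],
    }
    print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}

In the disk-based setting the cost model changes: the priority-queue operations stay cheap, but each adjacency access may trigger a random disk read, which is the cost the thesis's pre-processing targets.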
2. Efficient Disk-Based Techniques for Manipulating Very Large String Databases (Allam, Amin, 18 May 2017)
Indexing and processing strings are important topics in database management. Strings can be database records, DNA sequences, protein sequences, or plain text. Various string operations are required by several application categories, such as bioinformatics and entity resolution. When the string count or string sizes become very large, several state-of-the-art techniques for indexing and processing such strings fail or behave very inefficiently, and modifying an existing technique to overcome these issues is not usually straightforward or even possible.
One category of string operations can be facilitated by the suffix tree data structure, which indexes a long string so that any substring of the indexed string can be found efficiently, and which supports other operations as well, such as approximate string matching. In this document, we introduce a novel, efficient method to construct the suffix tree index for very long strings using parallel architectures, which is a major challenge in this category.
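To make the substring-search property concrete, here is a hedged sketch of a naive uncompressed suffix trie; real suffix trees compress unary paths and can be built in linear time, so this is an illustration of the idea, not the construction method of this work.

    def build_suffix_trie(text):
        """Naive O(n^2) suffix trie: index every suffix of text.

        A real suffix tree compresses unary paths and builds in O(n);
        this uncompressed sketch only illustrates the structure.
        """
        root = {}
        for i in range(len(text)):
            node = root
            for ch in text[i:]:
                node = node.setdefault(ch, {})
        return root

    def contains_substring(trie, pattern):
        """Every substring of the text is a prefix of some suffix."""
        node = trie
        for ch in pattern:
            if ch not in node:
                return False
            node = node[ch]
        return True

    trie = build_suffix_trie("GATTACA")
    print(contains_substring(trie, "TTAC"))  # True
    print(contains_substring(trie, "CAT"))   # False

Because every substring is a prefix of some suffix, a query walks at most one node per pattern character, which is what makes search time linear in the query length.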
Another category of string operations requires clustering similar strings in order to perform application-specific processing on the resulting, possibly overlapping clusters. In this document, based on clustering similar strings, we introduce a novel, efficient technique for record linkage and entity resolution, and a novel method for correcting errors in a large number of small strings (read sequences) generated by DNA sequencing machines.
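A minimal sketch of the clustering idea, assuming a generic edit-similarity measure, a brute-force pairwise comparison, and union-find to merge matches; the thresholds and names are illustrative, and the document's actual technique is more scalable than this quadratic sketch.

    from difflib import SequenceMatcher

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def cluster_strings(strings, threshold=0.75):
        """Group strings whose pairwise similarity exceeds threshold.

        Brute-force O(n^2) comparison; a scalable technique would
        prune candidate pairs first (e.g. by q-gram signatures).
        """
        parent = list(range(len(strings)))
        for i in range(len(strings)):
            for j in range(i + 1, len(strings)):
                if SequenceMatcher(None, strings[i], strings[j]).ratio() >= threshold:
                    parent[find(parent, i)] = find(parent, j)
        clusters = {}
        for i, s in enumerate(strings):
            clusters.setdefault(find(parent, i), []).append(s)
        return list(clusters.values())

    names = ["Jon Smith", "John Smith", "J. Smith", "Alice Liu", "Alyce Liu"]
    # Two clusters: the Smith variants and the Liu variants.
    print(cluster_strings(names))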
3. Code optimization of speckle reduction algorithms for image processing of rocket motor holograms (Kaeser, Dana S., 12 1900)
Approved for public release; distribution is unlimited.

This thesis supplements and updates previous research completed in the digital analysis of rocket motor combustion chamber holographic images. In particular, this thesis deals with the software code optimization of existing automatic data-retrieval algorithms that are used to extract useful particle information from the holograms using a microcomputer-based imaging system. Two forms of optimization were accomplished: the application of an optimizing FORTRAN compiler to the existing FORTRAN programs, and a complete rewrite of the programs in the C language using an optimizing compiler. The overall results achieved were a reduction in executable program size and a significant decrease in program execution time.

http://archive.org/details/codeoptimization00kaes

Lieutenant Commander, United States Navy
4. A Scalable Partial-Order Data Structure for Distributed-System Observation (Ward, Paul, January 2001)
Distributed-system observation is foundational to understanding and controlling distributed computations. Existing tools for distributed-system observation are constrained in the size of computation that they can observe by three fundamental problems: they lack scalable information collection, scalable data structures for storing and querying the information collected, and scalable information-abstraction schemes. This dissertation addresses the second of these problems.

Two core problems were identified in providing a scalable data structure. First, in spite of the existence of several distributed-system-observation tools, the requirements of such a structure were not well defined. Rather, current tools appear to be built with events as the core data structure. Events were assigned logical timestamps, typically Fidge/Mattern, as needed to capture causality. Algorithms then took advantage of additional properties of these timestamps that are not explicit in the formal semantics. This dissertation defines the data-structure interface precisely, and goes some way toward reworking algorithms in terms of that interface.

The second problem is providing an efficient, scalable implementation of the defined data structure. The key issue is providing a scalable precedence-test operation. Current tools use the Fidge/Mattern timestamp for this. While this provides a constant-time test, it requires space per event equal to the number of processes. As the number of processes increases, the space consumption becomes large enough to affect the precedence-test time because of caching effects, and it becomes problematic when the timestamps must be copied between processes or written to a file. Worse, existing theory suggested that the space consumption of Fidge/Mattern timestamps was optimal. In this dissertation we present two alternative timestamp algorithms that require substantially less space than the Fidge/Mattern algorithm.
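A hedged sketch of Fidge/Mattern timestamps makes the space issue concrete: each event carries a vector with one entry per process, yet the happened-before test is constant time. Class and event names below are illustrative, not from the dissertation.

    class VectorClock:
        """Fidge/Mattern vector timestamps for n processes.

        Each event's timestamp is a length-n vector, which is the
        per-event space cost the dissertation targets.
        """
        def __init__(self, n, pid):
            self.n, self.pid = n, pid
            self.clock = [0] * n

        def local_event(self):
            self.clock[self.pid] += 1
            return tuple(self.clock)  # timestamp of this event

        def receive(self, sender_stamp):
            # Component-wise max with the incoming stamp, then tick locally.
            self.clock = [max(a, b) for a, b in zip(self.clock, sender_stamp)]
            return self.local_event()

    def precedes(stamp_e, pid_e, stamp_f):
        """Constant-time test: did event e (on process pid_e) happen before f?"""
        return stamp_e != stamp_f and stamp_e[pid_e] <= stamp_f[pid_e]

    p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
    e = p0.local_event()      # (1, 0)
    f = p1.receive(e)         # message from p0 arrives: (1, 1)
    g = p0.local_event()      # (2, 0), concurrent with f
    print(precedes(e, 0, f))  # True:  e happened-before f
    print(precedes(g, 0, f))  # False: g and f are concurrent

The dissertation's two alternative timestamp algorithms target exactly this per-event vector, whose length grows with the number of processes.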
5. Characterizing applications by integrating and improving tools for data locality analysis and program performance (Singh, Saurabh, 21 September 2017)
No description available.
6. Search-Optimized Disk Layouts for Suffix-Tree Genomic Indexes (Bhavsar, Rajul D, 08 1900)
Over the last decade, biological sequence repositories have been growing at an exponential rate. Sophisticated indexing techniques are required to facilitate efficient searching through these humongous genetic repositories. A particularly attractive index structure for such sequence processing is the classical suffix-tree, a vertically compressed trie structure built over the set of all suffixes of a sequence. Its attractiveness stems from its linearity properties -- suffix-tree construction times are linear in the size of the indexed sequences, while search times are linear in the size of the query strings.
In practice, however, the promise of suffix-trees is not realized for extremely long sequences, such as the human genome, that run into the billions of characters. This is because suffix-trees, which are typically an order of magnitude larger than the indexed sequence, necessarily have to be disk-resident for such elongated sequences, and their traditional construction and traversal algorithms result in random disk accesses. We investigate, in this thesis, post-construction techniques for disk-based suffix-tree storage optimization, with the objective of maximizing disk-reference locality during query processing.

We begin by focusing on layout reorganization, in which the node-to-block assignments and the sequence of blocks are reworked. Our proposed algorithm combines the breadth-first layout approach advocated in the recent literature with probabilistic techniques for minimizing the physical distance between successive block accesses, based on an analysis of node traversal patterns. In our next step, we consider techniques for reducing the space overheads incurred by suffix-trees. In particular, we propose an embedding strategy whereby leaf nodes are represented completely within their parent internal nodes, without requiring any extension of the parent node's structure.
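As a rough illustration of the node-to-block idea (a sketch under simplified assumptions; the thesis's algorithm additionally reorders blocks using traversal-pattern analysis, and real blocks are sized in bytes, not node counts), the following packs tree nodes into fixed-capacity disk blocks in breadth-first order:

    from collections import deque

    BLOCK_CAPACITY = 4  # nodes per disk block; illustrative constant

    def assign_blocks_bfs(root, children):
        """Assign tree nodes to disk blocks in breadth-first order.

        root: node id; children: dict node -> list of child node ids.
        Returns dict node -> block number. Nodes near the top of the
        tree, which every top-down search traverses, share blocks, so
        a search touches fewer distinct blocks.
        """
        block_of, queue, count = {}, deque([root]), 0
        while queue:
            node = queue.popleft()
            block_of[node] = count // BLOCK_CAPACITY
            count += 1
            queue.extend(children.get(node, []))
        return block_of

    # Toy tree: node 0 is the root with three children, each with two leaves.
    children = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7], 3: [8, 9]}
    print(assign_blocks_bfs(0, children))
    # {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2}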
To quantitatively evaluate the benefits of our reorganized and restructured layouts, we have conducted extensive experiments on complete human genome sequences, with complex and computationally expensive user queries that involve finding the maximal common substring matches of the query strings.
We show, for the first time, that the layout reorganization approach can be scaled to entire genomes, including the human genome. With a careful choice of the node-to-block assignment condition and an optimized sequence of blocks, the reorganized layout achieves search-time improvements ranging from 25% to 75% relative to the construction layouts on such genomes. While the layout reorganization takes considerable time, it is a one-time process, whereas searches will be invoked repeatedly on the index. The internalization of leaf nodes yields a 25% reduction in suffix-tree space occupancy. More importantly, when applied to the construction layout, it provides search-time improvements ranging from 25% to 85%, and in conjunction with the reorganized layout, searches are sped up by 50% to 90%. Overall, our study and experimental results indicate that, through a careful choice of node implementations and layouts, the disk-access locality of suffix-trees can be improved to the point that search times improve by up to an order of magnitude relative to classical implementations.