Spelling suggestions: "subject:"cientific databases"" "subject:"cientific atabases""
1 |
A Comparative Study of Dual-tree Algorithms for Computing Spatial Distance HistogramMou, Chengcheng 01 January 2015 (has links)
Particle simulation has become an important research technique in many scientific and engineering fields in latest years. However, these simulations will generate countless data, and database they required would therefore deal with very challenging tasks in terms of data management, storage, and query processing. The two-body correlation function (2-BCFs), a statistical learning measurement to evaluate the datasets, has been mainly utilized to measure the spatial distance histogram (SDH). By using a straightforward method, the process of SDH query takes quadratic time. Recently, a novel algorithm has been proposed to compute the SDH based on the concept of density map (DM), and it reduces the running time to ϴ(N(3/2)) for two-dimensional data and ϴ (N(5/3) ) for three-dimensional data, respectively. In the DM-SDH algorithm, there are two types of DMs that can be plugged in for computation: Quad-tree (Oct-tree for three-dimensional data) and k-d tree data structure. In this thesis paper, by using the geometric method, we prove the unre- solvable ratios on the k-d tree. Further, we analyze and compare the difference in the performance in each potential case generated by these DM-SDH algorithms. Experimental results confirm our analysis and show that the k-d tree structure has better performance in terms of time complexity in all cases. However, our qualitative analysis shows that the Quad-tree (Oct-tree) has an advantage over the k-d tree on aspect of space complexity.
|
2 |
Optimization and Execution of Complex Scientific QueriesFomkin, Ruslan January 2009 (has links)
Large volumes of data produced and shared within scientific communities are analyzed by many researchers to investigate different scientific theories. Currently the analyses are implemented in traditional programming languages such as C++. This is inefficient for research productivity, since it is difficult to write, understand, and modify such programs. Furthermore, programs should scale over large data volumes and analysis complexity, which further complicates code development. This Thesis investigates the use of database technologies to implement scientific applications, in which data are complex objects describing measurements of independent events and the analyses are selections of events by applying conjunctions of complex numerical filters on each object separately. An example of such an application is analyses for the presence of Higgs bosons in collision events produced by the ATLAS experiment. For efficient implementation of such an ATLAS application, a new data stream management system SQISLE is developed. In SQISLE queries are specified over complex objects which are efficiently streamed from sources through the query engine. This streaming approach is compared with the conventional approach to load events into a database before querying. Since the queries implementing scientific analyses are large and complex, novel techniques are developed for efficient query processing. To obtain efficient plans for such queries SQISLE implements runtime query optimization strategies, which during query execution collect runtime statistics for a query, reoptimize the query using the collected statistics, and dynamically switch optimization strategies. The cost-based optimization utilizes a novel cost model for aggregate functions over nested subqueries. To alleviate estimation errors in large queries the fragments are decomposed into conjunctions of subqueries over which runtime statistics are measured. Performance is further improved by query transformation, view materialization, and partial evaluation. ATLAS queries in SQISLE using these query processing techniques perform close to or better than hard-coded C++ implementations of the same analyses. Scientific data are often stored in Grids, which manage both storage and computational resources. This Thesis includes a framework POQSEC that utilizes Grid resources to scale scientific queries over large data volumes by parallelizing the queries and shipping the data management system itself, e.g. SQISLE, to Grid computational nodes for the parallel query execution.
|
3 |
SLACID - Sparse Linear Algebra in a Column-Oriented In-Memory Database SystemKernert, David, Köhler, Frank, Lehner, Wolfgang 19 September 2022 (has links)
Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the hardware shift of the primary storage from disc into memory it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches of storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and that the resulting architecture is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, like online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main and a write-optimized delta structure and evaluate the performance for dynamic sparse matrix workloads by applying workflows of nuclear science and network graphs.
|
4 |
Supporting Advanced Queries on Scientific Array DataEbenstein, Roee A. 18 December 2018 (has links)
No description available.
|
Page generated in 0.0777 seconds