Global ETD Search

1	DCMS: A Data Analytics and Management System for Molecular Simulation Berrada, Meryem 16 March 2015 (has links) Despite the fact that Molecular Simulation systems represent a major research tool in multiple scientific and engineering fields, there is still a lack of systems for effective data management and fast data retrieval and processing. This is mainly due to the nature of MS which generate a very large amount of data - a system usually encompass millions of data information, and one query usually runs for tens of thousands of time frames. For this purpose, we designed and developed a new application, DCMS (A data Analytics and Management System for molecular Simulation), that intends to speed up the process of new discovery in the medical/physics fields. DCMS stores simulation data in a database; and provides users with a user-friendly interface to upload, retrieve, query, and analyze MS data without having to deal with any raw data. In addition, we also created a new indexing scheme, the Time-Parameterized Spatial (TPS) tree, to accelerate query processing through indexes that take advantage of the locality relationships between atoms. The tree was implemented directly inside the PostgreSQL kernel, on top of the SP-GiST platform. Along with this new tree, two new data types were also defined, as well as new algorithms for five data points' retrieval queries. Scientific database Molecular Dynamics Big Data Quadtree SP-GIST Computer Engineering Computer Sciences
2	MsSpark: Implementation of Molecular Simulation Queries Using Apache Spark Kaur, Parneet 24 June 2016 (has links) Huge amount of data is being generated in almost every field and it cannot be avoided, rather is essential for the advancement of the field. Analysis of this data requires intensive computing power. Molecular Simulation is a powerful tool for understanding the behavior of natural systems. The simulation generates large amount data while observing the spatial and temporal relationships. The challenge is to handle the analytical queries that are often compute intensive. Although various tools exist to tackle this problem, but in this paper we have tried an alternate approach that uses Apache Spark- a modern big data platform – to parallelize the computation of analytical queries. MsSpark consists of three layers: Apache Spark layer, MS RDD layer and MS Query Processing layer. MS RDD layers supports data that is specific to Molecular Simulation. MS Query Processing layer provides functionality of executing analytical queries. Caching is used to improve the performance. The system can be further extended to cover more analytical queries. Scientific Database Molecular Dynamics MapReduce Primary Queries Analytical Queries Computer Sciences

Search results

DCMS: A Data Analytics and Management System for Molecular Simulation

MsSpark: Implementation of Molecular Simulation Queries Using Apache Spark