Storage and Transformation for Data Analysis Using NoSQL / Lagring och transformation för dataanalys med hjälp av NoSQL

Choosing the right NoSQL DBMS can be difficult, and some systems lack sufficient research and evaluation. Tools also exist for moving and transforming data between DBMSs, which makes it possible to combine systems or to use different systems for different use cases. We describe a use case based on requirements related to the quality attributes Consistency, Scalability, and Performance. For the Performance attribute, the focus is on fast insertions and full-text search queries on a large dataset of forum posts. The evaluation was performed on two NoSQL DBMSs, MongoDB and Elasticsearch, and two tools for transforming data between them, NotaQL and Compose's Transporter. The purpose is to evaluate three different NoSQL systems: pure MongoDB, pure Elasticsearch, and a combination of the two. The results show that MongoDB is faster when performing simple full-text search queries, but otherwise slower. Elasticsearch is therefore the primary choice for insertion performance and complex full-text search query performance. MongoDB is, however, regarded as a more stable and well-tested system. When it comes to scalability, MongoDB is better suited to a system where the dataset grows over time, because adding more shards is simple, whereas Elasticsearch is better suited to a system that starts off with a large amount of data, since it has faster insertion speeds and a more effective process for distributing data among existing shards. In general, NotaQL is not as fast as Transporter, but it can handle aggregations and nested fields, which Transporter does not support. A combined system using MongoDB as the primary data store and Elasticsearch as the secondary data store could be used to achieve fast full-text search queries for all types of expressions, simple and complex.

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-142004
Date January 2017
Creators Nilsson, Christoffer; Bengtson, John
Publisher Linköpings universitet, Institutionen för datavetenskap
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
