1 |
Compaction Strategies in Apache Cassandra : Analysis of Default Cassandra stress model. Ravu, Venkata Sathya Sita J S, January 2016
Context. The present trend in a wide variety of applications, ranging from the web and social networking to telecommunications, is to gather and process very large and fast-growing amounts of information, leading to a common set of problems known collectively as "Big Data". Over the last decade, the ability to run large-scale analytics over large numbers of data sets has proved to be a competitive advantage in a wide range of industries such as retail, telecom, and defense. In response to this trend, the research community and the IT industry have proposed a number of platforms to facilitate large-scale data analytics, including a new class of databases often referred to as NoSQL data stores. Apache Cassandra is one such NoSQL data store. This research analyzes the performance of different compaction strategies in different use cases for the default Cassandra stress model. Objectives. The performance of the compaction strategies is observed on the basis of three use cases, write heavy (90/10), read heavy (10/90), and balanced (50/50), all under the default Cassandra stress model, with the goal of providing the events and specifications that suggest when to switch from one compaction strategy to another. Methods. A single-node Cassandra deployment on a web server is studied, measuring read and write performance under each compaction strategy for the write-heavy, read-heavy, and balanced workloads; the resulting performance metrics are collected and analyzed. Results. The performance metrics of the different compaction strategies are evaluated and analyzed. Conclusions. From the detailed analysis and logical comparison, we conclude that, under the default Cassandra stress model, the leveled compaction strategy performs better for the read-heavy (10/90) workload than the size tiered and date tiered compaction strategies, and that for the balanced (50/50) workload the date tiered compaction strategy performs better than the size tiered compaction strategy.
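As a concrete illustration of the experimental setup described above, the sketch below generates cassandra-stress invocations for the three workload mixes and the three compaction strategies. The mixed ratio(...) and -schema compaction(...) syntax follows the cassandra-stress tool bundled with Cassandra 2.x/3.x and should be verified against the installed version; the node address and operation count are illustrative assumptions, not values from the thesis.

```python
# Sketch: building cassandra-stress command lines for the thesis's three
# workload mixes (write heavy 90/10, read heavy 10/90, balanced 50/50).
# Flag syntax per Cassandra 2.x/3.x docs; verify against your version.

WORKLOADS = {
    "write_heavy": (9, 1),   # 90% writes / 10% reads
    "read_heavy": (1, 9),    # 10% writes / 90% reads
    "balanced": (5, 5),      # 50% writes / 50% reads
}

STRATEGIES = [
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "DateTieredCompactionStrategy",
]

def stress_command(writes: int, reads: int, strategy: str,
                   ops: int = 1_000_000) -> str:
    """Build one cassandra-stress command line (assumed node and op count)."""
    return (
        f"cassandra-stress mixed 'ratio(write={writes},read={reads})' n={ops} "
        f"-schema 'replication(factor=1)' 'compaction(strategy={strategy})' "
        f"-node 127.0.0.1"
    )

for name, (w, r) in WORKLOADS.items():
    for strategy in STRATEGIES:
        print(f"# {name} / {strategy}")
        print(stress_command(w, r, strategy))
```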
|
2 |
Performance Tuning of Big Data Platform : Cassandra Case Study. Sathvik, Katam, January 2016
Usage of cloud-based storage systems has gained a lot of prominence in the past few years. Every day millions of files are uploaded to and downloaded from cloud storage. This data cannot be handled by traditional databases and is considered Big Data. New, powerful platforms have been developed to store and organize big and unstructured data; these platforms are called Big Data systems, and some of the most popular are MongoDB, Hadoop, and Cassandra. This study uses the Cassandra database management system because it is an open-source platform developed in Java. Cassandra has a masterless ring architecture in which data is replicated among all nodes for fault tolerance. Unlike MySQL, Cassandra stores data on a per-column basis, and as a NoSQL database system it can handle unstructured data. Most Cassandra parameters are scalable and easy to configure. Amazon provides a cloud computing platform, known as Amazon Web Services (AWS), that lets a user perform heavy computing tasks on remote hardware; AWS also includes database deployment and network management services with a straightforward user experience. This document gives a detailed explanation of Cassandra database deployment on the AWS platform, followed by Cassandra performance tuning, and investigates the impact of changing Cassandra parameters on read and write performance when deployed on the Elastic Compute Cloud (EC2) platform. A cloud environment suitable for the experiment is created on AWS, and a three-node Cassandra database management system is deployed in it. The performance of this three-node architecture is evaluated and tested with different configuration parameters, which are selected based on how the Cassandra metrics behave as the parameters change. The selected parameters are varied and the resulting performance differences are observed and analyzed; using this analysis, a draft model is developed after performance tuning the selected parameters. This draft model is tested with different workloads and compared with the default Cassandra model. Changes to the key cache and memtable parameters showed improvements in the performance metrics. Increasing the key cache size and save period improved read performance and also affected system metrics: CPU load and disk throughput increased while operation time decreased. Changes to the memtable parameters affected write performance and disk space utilization; with an increased memtable flush writers threshold, disk throughput increased and operation time decreased. The draft model derived from the performance evaluation has better write and read performance.
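To make the tuning step concrete, here is a minimal sketch of applying the kind of key cache and memtable settings the study varies to a node's cassandra.yaml. The parameter names (key_cache_size_in_mb, key_cache_save_period, memtable_flush_writers) match Cassandra 2.x/3.x; the values and file path are illustrative assumptions, not the thesis's tuned ones, and rewriting the file this way discards any comments in it.

```python
# Sketch: patching key cache and memtable parameters in cassandra.yaml.
# Values below are illustrative; tune per workload as the study does.
import yaml  # pip install pyyaml

TUNING = {
    "key_cache_size_in_mb": 512,     # default: min(5% of heap, 100 MB)
    "key_cache_save_period": 14400,  # seconds between key-cache saves
    "memtable_flush_writers": 4,     # more writers -> higher flush throughput
}

def apply_tuning(path: str = "/etc/cassandra/cassandra.yaml") -> None:
    """Load the node's config, overlay the tuned values, and write it back."""
    with open(path) as f:
        conf = yaml.safe_load(f)
    conf.update(TUNING)
    with open(path, "w") as f:
        yaml.safe_dump(conf, f, default_flow_style=False)

if __name__ == "__main__":
    apply_tuning()  # restart the Cassandra node afterwards to pick up changes
```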
|
3 |
Open data locale : acteurs, pratiques et dispositifs / Local open data : stakeholders, practices and tools. Rahme, Patricia, 16 September 2016
This study of open data projects in local authorities takes a global approach that brings together the modes of producing and collecting data, the methods and tools used for dissemination, and the usage models for digital public data. Starting from the hypothesis that open data gives rise to a new distribution of roles among the actors involved throughout the opening process, we identified a dynamic positioning of actors that is not limited to a single stage of that process, whether reflection before launch, production, dissemination, promotion, or use, creating a mixed open data system. Studying the data offered by local authorities led us to consider how well it matches the needs of users. We show that, despite efforts to create a multi-actor system, the identification and collection of data is eclectic and unstable and does not build a solid open data process. Despite the urgency suggested by the political and societal context, the opening and choice of data are generally determined by pragmatic considerations that impose a slow but gradual opening without claiming to be exhaustive. The study also examines open data platforms as "socio-technical tools" emerging from the social web, in their aspects of mediation, sharing, collaboration, and co-production of data. Our observations allow us to highlight, question, and analyze open data platforms whose core architecture leads us to qualify them as socio-technical tools: they are social tools because they give rise to forms of action. Around a catalog of datasets, there are virtual spaces, blogs, and forums of interaction between producers and re-users of open data.
|
4 |
Exploring the Prerequisites to Increase Real Estate Market Transparency in Sweden / En utforskning av förutsättningarna för att öka transparensen på Sveriges fastighetsmarknad. Danmo, Emil; Kihlström, Fredrik, January 2019
In the 2018 edition of the JLL Global Real Estate Transparency Index (GRETI), Sweden was ranked the 10th most transparent real estate market in the world, categorized as 'Highly Transparent'. For the most part, Sweden has held a similar position since the measurements started in 1999. Transparency in a real estate market generally attracts foreign real estate investment and tenants and increases global competitiveness. It also streamlines work processes in many real estate professions through comprehensive market information and comprehensible legal frameworks, transaction processes, and methods for monitoring sustainability metrics. This study explores the prerequisites for Sweden to attain a better position in the index by increasing the transparency of its real estate market, with the long-term goal of having Sweden reap more of the benefits of a highly transparent market. This is done in two ways. The first is a critical analysis of the index's methodology, assessing whether ranks and scores within the different index categories are produced fairly. The second is a set of interviews with industry actors to identify the areas where Sweden lags behind more transparent markets, how they would like transparency to improve in Sweden, the main barriers to implementing projects that would increase real estate market transparency, and ways of overcoming them. The examination of the index shows a methodology that changes from year to year, while still indicating a steady increase in real estate market transparency in Sweden. The interview findings support a generally positive view of transparency as facilitating real estate investment decisions, but the preferred level of transparency differs between net sellers and net buyers. It is therefore questionable whether increased real estate market transparency would provide significantly greater utility for market actors with longer investment horizons and market knowledge gained through extensive business networks. The main suggestions for improving real estate transparency in Sweden include data standards, an increased level of data disclosure, and information platforms for such standardized, disclosed data. The study suggests that the main barriers to implementing these measures can be conceptualized as a Prisoner's dilemma and that institutional bodies could act as trustworthy partners in further opening up real estate market information.
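As a hedged illustration of the Prisoner's-dilemma framing in the conclusion, the toy payoff matrix below (the numbers are invented for illustration, not taken from the study) shows why withholding data is the dominant strategy for each individual actor even though mutual disclosure would leave the market as a whole better off:

```python
# Toy payoff matrix for the data-disclosure dilemma.
# (row action, column action) -> (row payoff, column payoff); values invented.
PAYOFFS = {
    ("share", "share"): (3, 3),        # transparent market benefits both
    ("share", "withhold"): (0, 4),     # sharer loses its informational edge
    ("withhold", "share"): (4, 0),
    ("withhold", "withhold"): (1, 1),  # status quo: opaque market
}

def best_response(opponent_action: str) -> str:
    """Pick the row action with the higher payoff against the opponent."""
    return max(("share", "withhold"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Withholding dominates either way, so no actor shares unilaterally,
# which is why the study points to institutional bodies as coordinators.
assert best_response("share") == "withhold"
assert best_response("withhold") == "withhold"
```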
|
5 |
Publish-Time Data Integration for Open Data Platforms. Eberius, Julian; Damme, Patrick; Braunschweig, Katrin; Thiele, Maik; Lehner, Wolfgang, 16 September 2022
Platforms for the publication and collaborative management of data, such as Data.gov or Google Fusion Tables, are a new trend on the web. They manage very large corpora of datasets but often lack an integrated schema, ontology, or even common publication standards. This results in inconsistent names for attributes of the same meaning, which constrains both the discovery of relationships between datasets and their reusability. Existing data integration techniques focus on reuse-time, i.e., they are applied when a user wants to combine a specific set of datasets or integrate them with an existing database. In contrast, this paper investigates a novel method of data integration at publish-time, where the publisher is provided with suggestions on how to integrate the new dataset with the corpus as a whole, without resorting to a manually created mediated schema or ontology for the platform. We propose data-driven algorithms that suggest alternative attribute names for a newly published dataset based on attribute and instance statistics maintained on the corpus. We evaluate the proposed algorithms using real-world corpora based on the Open Data Platform opendata.socrata.com and relational data extracted from Wikipedia, and report on the system's response time as well as on an extensive crowdsourcing-based evaluation of the quality of the generated attribute-name alternatives.
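A minimal sketch of the publish-time idea, assuming a toy corpus and a simple Jaccard overlap between value sets; the paper's actual algorithms use richer attribute and instance statistics than this:

```python
# Toy publish-time name suggestion: rank corpus attribute names by how much
# their observed values overlap with a newly published column's values.
from collections import defaultdict

# corpus statistics: attribute name -> set of instance values seen so far
corpus_stats: dict[str, set[str]] = defaultdict(set)

def index_dataset(columns: dict[str, list[str]]) -> None:
    """Fold a published dataset's columns into the corpus statistics."""
    for name, values in columns.items():
        corpus_stats[name.lower()].update(values)

def suggest_names(values: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the top-k corpus attribute names by Jaccard overlap."""
    new = set(values)
    scored = []
    for name, seen in corpus_stats.items():
        union = len(new | seen)
        if union:
            scored.append((name, len(new & seen) / union))
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# usage: index an existing dataset, then ask for names for a new column
index_dataset({"country": ["Germany", "France", "Sweden"]})
print(suggest_names(["Sweden", "Norway", "France"]))  # 'country' ranks first
```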
|