51 |
Adequação da computação intensiva em dados para ambientes desktop grid com uso de MapReduce / Adequacy of data-intensive computing to desktop grid environments using MapReduce. Anjos, Julio Cesar Santos dos, January 2012.
The emergence of data volumes on the order of petabytes creates the need for new solutions that make it feasible to process such data with intensive computing systems such as MapReduce. MapReduce is a programming framework built around two functions, Map and Reduce, applied to a given data input. This programming model is generally used on large clusters, and its Map and Reduce tasks are normally independent of one another. The programmer is abstracted away from the details of parallelization, such as data partitioning and distribution, fault tolerance, data persistence, and task distribution. The motivation of this work is to apply the data-intensive MapReduce computation model, with large data volumes, to desktop grid environments. The goal is therefore to investigate the MapReduce algorithms in order to adapt intensive computing to heterogeneous environments. The work addresses the problem of resource heterogeneity, leaving machine volatility out of scope for now. Because of the deficiencies MapReduce shows in heterogeneous environments, MR-A++ is proposed: a MapReduce with algorithms suited to heterogeneous environments. The MR-A++ model creates a measurement task to collect information before the data are distributed; this information is then used to manage the system. The modified algorithms were evaluated with a 2^k factorial analysis and with simulations in the MRSG simulator, which was built to study large-scale homogeneous and heterogeneous MapReduce environments. The small delay introduced in the setup phase of the computation is offset by matching the workload to the computational capacity of the heterogeneous machines, with reductions in job execution times exceeding 70% in some cases.
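The two-function model described above is easiest to see in code. The sketch below is a minimal, single-process stand-in for a MapReduce framework, not taken from the thesis; `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative names, and the sort-then-group step plays the role of the shuffle phase that a real framework distributes across machines.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(_, line):
    # Map: emit an intermediate (word, 1) pair for every word.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce: combine all intermediate values for one key.
    yield word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    # Sequential stand-in for the framework: sorting groups the
    # intermediate pairs by key, mimicking the shuffle phase.
    intermediate = [kv for rec in records for kv in map_fn(None, rec)]
    intermediate.sort(key=itemgetter(0))
    return [out
            for key, group in groupby(intermediate, key=itemgetter(0))
            for out in reduce_fn(key, (v for _, v in group))]

print(run_mapreduce(["to be or not to be"], map_fn, reduce_fn))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```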
|
52 |
Maresia: an approach to deal with the single points of failure of the MapReduce model / Maresia: uma abordagem para lidar com os pontos de falha única do modelo MapReduce. Marcos, Pedro de Botelho, January 2013.
In recent years, the amount of data generated by applications has grown considerably. To become relevant, however, this data must be processed. With this goal, new programming models for parallel and distributed processing have been proposed; one example is the MapReduce model, proposed by Google. This model, nevertheless, has Single Points of Failure (SPOFs), which can compromise the execution of a job. This work therefore presents a new architecture, inspired by Chord, to avoid the SPOFs of MapReduce. The evaluation was performed through an analytical model and an experimental setup. The results show the feasibility of using the proposed architecture to execute MapReduce jobs.
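The abstract does not detail how Chord is used, but the core idea it borrows from, consistent hashing on an identifier ring, can be sketched briefly. In the toy below (all names hypothetical, not the thesis architecture), each key is owned by its successor node on the ring, so when a node fails its keys shift to the next node rather than being lost with a single master.

```python
import hashlib
from bisect import bisect_right

def chord_id(name, bits=16):
    # Hash a node or key name onto the identifier ring.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

class Ring:
    """Minimal consistent-hashing ring in the spirit of Chord: each
    key is owned by its successor node, so responsibility migrates
    automatically when a node fails instead of resting on one master."""
    def __init__(self, nodes):
        self.nodes = sorted((chord_id(n), n) for n in nodes)

    def successor(self, key):
        ids = [i for i, _ in self.nodes]
        pos = bisect_right(ids, chord_id(key)) % len(self.nodes)
        return self.nodes[pos][1]

    def fail(self, node):
        # Remove a failed node; ownership of its keys shifts to its successor.
        self.nodes = [(i, n) for i, n in self.nodes if n != node]

ring = Ring(["w1", "w2", "w3"])
owner = ring.successor("job-42/meta")
ring.fail(owner)
print(owner, "->", ring.successor("job-42/meta"))
```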
|
53 |
A distributed approach to Frequent Itemset Mining at low support levels. Clark, Neal, 22 December 2014.
Frequent Itemset Mining, the process of finding frequently co-occurring sets of items in a dataset, has been at the core of the field of data mining for the past 25 years. During this time, datasets have grown much faster than the algorithms' capacity to process them. Great progress has been made at optimizing this task on a single computer; however, despite years of research, very little progress has been made on parallelizing it. FP-Growth-based algorithms have proven notoriously difficult to parallelize, and Apriori has largely fallen out of favor with the research community.
In this thesis we introduce a parallel, Apriori-based Frequent Itemset Mining algorithm capable of distributing computation across large commodity clusters. Our case study demonstrates that our algorithm can efficiently scale to hundreds of cores on a standard Hadoop MapReduce cluster, and can improve execution times by at least an order of magnitude at the lowest support levels.
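For readers unfamiliar with the task, the single-machine sketch below shows the Apriori skeleton that such distributed algorithms parallelize: (k+1)-candidates are joined only from frequent k-itemsets, and the support-counting pass over the transactions is the phase a Hadoop cluster would distribute. This is an illustrative toy, not the algorithm from the thesis.

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Apriori principle: an itemset can only be frequent if all of
    # its subsets are, so candidates are joined from survivors only.
    transactions = [frozenset(t) for t in transactions]
    k_sets = sorted({frozenset([i]) for t in transactions for i in t},
                    key=sorted)
    frequent = {}
    while k_sets:
        # Support counting: the expensive, distributable pass.
        counts = {c: sum(c <= t for t in transactions) for c in k_sets}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-candidates.
        k_sets = sorted({a | b for a, b in combinations(survivors, 2)
                         if len(a | b) == len(a) + 1}, key=sorted)
    return frequent

print(apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}], min_support=2))
```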
|
54 |
Constructing Secure MapReduce Framework in Cloud-based Environment. Wang, Yongzhi, 27 March 2015.
MapReduce, a parallel computing paradigm, has been gaining popularity in recent years as cloud vendors offer MapReduce computation services on their public clouds. However, companies are still reluctant to move their computations to the public cloud for the following reason: in the current business model, the entire MapReduce cluster is deployed on the public cloud, and if the public cloud is not properly protected, the integrity and the confidentiality of MapReduce applications can be compromised by attacks inside or outside of it. From the result-integrity perspective, if any computation nodes on the public cloud are compromised, those nodes can return incorrect task results and therefore render the final job result inaccurate. From the algorithmic-confidentiality perspective, as more and more companies devise innovative algorithms and deploy them to the public cloud, malicious attackers can reverse engineer those programs to uncover the algorithmic details and, therefore, compromise the intellectual property of those companies.
In this dissertation, we propose to use the hybrid cloud architecture to defeat the above two threats. Based on the hybrid cloud architecture, we propose separate solutions to address the result integrity and the algorithmic confidentiality problems. To address the result integrity problem, we propose the Integrity Assurance MapReduce (IAMR) framework. IAMR performs the result checking technique to guarantee high result accuracy of MapReduce jobs, even if the computation is executed on an untrusted public cloud. We implemented a prototype system for a real hybrid cloud environment and performed a series of experiments. Our theoretical simulations and experimental results show that IAMR can guarantee a very low job error rate, while maintaining a moderate performance overhead. To address the algorithmic confidentiality problem, we focus on the program control flow and propose the Confidentiality Assurance MapReduce (CAMR) framework. CAMR performs the Runtime Control Flow Obfuscation (RCFO) technique to protect the predicates of MapReduce jobs. We implemented a prototype system for a real hybrid cloud environment. The security analysis and experimental results show that CAMR defeats static analysis-based reverse engineering attacks, raises the bar for the dynamic analysis-based reverse engineering attacks, and incurs a modest performance overhead.
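The dissertation's IAMR details are not reproduced here, but the general result-checking idea can be sketched: re-execute a sample of public-cloud task results on trusted resources and flag disagreements. Everything below (the function names, the sampling policy, the toy executors) is an assumption for illustration, not IAMR's actual protocol.

```python
import random

def run_with_checking(tasks, run_public, run_private, sample_rate=0.2):
    """Sketch of sampling-based result checking: a fraction of task
    results from the untrusted public cloud is re-computed on the
    trusted private cloud; any mismatch marks the result as bad.
    run_public / run_private are hypothetical executor callbacks."""
    verified, mismatches = {}, []
    for task in tasks:
        result = run_public(task)
        if random.random() < sample_rate and run_private(task) != result:
            mismatches.append(task)   # the public node returned a wrong result
        else:
            verified[task] = result
    return verified, mismatches

# Toy demo: a "compromised" public worker that doubles its output.
good = lambda x: x * x
bad = lambda x: 2 * x * x
print(run_with_checking(range(10), bad, good, sample_rate=1.0))
```

A lower sample rate trades verification cost on the private cloud against the probability of catching a cheating node, which is the kind of accuracy/overhead trade-off the experiments above quantify.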
|
55 |
Aplikace pro Big Data / Application for Big Data. Blaho, Matúš, January 2018.
This work deals with the description and analysis of the Big Data concept and its processing and use in decision support. The proposed processing is based on the MapReduce concept designed for Big Data processing. The theoretical part of this work focuses largely on the Hadoop system, which implements this concept; understanding it is key to properly designing applications that run on it. The work also contains designs for specific Big Data processing applications. The implementation part of the thesis describes the administration of a Hadoop system, the implementation of MapReduce applications, and their testing over data sets.
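As a concrete example of the kind of MapReduce application such a thesis implements on Hadoop, the script below follows the Hadoop Streaming convention, in which any executable can serve as mapper or reducer by reading stdin and writing tab-separated key/value pairs to stdout. The script itself is a generic word-count sketch, not one of the thesis applications.

```python
#!/usr/bin/env python3
# wordcount_streaming.py -- one script usable as both mapper and
# reducer under Hadoop Streaming (the framework delivers reducer
# input sorted by key, so counts accumulate in a single pass).
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # Invoke as "wordcount_streaming.py map" or "... reduce".
    mapper() if sys.argv[1] == "map" else reducer()
```

An illustrative invocation (the streaming jar path varies by installation) would be: `hadoop jar hadoop-streaming.jar -input in/ -output out/ -mapper "wordcount_streaming.py map" -reducer "wordcount_streaming.py reduce"`.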
|
56 |
Distributed enumeration of four node graphlets at quadrillion-scale. Liu, Xiaozhou, 19 November 2021.
Graphlet enumeration is a basic task in graph analysis with many applications, so it is important to be able to perform it within a reasonable amount of time. However, this objective is challenging when the input graph is very large, with millions of nodes and edges, and known solutions are limited in terms of scalability. Distributed computing is often proposed as a solution to improve scalability; however, it has to be done carefully to reduce the overhead cost and to really benefit from the distributed approach. We study the enumeration of four-node graphlets in undirected graphs using a distributed platform. We propose an efficient distributed solution which significantly surpasses the existing solutions. With this method we are able to process larger graphs than have ever been processed before and enumerate quadrillions of graphlets using a modest cluster of machines. We convincingly show the scalability of our solution through experimental results.
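To make the task concrete, the sketch below naively enumerates connected induced four-node subgraphs of a tiny graph. It is exponential in the number of nodes and purely illustrative; the thesis's contribution is precisely a distributed scheme that makes this tractable at quadrillion scale, but the objects being enumerated are the same.

```python
from itertools import combinations

def is_connected(nodes, adj):
    # Depth-first search restricted to the induced subgraph on `nodes`.
    nodes = set(nodes)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(adj[v] & nodes)
    return seen == nodes

def four_node_graphlets(edges):
    # Naive enumeration: test every 4-node subset for connectivity.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for quad in combinations(sorted(adj), 4):
        if is_connected(quad, adj):
            yield quad

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
print(list(four_node_graphlets(edges)))   # [(0, 1, 2, 3)]
```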
|
57 |
Négociation multi-agents pour la réallocation dynamique de tâches et application au patron de conception MapReduce / Multi-agent negotiation for dynamic task reallocation and application to the MapReduce design pattern. Baert, Quentin, 13 September 2019.
The Rm||Cmax problem consists in allocating a set of tasks to m agents in order to minimize the makespan of the allocation, i.e. the execution time of all the tasks. This problem is known to be NP-hard as soon as the tasks are allocated to two or more agents (m ≥ 2). In addition, it is often assumed that the cost of a task is accurately estimated for an agent and that this cost does not change during the execution of tasks. In this thesis, I propose a decentralized and dynamic approach to improve the allocation of tasks. Starting from an initial allocation, and while they are executing tasks, collaborative agents initiate multiple auctions to reallocate the remaining tasks. These reallocations are socially rational, i.e. an agent agrees to take on a task initially allocated to another agent if delegating this task benefits the entire system by decreasing the makespan. In addition, the dynamism of the process makes it possible to improve an allocation despite an inaccurate cost function and despite the performance variations that can occur during the execution of tasks. This thesis provides a formal framework for the multi-agent modeling and resolution of a located-tasks reallocation problem, in which the locality of the resources required to perform a task affects its cost for each agent of the system. From this framework, I present the interaction protocol used by the agents and propose several strategies to ensure that the agents' choices have the greatest impact on the makespan of the current allocation. In the applicative context of this thesis, I propose to use this task reallocation process to improve the MapReduce design pattern. Widely used for the distributed processing of massive data, MapReduce has biases that dynamic task reallocation can help counter. I implemented a distributed prototype that fits into the formal framework and implements the MapReduce design pattern. With this prototype, I evaluate the effectiveness of the reallocation process and the impact of the different agent strategies.
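A toy version of the socially rational rule described above, as an assumption-laden sketch rather than the thesis's auction protocol: a delegation is kept only if it strictly lowers the makespan, with per-agent task costs standing in for the heterogeneous, located resources.

```python
def makespan(allocation, cost):
    # Makespan = completion time of the busiest agent.
    return max(sum(cost[a][t] for t in tasks)
               for a, tasks in allocation.items())

def socially_rational_step(allocation, cost):
    """One improvement round: an agent accepts a task delegated by
    another agent only if the move strictly lowers the makespan.
    Costs may differ per agent (heterogeneous, located resources)."""
    before = makespan(allocation, cost)
    for giver, tasks in allocation.items():
        for task in list(tasks):
            for taker in allocation:
                if taker == giver:
                    continue
                tasks.remove(task)
                allocation[taker].append(task)
                if makespan(allocation, cost) < before:
                    return True          # keep the improving delegation
                allocation[taker].remove(task)   # otherwise undo it
                tasks.append(task)
    return False                         # local optimum reached

cost = {"a1": {"t1": 4, "t2": 1, "t3": 5}, "a2": {"t1": 1, "t2": 3, "t3": 2}}
alloc = {"a1": ["t1", "t2"], "a2": ["t3"]}
while socially_rational_step(alloc, cost):
    pass
print(alloc, makespan(alloc, cost))   # t1 migrates to a2; makespan 5 -> 3
```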
|
58 |
ACE: Agile, Contingent and Efficient Similarity Joins Using MapReduce. Lakshminarayanan, Mahalakshmi, January 2013.
No description available.
|
59 |
Gaussian Deconvolution and MapReduce Approach for Chipseq Analysis. Sugandharaju, Ravi Kumar Chatnahalli, 26 September 2011.
No description available.
|
60 |
A MapReduce Performance Study of XML Shredding. Lam, Wilma Samhita Samuel, 20 October 2016.
No description available.
|