  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

An Efficient Platform for Large-Scale MapReduce Processing

Wang, Liqiang 15 May 2009 (has links)
In this thesis we proposed and implemented MMR, a new open-source MapReduce model with MPI for parallel and distributed programming. MMR combines Pthreads, MPI, and Google's MapReduce processing model to support multi-threaded as well as distributed parallelism. Experiments show that our model significantly outperforms the leading open-source solution, Hadoop. It demonstrates linear scaling for CPU-intensive processing and even super-linear scaling for indexing-related workloads. In addition, we designed an MMR live DVD that facilitates the automatic installation and configuration of a Linux cluster with an integrated MMR library, enabling the development and execution of MMR applications.
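The thread-level half of such a model can be illustrated with a minimal sketch in C. This is not MMR's code or API: the names `map_task`, `map_worker`, `parallel_sum_squares`, and `NUM_THREADS` are hypothetical, and the MPI layer that would distribute chunks across nodes is omitted. Each Pthread maps its slice of the input to a partial result, and the calling thread reduces the partials.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4  /* assumed worker count, chosen for illustration */

/* Work description for one mapper thread (hypothetical type). */
typedef struct {
    const int *data;    /* start of this worker's slice */
    size_t     len;     /* number of elements in the slice */
    long       partial; /* map output: sum of squares over the slice */
} map_task;

/* Map phase: square each element and accumulate locally (no shared state). */
static void *map_worker(void *arg)
{
    map_task *t = arg;
    t->partial = 0;
    for (size_t i = 0; i < t->len; i++)
        t->partial += (long)t->data[i] * t->data[i];
    return NULL;
}

/* Splits the input across NUM_THREADS mappers, then reduces their partial
 * sums. Returns -1 if a thread could not be created. */
long parallel_sum_squares(const int *data, size_t n)
{
    pthread_t threads[NUM_THREADS];
    map_task  tasks[NUM_THREADS];
    size_t    chunk = n / NUM_THREADS;
    int       started = 0;
    long      total = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        tasks[i].data = data + (size_t)i * chunk;
        /* The last worker also takes the remainder of the division. */
        tasks[i].len = (i == NUM_THREADS - 1) ? n - (size_t)i * chunk : chunk;
        if (pthread_create(&threads[i], NULL, map_worker, &tasks[i]) != 0)
            break;
        started++;
    }

    /* Reduce phase: join each started mapper and fold in its partial sum. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += tasks[i].partial;
    }
    return started == NUM_THREADS ? total : -1;
}
```

Because every mapper writes only to its own `map_task`, no lock is needed until the reduce step, which runs after the joins.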
2

Parallel SVM with Application to Protein Structure Prediction

Panaganti, Shilpa 20 December 2004 (has links)
Training a Support Vector Machine (SVM) on thousands of examples demands large amounts of memory and time. SVMlight, by Dr. Thorsten Joachims, is implemented in C and uses a fast optimization algorithm to handle thousands of support vectors. SVMlight solves classification, pattern recognition, regression, and ranking-function learning problems. The C code also provides methods for XiAlpha estimation of the error rate and precision. Implementing these two methods leads to generalized performance of the Support Vector Machine, even for computation-intensive text classification functions. The SVMlight code allows users to define their own kernel functions. The software employs an efficient algorithm and minimizes the cost, but it still takes a considerable amount of time to process thousands of support vectors and training examples. This time can be reduced further by parallelizing the code. In our work we refined the SVMlight code by removing unnecessary iterations and rewriting it to be cost-efficient. We then parallelized the code separately with two forms of shared-memory parallelism, OpenMP and POSIX Threads (Pthreads). Both versions were built with Intel’s C compiler 7.1 for Linux and used Hyper-Threading technology. The parallelized code is tested on protein structure prediction. Different types of protein sequences are tested with both methods while varying the number of training examples and support vectors. The time consumption and speedup are calculated for both OpenMP and Pthreads. The combined implementation of OpenMP and Pthreads showed a good increase in speedup.
3

Serialisering av API mellan PC och inbyggda system (Serialization of an API between a PC and embedded systems)

Andersson, Jonas January 2010 (has links)
This thesis addresses the problem of testing embedded systems in an office environment. To do this, and thereby be able to make calls to the embedded system's API, each call must be sent as a serial data packet over a serial communication link such as TCP/IP. This was made possible by first establishing a communication link with the TCP/IP protocol, using POSIX sockets. To pack and unpack the function calls to and from serial data, a protocol was implemented and followed. Data handling during transmission over TCP/IP was managed by a protocol named BGSFP, which builds on the earlier protocol TSFP.
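Packing a function call into a byte packet and unpacking it on the other side can be sketched in C as follows. The wire layout here is an assumption made for illustration (a 2-byte function ID, a 2-byte argument count, then 32-bit arguments, all big-endian); it is not the BGSFP or TSFP format, which the abstract does not specify. The names `api_call`, `pack_call`, `unpack_call`, and `MAX_ARGS` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 8  /* assumed upper bound on arguments per call */

/* In-memory form of one API call (hypothetical type). */
typedef struct {
    uint16_t func_id;         /* which API function to invoke */
    uint16_t argc;            /* number of valid entries in args */
    int32_t  args[MAX_ARGS];
} api_call;

/* Big-endian helpers so both ends agree regardless of host byte order. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Packs a call into buf. Returns bytes written, or 0 if it does not fit. */
size_t pack_call(const api_call *c, uint8_t *buf, size_t cap)
{
    size_t need = 4 + 4 * (size_t)c->argc;
    if (c->argc > MAX_ARGS || cap < need)
        return 0;
    put_u16(buf, c->func_id);
    put_u16(buf + 2, c->argc);
    for (uint16_t i = 0; i < c->argc; i++)
        put_u32(buf + 4 + 4 * i, (uint32_t)c->args[i]);
    return need;
}

/* Unpacks a call from buf. Returns bytes consumed, or 0 if malformed. */
size_t unpack_call(const uint8_t *buf, size_t len, api_call *out)
{
    if (len < 4)
        return 0;
    uint16_t argc = get_u16(buf + 2);
    size_t need = 4 + 4 * (size_t)argc;
    if (argc > MAX_ARGS || len < need)
        return 0;
    out->func_id = get_u16(buf);
    out->argc = argc;
    /* The unsigned-to-signed conversion assumes two's complement, which
     * holds on mainstream compilers. */
    for (uint16_t i = 0; i < argc; i++)
        out->args[i] = (int32_t)get_u32(buf + 4 + 4 * i);
    return need;
}
```

The packed bytes would then be written to a connected POSIX socket with `send` and read back with `recv` on the receiving side. Both functions validate lengths before touching the buffer, so a truncated packet is rejected instead of being read out of bounds.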
4

Serialisering av API mellan PC och inbyggda system (Serialization of an API between a PC and embedded systems)

Andersson, Jonas January 2010 (has links)
This thesis addresses the problem of testing embedded systems in an office environment. To do this, and thereby be able to make calls to the embedded system's API, each call must be sent as a serial data packet over a serial communication link such as TCP/IP. This was made possible by first establishing a communication link with the TCP/IP protocol, using POSIX sockets. To pack and unpack the function calls to and from serial data, a protocol was implemented and followed. Data handling during transmission over TCP/IP was managed by a protocol named BGSFP, which builds on the earlier protocol TSFP.
5

Pthreads and OpenMP: A performance and productivity study

Swahn, Henrik January 2016 (has links)
Today most computers have a multicore processor and depend on parallel execution to keep up with demanding tasks, which forces developers to write software that can take advantage of multicore systems. Multiple programming languages and frameworks make it possible to execute code in parallel on different threads. This study looks at the performance of, and the effort required to work with, two of the frameworks available to the C programming language: POSIX Threads (Pthreads) and OpenMP. Performance is measured by parallelizing three algorithms (matrix multiplication, Quick Sort, and calculation of the Mandelbrot set) using both Pthreads and OpenMP, comparing each first against a sequential version and then the parallel versions against each other. The effort required to modify the sequential program using OpenMP and Pthreads is measured as the number of lines in the final source code. The results show that OpenMP performs better than Pthreads on matrix multiplication and Mandelbrot set calculation but not on Quick Sort, because OpenMP has problems with recursion and Pthreads does not. OpenMP wins on effort in all the tests, but because of the large performance difference between OpenMP and Pthreads on Quick Sort, OpenMP cannot be recommended for parallelizing Quick Sort or other recursive programs.
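The difference in effort between the two frameworks can be seen in a small sketch of matrix multiplication in C. This is not the thesis code: `multiply_rows`, `matmul_omp`, `matmul_pthreads`, `mm_task`, and `NUM_THREADS` are hypothetical names, and matrices are assumed to be square and stored row-major in flat arrays. The OpenMP version adds one pragma to the loop, while the Pthreads version needs a task struct, a worker function, and explicit create and join calls.

```c
#include <pthread.h>

#define NUM_THREADS 4  /* assumed worker count for the Pthreads version */

/* Sequential kernel shared by both versions: computes rows
 * [row_begin, row_end) of c = a * b for n-by-n row-major matrices. */
static void multiply_rows(const double *a, const double *b, double *c,
                          int n, int row_begin, int row_end)
{
    for (int i = row_begin; i < row_end; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* OpenMP: a single pragma distributes rows across threads. Without
 * -fopenmp the pragma is ignored and the loop runs sequentially. */
void matmul_omp(const double *a, const double *b, double *c, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        multiply_rows(a, b, c, n, i, i + 1);
}

/* Pthreads: the row partitioning has to be written out by hand. */
typedef struct {
    const double *a;
    const double *b;
    double       *c;
    int           n, row_begin, row_end;
} mm_task;

static void *mm_worker(void *arg)
{
    mm_task *t = arg;
    multiply_rows(t->a, t->b, t->c, t->n, t->row_begin, t->row_end);
    return NULL;
}

/* Returns 0 on success, -1 if a thread could not be created. */
int matmul_pthreads(const double *a, const double *b, double *c, int n)
{
    pthread_t threads[NUM_THREADS];
    mm_task   tasks[NUM_THREADS];
    int       started = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        /* Contiguous, non-overlapping row ranges that cover 0..n. */
        tasks[t] = (mm_task){ a, b, c, n,
                              t * n / NUM_THREADS,
                              (t + 1) * n / NUM_THREADS };
        if (pthread_create(&threads[t], NULL, mm_worker, &tasks[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);
    return started == NUM_THREADS ? 0 : -1;
}
```

Each output row is written by exactly one thread in both versions, so neither needs locking. The sketch says nothing about relative speed, which depends on the compiler, the machine, and the problem size.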
6

Performance analysis of GPGPU and CPU on AES Encryption

Neelap, Akash Kiran January 2014 (has links)
Advancements in computing have led to a tremendous increase in the amount of data generated every minute, which needs to be stored or transferred while maintaining a high level of security. The military and armed forces today rely heavily on computers to store huge amounts of important and secret data, which matter greatly for national security. The standard AES encryption algorithm, at the heart of almost every application today, provides a high level of security but is time-consuming with the traditional sequential approach. Implementing AES on GPUs has been an area of ongoing research for a few years, but existing implementations are still either inefficient or incomplete and call for optimization to achieve better performance. Treating the limitations of previous work as a research gap, this paper aims to exploit efficient parallelism on the GPU and on a multi-core CPU to make a fair and reliable comparison. It also aims to derive implementation techniques for multi-core CPUs and GPUs that can be used in future implementations. The paper experimentally examines the performance of a CPU and a GPGPU at different levels of optimization using Pthreads, CUDA, and CUDA streams. It critically examines the behaviour of a GPU for different granularity levels and grid dimensions to determine their effect on performance. The results show considerable speedup on an NVIDIA GPU (Quadro K4000) over single-threaded and multi-threaded implementations on a CPU (Intel® Xeon® E5-1650).
7

Performance Prediction of Parallel Programs in a Linux Environment

Farooq, Mohammad Habibur Rahman & Qaisar January 2010 (has links)
Context. Today’s parallel systems are widely used for different computational tasks. Developing parallel programs that make maximum use of the computing power of parallel systems is tricky, and efficient tuning of parallel programs is often very hard. Objectives. In this study we present a performance prediction and visualization tool named VPPB for a Linux environment; the tool was originally introduced by Broberg et al. [1] for a Solaris 2.x environment. VPPB shows the predicted behavior of a multithreaded program using any number of processors, and the behavior is shown in two different graphs. The prediction is based on a monitored uniprocessor execution. Methods. An experimental evaluation was carried out to validate the prediction reliability of the developed tool. Results. The predictions were validated using an Intel multiprocessor with 8 processors and application programs from the PARSEC 2.0 benchmark suite. The validation shows that the speed-up predictions are within +/-7% of a real execution. Conclusions. Experimentation with the VPPB tool showed that its predictions are reliable and that the overhead it adds to application programs is low.
8

Contech: a shared memory parallel program analysis framework

Vassenkov, Phillip 13 January 2014 (has links)
We are in the era of multicore machines, where we must exploit thread-level parallelism for programs to run better, smarter, faster, and more efficiently. To increase instruction-level parallelism, processors and compilers perform heavy dataflow analyses between instructions. However, little work has been done in the area of inter-thread dataflow analysis. To pave the way and find new ways to conserve resources across a variety of domains (i.e., execution speed, chip die area, power efficiency, and computational throughput), we propose a novel framework, termed Contech, to facilitate the analysis of multithreaded programs in terms of their communication and execution patterns. We focus on shared memory programs rather than message passing programs, since their communication and execution patterns are more difficult to analyze. Discovering patterns in shared memory programs has the potential to allow general-purpose computing machines to turn architectural tricks on or off according to application-specific features. Our design of Contech is modular, so we can glean a large variety of information from an architecturally independent representation of the program under examination.
9

Network protocol for distribution and handling of data from JAS 39 Gripen / Nätverksprotokoll för distribuering och hantering av data från JAS 39 Gripen

Karlsson, Jonathan January 2015 (has links)
On board the aircraft JAS 39 Gripen, a measuring system, the Data Acquisition System (DAS), sends sensor data to a server on the ground. In this master thesis, a unified API for distribution and handling of the sensor data is designed and implemented. The work was carried out at Saab Aeronautics, Linköping, during 2014. During flights with the aircraft, the engineers at Saab need to monitor different sensors in the aircraft, including the exact commands of the pilots. All that data is serialized and sent via radio link to a server at Saab. The current data distribution solution includes several clients that need to connect to the server. Each client has its own connection protocol, making the system complex and difficult to maintain. An API is needed so that the clients can connect in a unified manner. This would also enable future clients to implement the API and start receiving sensor data from the server. The research conducted in the thesis project centered on the different choices that exist for designing such an API. The question that needed answering was: how can an existing complex system be replaced by a publish-subscribe system, and what would the benefits be in terms of latency and flexibility? The design would have to be flexible enough to support multiple clients. The research question was answered with a design utilizing ZMQ, pthreads, and a design pattern. The result is a flexible system that was sufficiently fast for the requirements set at Saab and open to future extensions. The thesis work also included designing a unified API with requirements on latency and functionality. The resulting API was designed using the publish-subscribe design pattern, the network library Zero Message Queue (ZMQ), and the threading library pthreads. The resulting system supports multiple coexisting servers and clients that request sensor data.
A new feature is that the clients can start sending calculations performed on samples to other clients. To demonstrate that the solution provides a unified framework, two existing clients and the server were developed with the proposed API. To test the latency requirements, tests were performed in the control room at Saab.
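The publish-subscribe pattern named in the abstract can be sketched in C with an in-process broker guarded by a pthreads mutex. This is only an illustration of the pattern, not the thesis API: the network transport that ZMQ would provide is left out entirely, and `broker`, `broker_subscribe`, `broker_publish`, `record_last`, and the topic strings used with them are hypothetical.

```c
#include <pthread.h>
#include <string.h>

#define MAX_SUBS 16  /* assumed capacity of the subscription table */

/* Callback invoked for every sample published on a subscribed topic. */
typedef void (*sample_cb)(const char *topic, double value, void *ctx);

typedef struct {
    const char *topic;  /* must outlive the subscription */
    sample_cb   cb;
    void       *ctx;
} subscription;

typedef struct {
    pthread_mutex_t lock;
    subscription    subs[MAX_SUBS];
    int             count;
} broker;

void broker_init(broker *b)
{
    pthread_mutex_init(&b->lock, NULL);
    b->count = 0;
}

/* Registers a subscriber. Returns 0 on success, -1 if the table is full. */
int broker_subscribe(broker *b, const char *topic, sample_cb cb, void *ctx)
{
    int ok = -1;
    pthread_mutex_lock(&b->lock);
    if (b->count < MAX_SUBS) {
        b->subs[b->count++] = (subscription){ topic, cb, ctx };
        ok = 0;
    }
    pthread_mutex_unlock(&b->lock);
    return ok;
}

/* Delivers a sample to every subscriber of the topic and returns how many
 * received it. Callbacks run while the lock is held, so they must not call
 * back into the broker. */
int broker_publish(broker *b, const char *topic, double value)
{
    int delivered = 0;
    pthread_mutex_lock(&b->lock);
    for (int i = 0; i < b->count; i++) {
        if (strcmp(b->subs[i].topic, topic) == 0) {
            b->subs[i].cb(topic, value, b->subs[i].ctx);
            delivered++;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return delivered;
}

/* Example subscriber: stores the most recent value in a caller-owned double. */
void record_last(const char *topic, double value, void *ctx)
{
    (void)topic;
    *(double *)ctx = value;
}
```

The publisher never needs to know who is listening, which is what lets new clients be added without changing the server. A client would register with, for example, `broker_subscribe(&b, "altitude", record_last, &latest)`, where the topic name is made up for this sketch.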
10

Hybrid Parallel Computing Strategies for Scientific Computing Applications

Lee, Joo Hong 10 October 2012 (has links)
Multi-core, multi-processor, and Graphics Processing Unit (GPU) computer architectures pose significant challenges with respect to the efficient exploitation of parallelism for large-scale scientific computing simulations. For example, a simulation of the human tonsil at the cellular level involves computing the motion and interaction of millions of cells over extended periods of time. Likewise, the simulation of Radiative Heat Transfer (RHT) effects by the Photon Monte Carlo (PMC) method is an extremely computationally demanding problem. The PMC method is an example of the Monte Carlo simulation method, an approach used extensively in a wide range of application areas. Although the basic algorithmic framework of these Monte Carlo methods is simple, they can be extremely computationally intensive. Therefore, an efficient parallel realization of these simulations depends on a careful analysis of the nature of these problems and the development of an appropriate software framework. The overarching goal of this dissertation is to develop and understand what the appropriate parallel programming model should be to exploit these disparate architectures, both in terms of efficiency and from a software engineering perspective. In this dissertation we examine these issues through a performance study of PathSim2, a software framework for the simulation of large-scale biological systems, using two different parallel architectures: distributed and shared memory. First, a message-passing implementation of a multiple germinal center simulation by PathSim2 is developed and analyzed for distributed memory architectures. Second, a germinal center simulation is implemented on a shared memory architecture with two parallelization strategies based on Pthreads and OpenMP. Finally, we present work targeting a complete hybrid, parallel computing architecture.
With this work we develop and analyze a software framework for generic Monte Carlo simulations implemented on multiple distributed memory nodes, each consisting of a multi-core architecture with attached GPUs. This simulation framework is divided into two asynchronous parts: (a) a threaded, GPU-accelerated pseudo-random number generator (or producer), and (b) a multi-threaded Monte Carlo application (or consumer). The advantage of this approach is that the software framework can be used directly within any Monte Carlo application code, without requiring application-specific programming of the GPU. We examine this approach through a performance study of the simulation of RHT effects by the PMC method on a hybrid computing architecture. We present a theoretical analysis of our proposed approach, discuss methods to optimize performance based on this analysis, and compare this analysis to experimental results obtained from simulations run on two different hybrid, parallel computing architectures. / Ph. D.
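The producer/consumer split described above can be sketched in C with Pthreads. This is an illustration under several assumptions, not the dissertation's framework: the producer runs on the CPU with a simple 64-bit linear congruential generator standing in for the GPU-accelerated generator, the consumer estimates pi instead of simulating RHT, and `rng_queue`, `producer`, `take`, `estimate_pi`, and `RING` are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

#define RING 1024  /* assumed capacity of the bounded buffer */

/* Bounded queue of uniform random numbers shared by producer and consumer. */
typedef struct {
    double          buf[RING];
    int             head, tail, count;
    long            to_produce;   /* how many numbers the producer emits */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
} rng_queue;

/* Producer: generates uniform values in [0, 1) and blocks while the
 * buffer is full. */
static void *producer(void *arg)
{
    rng_queue *q = arg;
    uint64_t state = 42;  /* fixed seed, so runs are reproducible */
    for (long i = 0; i < q->to_produce; i++) {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        double u = (double)(state >> 11) / 9007199254740992.0;  /* / 2^53 */
        pthread_mutex_lock(&q->lock);
        while (q->count == RING)
            pthread_cond_wait(&q->not_full, &q->lock);
        q->buf[q->tail] = u;
        q->tail = (q->tail + 1) % RING;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Consumer side: removes one number, blocking while the buffer is empty. */
static double take(rng_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    double u = q->buf[q->head];
    q->head = (q->head + 1) % RING;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return u;
}

/* Monte Carlo consumer: the fraction of random points in the unit square
 * that fall inside the quarter circle approximates pi / 4.
 * Returns -1.0 on invalid input or if the producer cannot be started. */
double estimate_pi(long samples)
{
    if (samples <= 0)
        return -1.0;

    rng_queue q = { .to_produce = 2 * samples };  /* two numbers per point */
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    pthread_t prod;
    if (pthread_create(&prod, NULL, producer, &q) != 0)
        return -1.0;

    long inside = 0;
    for (long i = 0; i < samples; i++) {
        double x = take(&q), y = take(&q);
        if (x * x + y * y <= 1.0)
            inside++;
    }

    pthread_join(prod, NULL);
    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return 4.0 * (double)inside / (double)samples;
}
```

The consumer only ever sees a stream of numbers, so the generator behind the queue could be swapped without touching the application code, which is the advantage the abstract points to. Locking once per number keeps the sketch short; a practical version would hand over numbers in batches.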
