Global ETD Search

1	Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design Halverson, Ranette Hudson 12 1900 (has links) The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching. Algorithms. parallel algorithm parentheses matching
2	Partial Sort and Its Applications on Single-Hop Wireless Networks Shiau, Shyue-Horng 19 January 2006 (has links) In this dissertation, we focus on the study of the partial sorting (generalized sorting) problem and the initialization problem. The partial sorting problem is to find the first k smallest (or largest) elements among n input elements and to report them in nondecreasing (or nonincreasing). The initialization problem on a multiprocessor system is to assign each of n input elements a unique identification number, from 1 to n. This problem can be regarded as a special case of the sorting problem in which all input elements have the same value. We propose some algorithms for solving these problems. The main result is to give precise analysis for these algorithms. On the traditional model, we modify two algorithms, based on insertion sort and quicksort, to solve the partial sorting problem. Our analysis figures out the whole race between the two partial sorting algorithms and shows that the partial insertion sort algorithm obtains the leading position from k = 1 (the beginning) until k 3 5pn. After that, the partial quicksort algorithm will take the leading position on the way to the end. We also extend the partial sorting problem on the Single-Hop wireless network with collision detection (WNCD) model. The extension fits in with the wireless trend and may be a foundation for studying divide-and-conquer. With the repeat maximum finding scheme, we propose a partial sorting algorithm and prove that its average time complexity is (k + log (n − k)). For the initialization problem on the WNCD model, we can invoke the sorting algorithms directly for solving it. However, those sorting algorithms would not be better than the method of building a partition tree. We show that the partition tree method requires 2.88n time slots in average. After reconstructing and analyzing the method, we improve the result from 2.88n to 2.46n. partial sort initialization parallel algorithm quicksort broadcast insertion sort
3	An Application Developed for Simulation of Electrical Excitation and Conduction in a 3D Human Heart Yu, Di 01 January 2013 (has links) This thesis first reviews the history of General Purpose computing Graphic Processing Unit (GPGPU) and then introduces the fundamental problems that are suitable for GPGPU algorithm. The architecture of GPGPU is compared against modern CPU architecture, and the fundamental difference is outlined. The programming challenges faced by GPGPU and the techniques utilized to overcome these issues are evaluated and discussed. The second part of the thesis presents an application developed with GPGPU technology to simulate the electrical excitation and conduction in a 3D human heart model based on cellular automata model. The algorithm and implementation are discussed in detail and the performance of GPU is compared against CPU. Cellular Automata CUDA GPGPU Parallel Algorithm SIMD Computer Sciences
4	Performance Optimization and Parallelization of Turbo Decoding for Software-Defined Radio Roth, Jonathan 26 September 2009 (has links) Research indicates that multiprocessor-based architectures will provide a flexible alternative to hard-wired application-specific integrated circuits (ASICs) suitable to implement the multitude of wireless standards required by mobile devices, while meeting their strict area and power requirements. This shift in design philosophy has led to the software-defined radio (SDR) paradigm, where a significant portion of a wireless standard's physical layer is implemented in software, allowing multiple standards to share a common architecture. Turbo codes offer excellent error-correcting performance, however, turbo decoders are one of the most computationally complex baseband tasks of a wireless receiver. Next generation wireless standards such as Worldwide Interoperability for Microwave Access (WiMAX), support enhanced double-binary turbo codes, which offer even better performance than the original binary turbo codes, at the expense of additional complexity. Hence, the design of efficient double-binary turbo decoder software is required to support wireless standards in a SDR environment. This thesis describes the optimization, parallelization, and simulated performance of a software double-binary turbo decoder implementation supporting the WiMAX standard suitable for SDR. An adapted turbo decoder is implemented in the C language, and numerous software optimizations are applied to reduce its overall computationally complexity. Evaluation of the software optimizations demonstrated a combined improvement of at least 270% for serial execution, while maintaining good bit-error rate (BER) performance. Using a customized multiprocessor simulator, special instruction support is implemented to speed up commonly performed turbo decoder operations, and is shown to improve decoder performance by 29% to 40%. The development of a flexible parallel decoding algorithm is detailed, with multiprocessor simulations demonstrating a speedup of 10.8 using twelve processors, while maintaining good parallel efficiency (above 89%). A linear-log-MAP decoder implementation using four iterations was shown to have 90% greater throughput than a max-log-MAP decoder implementation using eight iterations, with comparable BER performance. Simulation also shows that multiprocessor cache effects do not have a significant impact on parallel execution times. An initial investigation into the use of vector processing to further enhance performance of the parallel decoder software reveals promising results. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-09-25 16:22:47.288 parallel algorithm turbo decoder software-defined radio hardware simulation
5	OpenMPBench : An Open-Source Benchmark for Multiprocessor Based Embedded Systems / OpenMPBench : en Open-Source riktmärke för multiprocessor baserade inbyggda system Liang, Yuchen, Iqbal, Syed Muhammad Zeeshan January 2010 (has links) It is a new and open-source benchmark for multiprocessor based embedded system. It comprises a set of parallel implementations for seven classical algorithms that cover different computing features of general-purpose processor. The performance data including tables and figures is provided for guiding the potential users to evaluate the design of multiprocessor based embedded system. The parallel implementations for seven applications that cover four categories are shown according to the category: Automation and Industry Control * Bitcount * SUSAN * BASICMATH Network * Patricia * Dijkstra Office * Stringsearch Security * SHA Among them, Bitcount and Dijkstra involve more than one parallel application implemented for different functions or using different strategies. Bitcount consists three parallel applications, parallel Bitcnt_1, parallel Bitstring and parallel Bitcnts, that implemented bit counting with different strategy. Three parallel applications implemented for Dijkstra. One is for all-pairs shortest paths problem. Another two are for solving single-source shortest paths problem using single queue strategy and multiple queue strategy respectively. Stringsearch consists of Pratt-Boyer-Moore, Case-sensitive Boyer-Moore-Horspool, Case-Insensitive Boyer-Moore-Horspool, and Boyer-Moore-Horspool (Case-insensitive with accented character translation) implementations. Source code of sequential versions of these applications download from Mibench as well as the standard output based on x86-linux. For OpenMPBench, all parallel applications have been implemented in ANCI C language using POSIX threads. All libraries related to implementations are based on GNU standard library. Development environment is in UBUNTU 9.04 with 2.6.28-generic Linux kernel, GCC 4.2.4 compiler, and Emacs 22.1 editor. On the basis of current hardware condition, a workstation with 8 processors, shipped with UBUNTU 4.2.4, is selected for experiment environment. UBUNTU is a free GNU Linux version that offers all GNU standard library and GCC has been installed by default. In conclusion, we consider this experiment environment is available to simulate the multiprocessor based on embedded systems. / Det är en ny och öppen källkod riktmärke för multiprocessor baserade inbyggda system. Det innehåller en rad parallella implementationer i sju klassiska algoritmer som täcker olika datorer funktioner i allmänt bruk processor. Uppgifter om prestanda inklusive tabeller och siffror ges för att styra potentiella användare att utvärdera utformningen av multiprocessor baserade inbyggda system. De parallella implementeringar för sju ansökningar som omfattar fyra kategorier visas beroende på vilken kategori: Automation och industri Control * Bitcount * SUSAN * BASICMATH Nätverk * Patricia * Dijkstra Office * Stringsearch Säkerhet * SHA Bland dem, Bitcount och Dijkstra omfattar mer än en parallell ansökan genomförs för olika funktioner eller med hjälp av olika strategier. Bitcount består tre parallella program, parallell Bitcnt_1, parallell Bitstring och parallella Bitcnts, som genomförs bit räknar med olika strategi. Tre parallella ansökningar genomförs för Dijkstra. Den ena är för all-par kortaste stigar problem. Ytterligare två är för att lösa enda källa kortaste stigar problemet, använder en kö strategi och flera kö strategi respektive. Stringsearch består av Pratt-Boyer-Moore, skiftlägeskänslig Boyer-Moore-Horspool, skiftlägesokänslig Boyer-Moore-Horspool, och Boyer-Moore-Horspool (små bokstäver med accenttecken översättning) implementationer. Källkod sekventiell versioner av dessa program att hämta från Mibench liksom standard produktion baserad på x86-linux. För OpenMPBench har alla parallella ansökningar har genomförts i ANCI C-språk med POSIX trådar. Alla bibliotek i samband med implementationer är baserat på GNU standard bibliotek. Utvecklingsmiljö i Ubuntu 9.04 med 2.6.28-generic Linuxkärnan, GCC 4.2.4 kompilator och Emacs 22,1 redaktör. På grundval av nuvarande hårdvara skick, en arbetsstation med 8 processorer, som levereras med Ubuntu 4.2.4, har valts för experiment miljön. Ubuntu är ett gratis GNU Linux-version som kan erbjuda alla GNU Standard bibliotek och GCC har installerats som standard. Sammanfattningsvis anser vi att detta experiment miljön är tillgänglig för att simulera multiprocessor baserade på inbyggda system. / Yuchen Liang: phone no: 8641182120823 6-3-1, No. 44, Huabei Road Ganduan, Ganjingzi District, Dalian City, 116023, Liaoning Province, P. R. China Syed Muhammad Zeeshan Iqbal: phone no: 92415510275 Muhallah Gurunanak Pura, Street No: 7, House No:211, Faisalabad, Pakistan Benchmark Multiprocessor Linux Multithread Parallel Algorithm Computer Sciences Datavetenskap (datalogi)
6	Použití OpenCl v AVG na platformě Windows / Using of OpenCl at AVG in Windows Platform Bajcar, Martin January 2012 (has links) The main topic of this thesis is the practical use of OpenCL at AVG company. AVG is looking for ways to decrease hardware requirement of their security product and also to decrease computation time of some algorithms. Using OpenCL is one way to achieve this requirement. Significant part of this thesis deals with optimization strategies for AMD and NVIDIA graphics cards as they are most common cards among users. Practical part of the thesis describes parallelization of two algorithms, their analysis and implementation. After that, the obtained results are presented and cases in which the use of OpenCL is beneficial are identified. As a part of implementation, library containing various utility functions which can aid programmers to implement OpenCL based code was developed.
7	Parallel Sparse Matrix-Matrix Multiplication: A Scalable Solution With 1D Algorithm Hoque, Mohammad Asadul, Raju, Md Rezaul Karim, Tymczak, Christopher John, Vrinceanu, Daniel, Chilakamarri, Kiran 01 January 2015 (has links) This paper presents a novel implementation of parallel sparse matrix-matrix multiplication using distributed memory systems on heterogeneous hardware architecture. The proposed algorithm is expected to be linearly scalable up to several thousands of processors for matrices with dimensions over 106 (million). Our approach of parallelism is based on 1D decomposition and can work for both structured and unstructured sparse matrices. The storage mechanism is based on distributed hash lists, which reduces the latency for accessing and modifying an element of the product matrix, while reducing the overall merging time of the partial results computed by the processors. Theoretically, the time and space complexity of our algorithm is linearly proportional to the total number of non-zero elements in the product matrix C. The results of the performance evaluation show that the algorithm scales much better for sparse matrices with bigger dimensions. The speedup achieved using our algorithm is much better than other existing 1D algorithms. We have been able to achieve about 500 times speedup with only 672 processors. We also identified the impact of hardware architecture on scalability. distributed computing MPI parallel algorithm scalable Sparse matrix
8	Parallel Mining and Analysis of Triangles and Communities in Big Networks Arifuzzaman, S M. 19 August 2016 (has links) A network (graph) is a powerful abstraction for interactions among entities in a system. Examples include various social, biological, collaboration, citation, and co-purchase networks. Real-world networks are often characterized by an abundance of triangles and the existence of well-structured communities. Thus, counting triangles and detecting communities in networks have become important algorithmic problems in network mining and analysis. In the era of big data, the network data emerged from numerous scientific disciplines are very large. Online social networks such as Twitter and Facebook have millions to billions of users. Such massive networks often do not fit in the main memory of a single machine, and the existing sequential methods might take a prohibitively large runtime. This motivates the need for scalable parallel algorithms for mining and analysis. We design MPI-based distributed-memory parallel algorithms for counting triangles and detecting communities in big networks and present related analysis. The dissertation consists of four parts. In Part I, we devise parallel algorithms for counting and enumerating triangles. The first algorithm employs an overlapping partitioning scheme and novel load-balancing schemes leading to a fast algorithm. We also design a space-efficient algorithm using non-overlapping partitioning and an efficient communication scheme. This space efficiency allows the algorithm to work on even larger networks. We then present our third parallel algorithm based on dynamic load balancing. All these algorithms work on big networks, scale to a large number of processors, and demonstrate very good speedups. An important property, very related to triangles, of many real-world networks is high transitivity, which states that two nodes having common neighbors tend to become neighbors themselves. In Part II, we characterize networks by quantifying the number of common neighbors and demonstrate its relationship to community structure of networks. In Part III, we design parallel algorithms for detecting communities in big networks. We propose efficient load balancing and communication approaches, which lead to fast and scalable algorithms. Finally, in Part IV, we present scalable parallel algorithms for a useful graph preprocessing problem-- converting edge list to adjacency list. We present non-trivial parallelization with efficient HPC-based techniques leading to fast and space-efficient algorithms. / Ph. D. Network Mining Parallel Algorithm Triangle Counting Community Detection Big Data
9	Parallel multi-modal optimal design and sensitivity assessment for electric power systems Yazdanpanah Goharrizi, Ali 05 April 2016 (has links) This thesis proposes a novel algorithm to optimize multi-modal, nonlinear, black-box objective functions for electric power system design using an electromagnetic transients (EMT) simulator. The algorithm discovers multiple local optimal solutions for a given complex power system, and then generates accurate surrogate models of an objective function around each discovered local optimal solution. These surrogate models represent the local behaviour of the objective function that can be used in the subsequent stages of sensitivity analyses. Using surrogate models instead of intensive transient simulation during sensitivity analysis reduces computational intensity and simulation time. This makes the proposed algorithm particularly suited for optimization of computationally expensive black-box functions. The stages of the algorithm can be implemented independently and hence the computations can be done in parallel. Therefore, the algorithm is implemented in a parallel environment to gain significant speed-up in the design of electric power systems. Comparative studies in terms of objective function evaluation and computation time are provided. Using several multi-modal benchmark objective functions, the superiority of the proposed algorithm compared to other recently developed algorithms is demonstrated. Additionally, the application of the algorithm in the design process of complex electric power system demonstrated through several examples. The case studies show that the parallelized algorithm provides computational savings up to 39 times compared to the conventional sequential approach. / May 2016 power system optimization surrogate model sensitivity assessment parallel algorithm high voltage direct current drive system
10	Algorithmic Aspects of Ordered Structures Gustedt, Jens 03 July 1992 (has links) (PDF) In dieser Arbeit verbinden wir die Theorie der Quasi-Ordnungen mit der Theorie der Algorithmen einiger kombinatorischer Objekte. Zuerst entwickeln wir die Theorie der Wohl-Quasi-Ordnungen, WQO, im Zusammenhang zur maximalen Komplexität. Dann geben wir ein allgemeines 0-1-Gesetz für erbliche Eigenschaften, das Auswirkungen für die mittlere Komplexität hat. Dieses Ergebnis für mittlere Komplexität wird auf die Klasse der endlichen Graphen, versehen mit der Relation ``induzierter Subgraph'', angewendet. Wir erhalten, daß eine große Klasse von Problemen, welche z.B. Perfektheit umfaßt, Algorithmen mit im Mittel konstanter Laufzeit haben. Dann zeigen wir, indem wir ein Ergebnis von Damaschke für Cographen veralgemeinern, daß die Klassen der endlichen Ordnungen bzw. Graphen mit beschränktem Dekompositionsdurchmesser bzgl. der Relation ``induzierte Subordnung'' bzw. ``induzierter Subgraph'' WQO sind. Dies führt uns zu linearen Erkennungsalgorithmen für alle erblichen Eigenschaften über diesen Objekten. Unser Hauptresultat ist dann, daß die Menge der endlichen partiellen Ordnungen eine Wohl-Quasi-Ordnung bzgl. einer gewissen Relation &leq; , der sogenannten Ketten-Minor-Relation, ist. Um dies zu beweisen, führen wir eine verwandte Relation auf endlichen formalen Sprachen ein, die die gleiche Eigenschaft hat. Als Folgerung erhalten wir, daß jede Eigenschaft, die erblich bzgl. &leq; ist, einen Test in <em>O(\|P\|<sup>c</sup>)</em> Zeit zuläßt, wobei $c$ von der Eigenschaft abhängt. Dieser Test läßt sich leicht parallelisieren. Auf einer parallelen Maschine (\CRCW\ \PRAM) kann er so implementiert werden, daß er konstante Zeit auf <em>O(\|P\|<sup>c</sup>)</em> Prozessoren benötigt. [MATH] Mathematics [INFO:INFO_OH] Computer Science/Other well-quasi-orders hereditary properties polynomial time parallel algorithm

Search results