11

An analysis of key generation efficiency of RSA cryptosystem in distributed environments

Çağrıcı, Gökhan. Koltuksuz, Ahmet January 2005 (has links) (PDF)
Thesis (Master)--İzmir Institute of Technology, İzmir, 2005. / Keywords: Cryptosystem, Rivest-Shamir-Adleman, parallel computing, parallel algorithms, random number. Includes bibliographical references (leaves 68).
12

Using parallel computation to apply the singular value decomposition (SVD) in solving for large Earth gravity fields based on satellite data

Hinga, Mark Brandon, Tapley, Byron D., January 2004 (has links) (PDF)
Thesis (Ph. D.)--University of Texas at Austin, 2004. / Supervisor: Byron D. Tapley. Vita. Includes bibliographical references. Also available from UMI.
13

Parallelization of ECG template-based abnormality detection

Kratsas, Sherry L. January 2000 (has links)
Thesis (M.S.)--West Virginia University, 2000. / Title from document title page. Document formatted into pages; contains vii, 62 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 61-62).
14

Efektivní paralelní zpracování dat na moderním hardware / Towards Efficient Parallel Data Processing on Modern Hardware

Falt, Zbyněk January 2014 (has links)
Parallel data processing is a very active research topic, since both the amount of data and the complexity of the operations performed on them have increased significantly in the past few years. In this thesis, we focus on a specific domain of this research: the design and implementation of parallel algorithms used mainly in database systems. First, we introduce important enhancements to the Bobox system, a framework for the development of parallel data processing applications. Then, we introduce a new domain-specific language called Bobolang, which makes the implementation of those applications easier. Next, we propose parallel and scalable algorithms used in the database domain, namely sort and merge join, and describe their efficient implementation using the combination of Bobox and Bobolang. Finally, we introduce a parallel runtime for a SPARQL engine as an example of a parallel data processing application that demonstrates the main contributions of this thesis in complex, real-life situations.
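For illustration, the following is a minimal sketch of a partitioned parallel merge join, the kind of scalable join operator the abstract refers to. It is generic Python, not the Bobox/Bobolang implementation from the thesis; the function names and the key-range partitioning strategy are illustrative assumptions.

```python
# Minimal sketch of a partitioned parallel merge join (illustrative only;
# not the thesis's Bobox/Bobolang implementation).
import bisect
from concurrent.futures import ProcessPoolExecutor

def merge_join(left, right):
    """Classic merge join of two key-sorted lists of (key, value) pairs."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            k = left[i][0]
            li, rj = i, j
            while i < len(left) and left[i][0] == k:
                i += 1
            while j < len(right) and right[j][0] == k:
                j += 1
            # Emit the cross product of the two equal-key runs.
            out.extend((k, lv, rv) for _, lv in left[li:i] for _, rv in right[rj:j])
    return out

def parallel_merge_join(left, right, workers=4):
    """Split both sorted inputs into disjoint key ranges; join ranges in parallel."""
    keys = sorted({k for k, _ in left} | {k for k, _ in right})
    if not keys:
        return []
    bounds = [keys[len(keys) * w // workers] for w in range(1, workers)]

    def slice_by(rel):
        parts, lo = [], 0
        for b in bounds + [None]:
            hi = len(rel) if b is None else bisect.bisect_left(rel, (b,))
            parts.append(rel[lo:hi])
            lo = hi
        return parts

    # Equal keys never straddle a boundary, so partitions join independently.
    with ProcessPoolExecutor(workers) as ex:  # guard with __main__ on Windows
        chunks = ex.map(merge_join, slice_by(left), slice_by(right))
    return [t for chunk in chunks for t in chunk]
```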
15

Parallelization of the Euler Equations on Unstructured Grids

Bruner, Christopher William Stuteville 01 May 1996 (has links)
Several different time-integration algorithms for the Euler equations are investigated on two distributed-memory parallel computers using an explicit message-passing paradigm: classic Euler Explicit, four-stage Jameson-style Runge-Kutta, Block Jacobi, Block Gauss-Seidel, and Block Symmetric Gauss-Seidel. A finite-volume formulation is used for the spatial discretization of the physical domain. Both two- and three-dimensional test cases are evaluated against five reference solutions to demonstrate the accuracy of the fundamental sequential algorithms. Different schemes for communicating or approximating data that are not available on the local compute node are discussed, and it is shown that complete sharing of the evolving solution to the inner matrix problem at every iteration is faster than the other schemes considered. Speedup and efficiency issues pertaining to the various time-integration algorithms are then addressed for each system. Of the algorithms considered, Block Symmetric Gauss-Seidel has the best overall performance. It is also demonstrated that using parallel efficiency as the sole means of evaluating an algorithm's performance often leads to erroneous conclusions; the clock time needed to solve a problem is a much better indicator. A general method for extending one-dimensional limiter formulations to the unstructured case is also discussed and applied to Van Albada's limiter as well as Roe's Superbee limiter. Solutions and convergence histories for a two-dimensional supersonic ramp problem using these limiters are presented along with computations using the limiters of Barth & Jesperson and Venkatakrishnan; the Van Albada limiter performs similarly to Venkatakrishnan's. / Ph. D.
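For reference, the two limiters named in the abstract have standard one-dimensional forms, sketched below as plain functions of the successive-gradient ratio r; the thesis's extension of these formulations to unstructured grids is not reproduced here.

```python
# Standard 1-D forms of the two limiters named in the abstract, written as
# functions of the successive-gradient ratio r.
def van_albada(r: float) -> float:
    """Van Albada limiter: smooth, phi(1) = 1, inactive at extrema (r <= 0)."""
    return (r * r + r) / (r * r + 1.0) if r > 0.0 else 0.0

def superbee(r: float) -> float:
    """Roe's Superbee limiter: the most compressive classical TVD limiter."""
    return max(0.0, min(2.0 * r, 1.0), min(r, 2.0))
```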
16

Optimized hardware accelerators for data mining applications

Kanan, Awos 19 February 2018 (has links)
Data mining plays an important role in a variety of fields, including bioinformatics, multimedia, business intelligence, marketing, and medical diagnosis. Analysis of today's huge and complex data involves several data mining algorithms, including clustering and classification. The computational complexity of machine learning and data mining algorithms, which are frequently used in applications such as embedded systems, makes the design of efficient hardware architectures for these algorithms a challenging issue in the development of such systems. The aim of this work is to optimize the performance of hardware acceleration for data mining applications in terms of speed and area. Most of the accelerator architectures previously proposed in the literature were obtained using ad hoc techniques that do not allow for design space exploration, and some did not consider the size (number of samples) and dimensionality (number of features per sample) of the datasets. To obtain practical architectures that are amenable to hardware implementation, the size and dimensionality of input datasets are taken into consideration in this work. For one-dimensional data, algorithm-level optimizations are investigated to design a fast and area-efficient hardware accelerator for clustering one-dimensional datasets using the well-known K-Means clustering algorithm. Experimental results show that the optimizations adopted in the proposed architecture result in faster convergence of the algorithm using fewer hardware resources, while maintaining the quality of the clustering results. The computation of similarity distance matrices is one of the computational kernels generally required by several machine learning and data mining algorithms to measure the degree of similarity between data samples. For these algorithms, distance calculation is a computationally intensive task that accounts for a significant portion of the processing time. A systematic methodology is presented to explore the design space of 2-D and 1-D processor array architectures for the similarity distance computation involved in processing datasets of different sizes and dimensions. Six 2-D and six 1-D processor array architectures are developed systematically using linear scheduling and projection operations. The obtained architectures are classified based on the size and dimensionality of input datasets, analyzed in terms of speed and area, and compared with previous architectures in the literature. Motivated by the need to accommodate large-scale and high-dimensional data, nonlinear scheduling and projection operations are finally introduced to design a scalable processor array architecture for the computation of similarity distance matrices. Implementation results of the proposed architecture show an improved compromise between area and speed. Moreover, it scales better for large and high-dimensional datasets, since the architecture is fully parameterized and only has to deal with one data dimension in each time step. / Graduate / 2019-12-31
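As a software reference point for the kernel the processor arrays target, here is a minimal NumPy sketch of the similarity distance computation (all pairwise squared Euclidean distances between samples and centres). It illustrates only the arithmetic, not the hardware architecture; dense row-major inputs are an assumption of the sketch.

```python
# NumPy sketch of the similarity distance kernel: all pairwise squared
# Euclidean distances between N samples and M centres.
import numpy as np

def distance_matrix(samples: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """samples: (N, D), centres: (M, D) -> (N, M) squared distances."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, with no explicit Python loop.
    x2 = np.sum(samples ** 2, axis=1, keepdims=True)   # (N, 1)
    c2 = np.sum(centres ** 2, axis=1)                  # (M,)
    return x2 - 2.0 * samples @ centres.T + c2         # (N, M)

# One K-Means assignment step built on the kernel:
# labels = distance_matrix(X, C).argmin(axis=1)
```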
17

Embarrassingly Parallel Statistics and its Applications: Divide & Recombine Methods for Parallel Computation of Quantiles and Construction of K-D Trees for Big-Data

Aritra Chakravorty (5929565) 16 January 2019 (has links)
In Divide & Recombine (D&R), data are divided into subsets, analytic methods are applied to each subset independently, with no communication between processes; then the subset outputs for each method are recombined. For big data, this provides almost all of the analytic tasking needed when data are analyzed. It also provides high computational performance because typically most of the computation is embarrassingly parallel, the simplest parallel computation.

Another kind of tasking must address computational performance and numeric accuracy: the computing of functions of all of the data, or "statistics". For data big and small, it is often important to compute such statistics for all of the data, which can be summaries of the data, such as sample quantiles of continuous variables, or can process the data into a form that helps analysis, such as dividing the data into representative subsets. Development of computational methods to compute these statistics can be challenging.

D&R can be a very effective framework for computing statistics. To support this, we introduce the concept of embarrassingly parallel (EP) statistics, both weak and strong. The concept of EP statistics is not entirely new, but has had little development. The existing methodology is mainly sums of sums. For example, this is done when computing the necessary statistics for least squares, where sums of products and cross products are carried out on subsets and then summed across subsets. Our treatment of EP statistics has taken the concept much further. The outcome is the ability to use EP statistics in conjunction with a Fourier series to approximate an optimization criterion. The series terms, which are strongly EP statistics, are summed across subsets, and the result is optimized. These are EP-F computational methods.

We have so far developed two EP-F computational methods for two widely used statistic computations. EP-F-Quantile is for quantiles of big data, and EP-F-KDtree is for KD-trees. Speed and accuracy of EP-F-Quantile are compared with those of the well-known binning method, which can also be formulated in terms of EP statistics. EP-F-KDtree is the first parallel KD-tree computational method of which we are aware. EP and EP-F computational methods have potentially many other applications to computing statistics.
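As a concrete illustration of an EP statistic, the sketch below implements the binning method mentioned above (not EP-F-Quantile itself): each subset contributes only its local histogram counts, a strongly EP summary; the counts are summed across subsets; and a quantile is read off the merged histogram. The fixed, pre-agreed bin edges are an assumption of this sketch.

```python
# Binning method as an EP statistic: per-subset histogram counts need no
# cross-subset communication, are summed, and the merged CDF is inverted.
import numpy as np

def subset_counts(subset: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Divide step: each subset's histogram counts (a strongly EP summary)."""
    return np.histogram(subset, bins=edges)[0]

def quantile_from_counts(counts: np.ndarray, edges: np.ndarray, q: float) -> float:
    """Recombine step: invert the merged empirical CDF at probability q."""
    cdf = np.cumsum(counts) / counts.sum()
    i = int(np.searchsorted(cdf, q))
    return float(edges[i + 1])  # right edge of the bin where the CDF crosses q

# Divide & Recombine usage (subsets may live on different workers):
# edges = np.linspace(lo, hi, 1001)
# total = sum(subset_counts(s, edges) for s in subsets)  # a sum of sums
# median = quantile_from_counts(total, edges, 0.5)
```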
18

Hardware Architecture for Semantic Comparison

Mohan, Suneil 2012 May 1900 (has links)
Semantic Routed Networks provide a superior infrastructure for complex search engines. In a Semantic Routed Network (SRN), the routers are the critical component, and they perform semantic comparison as their key computation. As the amount of information available on the Internet grows, the speed and efficiency with which information can be retrieved for the user become important. Most current search engines scale to meet the growing demand by deploying large data centers with general-purpose computers that consume many megawatts of power. Reducing the power consumption of these data centers while providing better performance will help reduce the costs of operation significantly. Performing operations in parallel is a key optimization step for better performance on general-purpose CPUs. Current techniques for parallelization include multi-core architectures with multiple thread-handling capabilities. These coarse-grained approaches have considerable resource management overhead and provide only sub-linear speedup. This dissertation proposes techniques towards a highly parallel, power-efficient architecture that performs semantic comparisons as its core activity. Hardware-centric parallel algorithms have been developed to populate the required data structures and then compute semantic similarity. The performance of the proposed design is further enhanced using a pipelined architecture. The proposed algorithms were also implemented on two contemporary platforms, Nvidia CUDA and an FPGA, for performance comparison. In order to validate the designs, a semantic benchmark was also created. It is shown that a dedicated semantic comparator delivers significantly better performance than other platforms. Results show that the proposed hardware semantic comparison architecture delivers a speedup of up to 10^5 while reducing power consumption by 80% compared to traditional computing platforms. Future research directions, including better power optimization, architecting the complete semantic router, and using the semantic benchmark for SRN research, are also discussed.
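The abstract does not spell out the similarity measure being accelerated, so the sketch below uses cosine similarity over sparse term-weight vectors as a hypothetical stand-in for the semantic comparison kernel; its independent per-term products are the kind of fine-grained parallelism a dedicated comparator pipeline can exploit.

```python
# Hypothetical stand-in for the semantic comparison kernel: cosine
# similarity over sparse term->weight vectors. Each per-term product is
# independent, so in hardware every term can feed its own multiplier with
# the results summed in an adder tree.
import math

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse term->weight vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```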
19

Parallel algorithms for inductance extraction

Mahawar, Hemant 17 September 2007 (has links)
In VLSI circuits, signal delays play an important role in design, timing verification, and signal integrity checks. These delays are attributed to the presence of parasitic resistance, capacitance, and inductance. With increasing clock speeds and shrinking feature sizes, these delays will be dominated by parasitic inductance. In the next generation of VLSI circuits, with millions of components and interconnect segments, fast and accurate inductance estimation becomes a crucial step. A generalized approach to inductance extraction requires the solution of a large, dense, complex linear system that models mutual inductive effects among circuit elements. Iterative methods are used to solve the system without explicit computation of the system matrix itself. Fast hierarchical techniques are used to compute approximate matrix-vector products with the dense system matrix in a matrix-free way. Because the system matrix is unavailable, constructing a preconditioner to accelerate the convergence of the iterative method becomes a challenging task. This work presents a class of parallel algorithms for fast and accurate inductance extraction of VLSI circuits. We use the solenoidal basis approach, which converts the linear system into a reduced system. The reduced system of equations is solved by a preconditioned iterative solver that uses fast hierarchical methods to compute products with the dense coefficient matrix. A Green's function based preconditioner is proposed that achieves near-optimal convergence rates in several cases. By formulating the preconditioner as a dense matrix similar to the coefficient matrix, we are able to use fast hierarchical methods for the preconditioning step as well. Experiments on a number of benchmark problems highlight the efficiency of the preconditioning scheme and its advantages over FastHenry. To further reduce the solution time, we have developed a parallel implementation. The parallel software package is capable of analyzing interconnect configurations involving several conductors within reasonable time. A two-tier parallelization scheme enables mixed-mode parallelization, which uses both OpenMP and MPI directives. The parallel performance of the software is demonstrated through experiments on IBM p690 and AMD Linux clusters. These experiments highlight the portability and efficiency of the software on multiprocessors with shared, distributed, and distributed-shared memory architectures.
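The solver structure described here can be sketched as a generic matrix-free preconditioned conjugate gradient loop: both the coefficient matrix and the preconditioner appear only as matrix-vector product callbacks, which a fast hierarchical method would supply. This is a textbook illustration under the assumption of a symmetric positive definite reduced system; the solenoidal-basis reduction and the Green's function preconditioner themselves are not reproduced.

```python
# Generic matrix-free preconditioned conjugate gradient: A and M^-1 appear
# only as matvec callbacks (assumes a symmetric positive definite system).
import numpy as np

def pcg(apply_A, b, apply_M_inv, tol=1e-8, max_iter=500):
    """Solve A x = b given A and M^-1 only as functions v -> A v, v -> M^-1 v."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_M_inv(r)          # preconditioning step, also matrix-free
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x
```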
20

Realizable paths and the NL vs L problem

Prasad, Kintali Shiva 29 August 2011 (has links)
A celebrated theorem of Savitch [Savitch'70] states that NSPACE(S) is contained in DSPACE(S²). In particular, Savitch gave a deterministic algorithm to solve ST-Connectivity (an NL-complete problem) using O(log² n) space, implying that NL (non-deterministic logspace) is contained in DSPACE(log² n). While Savitch's theorem itself has not been improved in the last four decades, several graph connectivity problems have been shown to lie between L and NL, providing new insights into the space-bounded complexity classes. All the connectivity problems considered in the literature so far are essentially special cases of ST-Connectivity. In this dissertation, we initiate the study of auxiliary PDAs as graph connectivity problems, define sixteen different "graph realizability problems", and study their relationships. The complexity of these connectivity problems lies between L (logspace) and P (polynomial time). ST-Realizability, the most general graph realizability problem, is P-complete. 1DSTREAL(poly), the most specific graph realizability problem, is L-complete. As special cases of our graph realizability problems we define two natural problems, Balanced ST-Connectivity and Positive Balanced ST-Connectivity, that lie between L and NL. We study the space complexity of SGSLOGCFL, a graph realizability problem lying between L and LOGCFL. We define generalizations of graph squaring and transitive closure, present efficient parallel algorithms for SGSLOGCFL, and use the techniques of Trifonov to show that SGSLOGCFL is contained in DSPACE(log n · log log n). This implies that Balanced ST-Connectivity is contained in DSPACE(log n · log log n). We conclude with several interesting new research directions.
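For context, Savitch's algorithm cited above is a short recursion: PATH(u, v, k) asks whether v is reachable from u in at most 2^k steps and tries every vertex as a midpoint. Recursion depth O(log n) with O(log n) bits per frame gives the O(log² n) space bound. The sketch below models only the control flow; a Python call stack is of course not a genuinely space-bounded machine, and the adjacency-dict representation is an assumption.

```python
# Savitch's recursive reachability procedure behind NSPACE(S) in DSPACE(S^2).
# Assumes adj maps *every* vertex to its set of successors.
import math

def savitch_path(adj, u, v, k):
    """True iff v is reachable from u in at most 2**k edges."""
    if k == 0:
        return u == v or v in adj[u]
    return any(
        savitch_path(adj, u, w, k - 1) and savitch_path(adj, w, v, k - 1)
        for w in adj  # deterministically try every vertex as the midpoint
    )

# Usage: reachable = savitch_path(adj, s, t, math.ceil(math.log2(len(adj))))
```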
