221

An approach of parallel computation on factoring programs

Lin, Jieh-Shwu January 2010 (has links)
Typescript (photocopy). / Digitized by Kansas Correctional Industries
222

Parallel computation : synchronization, scheduling, and schemes.

Jaffe, Jeffrey Martin January 1979 (has links)
Thesis. 1979. Ph.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Vita. / Bibliography: leaves 256-263. / Ph.D.
223

Avaliação de desempenho de sistemas paralelos baseada em descrição simplificada do programa e da arquitetura. / Performance evaluation of parallel systems based on simplified description of programs and architecture.

Piola, Thatyana de Faria 27 August 2002 (has links)
This work presents the development of a language for the simplified description of parallel algorithms, together with a translator and a network simulator. With performance evaluation in mind, the language allows easy and broad prototyping of parallel programs, covering control structures, repetition, and the communication and computation parts. A translator was developed that converts the simplified code written in the language into C++ code. A network simulator computes the times involved in the communications and interacts with the code produced by the translator. For validation, several test programs were used and the simulation results were compared with execution times measured on a cluster of personal computers.
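As an illustration of the kind of communication-time estimate such a network simulator typically produces, here is a minimal sketch of a latency/bandwidth model in Python; the model and the parameter values are generic assumptions for illustration, not taken from the thesis.

    # Minimal sketch of a latency/bandwidth communication-time model, the kind of
    # estimate a network simulator for parallel programs commonly produces.
    # The parameter values below are illustrative assumptions, not figures from the thesis.

    def comm_time(message_bytes: int, latency_s: float = 50e-6,
                  bandwidth_bytes_per_s: float = 100e6) -> float:
        """Estimate point-to-point transfer time as startup latency plus size/bandwidth."""
        return latency_s + message_bytes / bandwidth_bytes_per_s

    def phase_time(compute_s: float, messages: list[int]) -> float:
        """Total time of a simple compute-then-communicate phase (no overlap assumed)."""
        return compute_s + sum(comm_time(m) for m in messages)

    if __name__ == "__main__":
        # One 1 MB message and one 4 KB message after 2 ms of computation.
        print(f"estimated phase time: {phase_time(0.002, [1_000_000, 4_096]):.6f} s")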
224

Interface WEB para gerenciamento e utilização de clusters para processamento paralelo / A WEB interface for the use and management of parallel processing in clusters

Lett, Elaine Patricia Quaresma Xavier 17 February 2003 (has links)
This work describes a simple cluster management system that provides a user interface for the most common usage and administration tasks on a cluster operated as a parallel machine. After studying several existing cluster management tools, we designed the system around the needs of the cluster at the Laboratório de Processamento Paralelo Aplicado of the Instituto de Física de São Carlos. The system is implemented with HTML pages and CGI scripts, a combination that proved adequate for this type of system.
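A minimal sketch of the kind of CGI endpoint such an interface could be built from; the node list and the ssh/uptime status probe are hypothetical placeholders, since the thesis does not specify how the cluster is queried.

    #!/usr/bin/env python3
    # Minimal sketch of a CGI script that reports cluster node status as an HTML page.
    # The node names and the ssh/uptime probe are illustrative assumptions; a real
    # deployment would query whatever batch system or monitor the cluster actually runs.
    import html
    import subprocess

    NODES = ["node01", "node02", "node03"]  # hypothetical node list

    def node_status(node: str) -> str:
        # Placeholder probe: ask the node for its uptime over ssh.
        try:
            out = subprocess.run(["ssh", node, "uptime"], capture_output=True,
                                 text=True, timeout=5)
            return out.stdout.strip() or out.stderr.strip()
        except Exception as exc:
            return f"unreachable ({exc})"

    print("Content-Type: text/html")
    print()  # blank line ends the CGI headers
    print("<html><body><h1>Cluster status</h1><ul>")
    for node in NODES:
        print(f"<li>{html.escape(node)}: {html.escape(node_status(node))}</li>")
    print("</ul></body></html>")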
225

Parallel computing for image processing problems.

January 1997 (has links)
by Kin-wai Mak. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 52-54).
Contents:
Chapter 1  Introduction to Parallel Computing (p.7)
  1.1  Parallel Computer Models (p.8)
  1.2  Forms of Parallelism (p.12)
  1.3  Performance Evaluation (p.15)
    1.3.1  Finding Machine Parameters (p.15)
    1.3.2  Amdahl's Law (p.19)
    1.3.3  Gustafson's Law (p.20)
    1.3.4  Scalability Analysis (p.20)
Chapter 2  Introduction to Image Processing (p.26)
  2.1  Image Restoration Problem (p.26)
    2.1.1  Toeplitz Least Squares Problems (p.29)
    2.1.2  The Need For Regularization (p.31)
    2.1.3  Guide Star Image (p.32)
Chapter 3  Toeplitz Solvers (p.34)
  3.1  Introduction (p.34)
  3.2  Parallel Implementation (p.38)
    3.2.1  Overview of MasPar (p.38)
    3.2.2  Design Methodology (p.39)
    3.2.3  Implementation Details (p.42)
    3.2.4  Application to Ground Based Astronomy (p.44)
    3.2.5  Performance Analysis (p.46)
    3.2.6  The Graphical Interface (p.48)
Bibliography
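For reference, the two speedup laws named in the performance-evaluation sections take their standard forms, stated here from the general literature rather than quoted from the thesis (p is the parallelizable fraction of the work, N the number of processors):

    \[
      S_{\mathrm{Amdahl}}(N) = \frac{1}{(1-p) + p/N},
      \qquad
      S_{\mathrm{Gustafson}}(N) = (1-p) + pN .
    \]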
226

Parallel routing algorithms in Benes-Clos networks.

January 1996 (has links)
by Soung-Yue Liew. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 55-57).
Contents:
Chapter 1  Introduction (p.1)
Chapter 2  The Basic Principles of Routing Algorithms (p.10)
  2.1  The principles of sequential algorithms (p.11)
    2.1.1  Edge-coloring of bipartite graph with maximum degree two (p.11)
    2.1.2  Edge-coloring of bipartite graph with maximum degree M (p.14)
  2.2  Looping algorithm (p.17)
    2.2.1  Paull's Matrix (p.17)
    2.2.2  Chain to be rearranged in Paull's Matrix (p.18)
  2.3  The principles of parallel algorithms (p.19)
    2.3.1  Edge-coloring of bipartite graph with maximum degree two (p.20)
    2.3.2  Edge-coloring of bipartite graph with maximum degree 2m (p.22)
Chapter 3  Parallel routing algorithm in Benes-Clos networks (p.25)
  3.1  Routing properties of Benes networks (p.25)
    3.1.1  Three-stage structure and routing constraints (p.26)
    3.1.2  Algebraic interpretation of connection set up problem (p.29)
    3.1.3  Equivalent classes (p.31)
  3.2  Parallel routing algorithm (p.32)
    3.2.1  Basic principles (p.32)
    3.2.2  Initialization (p.34)
    3.2.3  Algorithm (p.36)
    3.2.4  Set up the states and determine π for next stage (p.37)
    3.2.5  Simulation results (p.40)
    3.2.6  Time complexity (p.41)
  3.3  Contention resolution (p.41)
  3.4  Algorithms applied to Clos network with 2m central switches (p.43)
  3.5  Parallel algorithms in rearrangeability (p.47)
Chapter 4  Conclusions (p.52)
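The edge-coloring step listed in Chapter 2 rests on a standard fact: a bipartite multigraph with maximum degree two decomposes into simple paths and even cycles, so alternating two colors along each component gives a proper edge coloring. A minimal generic sketch of that idea (not the thesis's own routing code):

    # Generic sketch: 2-edge-coloring of a bipartite multigraph whose maximum degree
    # is two. Walk each path (starting from a degree-1 endpoint) or cycle, alternating
    # the two colors, so that no two edges of the same color share a vertex.
    from collections import defaultdict

    def two_edge_color(edges):
        """edges: list of (u, v) pairs. Returns a list of colors (0 or 1), one per edge."""
        incident = defaultdict(list)          # vertex -> indices of incident edges
        for i, (u, v) in enumerate(edges):
            incident[u].append(i)
            incident[v].append(i)
        color = [None] * len(edges)

        def walk(i, u, c):
            # Color edge i with c, move to its other endpoint, alternate, repeat.
            while i is not None and color[i] is None:
                color[i] = c
                a, b = edges[i]
                v = b if a == u else a        # endpoint we move to next
                nxt = [j for j in incident[v] if j != i and color[j] is None]
                i, u, c = (nxt[0], v, 1 - c) if nxt else (None, None, None)

        # Paths: start from every degree-1 vertex so each path is colored end to end.
        for vtx, inc in incident.items():
            if len(inc) == 1 and color[inc[0]] is None:
                walk(inc[0], vtx, 0)
        # Cycles: whatever remains lies on even cycles, so any starting edge works.
        for i in range(len(edges)):
            if color[i] is None:
                walk(i, edges[i][0], 0)
        return color

    if __name__ == "__main__":
        example = [("a", "x"), ("x", "b"), ("b", "y"), ("y", "a")]   # a 4-cycle
        print(two_edge_color(example))   # e.g. [0, 1, 0, 1]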
227

Parallelism and distribution for very large scale content-based image retrieval / Parallélisme et distribution pour des bases d'images à très grande échelle

Gudmunsson, Gylfi Thor 12 September 2013 (has links)
The scale of multimedia collections has grown very fast over the last few years. Facebook stores more than 100 billion images, and 200 million are added every day. In order to cope with this growth, methods for content-based image retrieval must adapt gracefully. The work presented in this thesis goes in this direction. Two observations drove the design of the high-dimensional indexing technique presented here. Firstly, the collections are so huge, typically several terabytes, that they must be kept on secondary storage; addressing disk-related issues is thus central to our work. Secondly, all CPUs are now multi-core and clusters of machines are commonplace; parallelism and distribution are both key for fast indexing and high-throughput batch-oriented searching. We describe in this manuscript a high-dimensional indexing technique called eCP. Its design takes into account the constraints associated with using disks, parallelism and distribution. At its core is a non-iterative, unstructured vector quantization scheme. eCP builds on an existing indexing scheme that is oriented toward main memory. Our first contribution is a set of extensions for processing very large data collections, reducing indexing costs and making the best use of disks. The second contribution proposes multi-threaded algorithms for both building and searching, harnessing the power of multi-core processors. The datasets used for evaluation contain about 25 million images, or over 8 billion SIFT descriptors. The third contribution addresses distributed computing. We adapt eCP to the MapReduce programming model and use the Hadoop framework and HDFS for our experiments. This time we evaluate eCP's ability to scale up with a collection of 100 million images, more than 30 billion SIFT descriptors, and its ability to scale out by running experiments on more than 100 machines.
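As a rough illustration of the quantization step at the heart of such an index, here is a minimal sketch of generic nearest-leader assignment; it shows the general idea of unstructured vector quantization, not the thesis's actual eCP implementation, and the random descriptors and sampled leaders are stand-ins for a real SIFT collection.

    # Minimal generic sketch of quantization-based index building: each descriptor is
    # assigned to its nearest "leader" (cluster representative); a query is answered by
    # scanning only the cluster of the leader closest to it. Illustration only, not eCP.
    import numpy as np

    rng = np.random.default_rng(0)
    descriptors = rng.random((10_000, 128)).astype(np.float32)   # stand-in for SIFT vectors
    leaders = descriptors[rng.choice(len(descriptors), 64, replace=False)]  # sampled leaders

    def nearest_leader(points: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from each point to each leader, then argmin.
        d2 = ((points ** 2).sum(axis=1)[:, None]
              - 2.0 * points @ leaders.T
              + (leaders ** 2).sum(axis=1)[None, :])
        return d2.argmin(axis=1)

    # Build: bucket descriptor ids by their nearest leader.
    assignment = nearest_leader(descriptors)
    index = {c: np.flatnonzero(assignment == c) for c in range(len(leaders))}

    # Search: quantize the query the same way and scan only that bucket.
    query = rng.random((1, 128)).astype(np.float32)
    bucket = index[int(nearest_leader(query)[0])]
    dists = ((descriptors[bucket] - query) ** 2).sum(axis=1)
    print("approximate nearest neighbour id:", int(bucket[dists.argmin()]))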
228

Reactive scheduling of DAG applications on heterogeneous and dynamic distributed computing systems

Hernandez, Jesus Israel January 2008 (has links)
Emerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to run a particular parallel application. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks. Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issue is that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic approach, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics. We observed from experiments with GTP that, in an execution with relatively frequent migration, the results of some task may over time have been copied to several other sites, so that a subsequently migrated task may have several possible sources for each of its inputs. Some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management (CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies in subsequent migrations of placed tasks, in order to reduce the impact of migration cost on makespan. Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments, as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s). We evaluate our mechanisms through simulation, since this allows us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature.
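To make the list-scheduling idea concrete, here is a minimal generic sketch of the usual rank-then-place pattern (priority by critical-path length, then earliest-finish-time placement). It illustrates the heuristic family, not GTP itself; the task graph, costs and processor speeds are made up.

    # Generic list-scheduling sketch for a DAG on heterogeneous processors: rank tasks
    # by critical-path length ("upward rank"), then place each task, in rank order, on
    # the processor that finishes it earliest. Illustrative only, not the GTP method.
    from functools import lru_cache

    tasks = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 5.0}          # task -> reference cost
    edges = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 1.5, ("C", "D"): 0.5}
    children = {t: [v for (u, v) in edges if u == t] for t in tasks}
    parents = {t: [u for (u, v) in edges if v == t] for t in tasks}
    speed = {"p0": 1.0, "p1": 0.5}                             # processor -> cost multiplier

    @lru_cache(maxsize=None)
    def upward_rank(t: str) -> float:
        # Length of the longest (cost + communication) path from t to an exit task.
        return tasks[t] + max((edges[(t, c)] + upward_rank(c) for c in children[t]),
                              default=0.0)

    proc_free = {p: 0.0 for p in speed}      # time at which each processor becomes free
    finish = {}                              # task -> (processor, finish time)

    for t in sorted(tasks, key=upward_rank, reverse=True):     # parents before children
        best = None
        for p in speed:
            # A task may start once its inputs have arrived; communication is free
            # when parent and child share a processor, charged otherwise.
            ready = max((finish[u][1] + (0.0 if finish[u][0] == p else edges[(u, t)])
                         for u in parents[t]), default=0.0)
            start = max(ready, proc_free[p])
            end = start + tasks[t] * speed[p]
            if best is None or end < best[1]:
                best = (p, end)
        finish[t] = best
        proc_free[best[0]] = best[1]

    for t, (p, end) in finish.items():
        print(f"{t}: on {p}, finishes at {end:.1f}")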
229

Contention-free Scheduling of Communication Induced by Array Operations on 2D Meshes

Eberhart, Andreas Bernhard Georg 10 May 1996 (has links)
Whole-array operations and array section operations are important features of many data-parallel languages. Efficient implementation of these operations on distributed-memory multicomputers is critical to the scalability and high performance of data-parallel programs. This thesis presents an approach for analyzing communication patterns induced by array operations and for using run-time information to schedule the message flow. The distributed, dynamic scheduling algorithms guarantee link-contention-free data transfer and utilize network resources almost optimally. They incur little overhead, which is important in order not to reduce the speedup gained by parallel execution. The algorithms can be used by compilers to generate efficient code for array operations. Implemented in a runtime library, they can derive a schedule depending on parameters passed by the parallel application. Simulation results demonstrate the algorithms' superiority to the asynchronous transfer mode that is commonly used for this type of communication.
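As a generic illustration of what link-contention-free scheduling means (not the thesis's algorithm), the sketch below routes each message with dimension-order XY routing on a 2D mesh, builds a conflict graph of messages that share a directed link, and greedily colors it so that each color class can be sent as one contention-free phase; the message pattern is made up.

    # Generic sketch: group mesh messages into link-contention-free phases.
    # Messages follow XY (dimension-order) routing; two messages conflict if they
    # traverse a common directed link; greedy coloring of the conflict graph yields
    # the phases. Conceptual illustration only, not the scheduling algorithm of the thesis.

    def xy_links(src, dst):
        """Directed links (hop pairs) used by XY routing from src to dst on a 2D mesh."""
        (r, c), (r2, c2) = src, dst
        links = []
        while c != c2:                       # route along the row (X dimension) first
            nc = c + (1 if c2 > c else -1)
            links.append(((r, c), (r, nc)))
            c = nc
        while r != r2:                       # then along the column (Y dimension)
            nr = r + (1 if r2 > r else -1)
            links.append(((r, c), (nr, c)))
            r = nr
        return links

    def phases(messages):
        """Greedy-color the message conflict graph; returns a phase number per message."""
        used = [set(xy_links(s, d)) for s, d in messages]
        phase = []
        for i in range(len(messages)):
            taken = {phase[j] for j in range(i) if used[i] & used[j]}
            p = 0
            while p in taken:
                p += 1
            phase.append(p)
        return phase

    if __name__ == "__main__":
        # Example: overlapping row transfers plus a column transfer on a 4x4 mesh.
        msgs = [((0, 0), (0, 3)), ((0, 1), (0, 3)), ((1, 0), (1, 2)), ((0, 0), (2, 0))]
        print(phases(msgs))   # e.g. [0, 1, 0, 0]: messages 0 and 1 share row-0 links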
230

Associative processing implemented with content-addressable memories

Kida, Luis Sergio 01 January 1991 (has links)
The associative processing model provides an alternative solution to the von Neumann bottleneck. The memory of an associative computer takes on some of the responsibility for processing, and only intermediate results are exchanged between memory and processor, which greatly reduces the amount of communication between them. Content-addressable memories (CAMs) are one implementation of memory for this computational model. Associative computers implemented with CAMs have reported performance improvements of three orders of magnitude, equivalent to running the same application on a conventional computer with clock frequencies on the order of GHz. Among the benefits of content-addressable memories to the computer system are: 1) it is simpler to parallelize algorithms and implement concurrency; 2) the synchronization cost for parallel processing is lower, which enables the use of small-grain parallelism; 3) they can improve performance in non-numeric applications that are known to perform poorly on conventional computers; 4) they provide a trade-off between integration density and clock frequency, not available with RAM, for achieving the same performance; 5) they match current and future technologies well because of that same trade-off between integration and clock frequency; 6) they attack the von Neumann bottleneck by reducing the required communication bandwidth between processor and memory. In this thesis, the role of CAMs in associative processing is analyzed, reaching the conclusion that to deliver these characteristics the CAM must be able to filter the data transferred to the processor, provide explicit support for parallelism and data structures, support non-numeric applications, and execute logical operations. The characteristics and architecture of a content-addressable memory integrated circuit are presented, along with an application with an estimated performance improvement of over three orders of magnitude.
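To illustrate what a content-addressable lookup does, the sketch below is a software simulation of the concept (not the circuit described in the thesis): a masked search key is compared against every stored word and the indices of the matching words come back, the role played by match lines in hardware. The stored values and masks are arbitrary examples.

    # Software sketch of a content-addressable memory (CAM) lookup: every stored word
    # is compared against a search key under a bit mask, and the indices of matching
    # words are returned. In a real CAM all comparisons happen in parallel in hardware;
    # this simulation only illustrates the behaviour.
    import numpy as np

    memory = np.array([0x1A2B, 0x1A2C, 0xFFFF, 0x0A2B], dtype=np.uint16)

    def cam_search(key: int, mask: int = 0xFFFF) -> np.ndarray:
        """Return indices of words equal to `key` on the bit positions selected by `mask`."""
        return np.flatnonzero((memory & mask) == (key & mask))

    if __name__ == "__main__":
        print(cam_search(0x1A2B))                 # exact match -> [0]
        print(cam_search(0x1A00, mask=0xFF00))    # ternary-style match on the high byte -> [0, 1]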
