  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Architecture and Compiler Support for Leakage Reduction Using Power Gating in Microprocessors

Roy, Soumyaroop 31 August 2010
Power gating is a technique commonly used for runtime leakage reduction in digital CMOS circuits. In microprocessors, power gating can be implemented by using sleep transistors to selectively deactivate circuit modules when they are idle during program execution. In this dissertation, a framework for power gating arithmetic functional units in embedded microprocessors with architecture and compiler support is proposed. During compile time, program regions are identified where one or more functional units are idle, and sleep instructions are inserted into the code so that those units can be put to sleep during program execution. Subsequently, when their need is detected during the instruction decode stage, they are woken up with the help of hardware control signals. For a set of benchmarks from the MiBench suite, leakage energy savings of 27% and 31% are achieved (based on a 70 nm PTM model) in the functional units of a processor, modeled on the ARM architecture, with and without floating point units, respectively. Further, the impact of traditional performance-enhancing compiler optimizations on the amount of leakage savings obtained with this framework is studied through analysis and simulations. Based on the observations, a leakage-aware compilation flow is derived that improves the effectiveness of this framework. It is observed that, through the use of various compiler optimizations, additional savings of around 15%, and even up to 9X leakage energy savings in individual functional units, are possible. Finally, in the context of multi-core processors supporting multithreading, three different microarchitectural techniques, for different multithreading schemes, are investigated for state-retentive power gating of register files. In an in-order core, when a thread gets blocked due to a memory stall, the corresponding register file can be placed in a low leakage state.
When the memory stall gets resolved, the register file is activated so that it may be accessed again. The overhead due to wake-up latency is completely hidden in two of the schemes, while it is hidden for the most part in the third. Experimental results on multiprogrammed workloads comprised of SPEC 2000 integer benchmarks show that, in an 8-core processor executing 64 threads, the average leakage savings in the register files, modeled in FreePDK 45 nm MTCMOS technology, are 42% in coarse-grained multithreading, while they are between 7% and 8% in fine-grained and simultaneous multithreading. The contributions of this dissertation represent a significant advancement in the quest for reducing leakage energy consumption in microprocessors with minimal degradation in performance.
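The compile-time step described in this abstract (find program regions where a functional unit is idle, then insert sleep instructions) can be sketched roughly as follows. This is only an illustration over a straight-line instruction list, not the dissertation's actual compiler pass: the `Instr` type, the `SLEEP` pseudo-instruction, and the `min_idle` break-even threshold are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: an opcode plus the gated unit it needs."""
    op: str
    unit: Optional[str] = None  # e.g. "mul" or "fpu"; None if no gated unit is used


def insert_sleeps(code: List[Instr], unit: str, min_idle: int) -> List[Instr]:
    """Insert a SLEEP for `unit` before each idle run of >= min_idle instructions.

    `min_idle` stands in for a break-even length below which gating would
    cost more energy than it saves (an assumed parameter, not a measured one).
    Wake-up is left to hardware at decode time, as the abstract describes.
    """
    out: List[Instr] = []
    i = 0
    while i < len(code):
        if code[i].unit == unit:
            out.append(code[i])
            i += 1
            continue
        # Measure the idle run that starts at position i.
        j = i
        while j < len(code) and code[j].unit != unit:
            j += 1
        if j - i >= min_idle:
            out.append(Instr(op=f"SLEEP {unit}"))
        out.extend(code[i:j])
        i = j
    return out


if __name__ == "__main__":
    program = [Instr("mul", "mul"), Instr("add"), Instr("add"),
               Instr("add"), Instr("mul", "mul"), Instr("add")]
    for ins in insert_sleeps(program, unit="mul", min_idle=3):
        print(ins.op)
```

A real pass would work on a control-flow graph rather than a flat list, so that idle regions spanning branches and loops are handled; the flat list is used here only to keep the idea visible.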
82

Implementation of a shared virtual memory system for clusters of multiprocessor systems / Software distributed shared memory for clusters of multiprocessors

Τουρναβίτης, Γεώργιος 16 May 2007
Computer clusters are a modern, widely used, and highly competitive architecture for building high-performance computing systems at low cost. At the same time, the wide commercial availability of small-scale multiprocessor systems allows them to be combined into hybrid clusters of multiprocessors. Despite the flexibility this offers in their design, the requirement to use distributed programming models significantly increases the complexity of application development. Shared virtual memory systems are an alternative approach. They give applications running on different nodes of the cluster access to a shared address space, hiding the underlying distributed architecture. The main limitation of most existing implementations is the lack of multithreading support. The direct consequence is low utilization of modern multiprocessor computing units, since neither the application nor the mechanisms that ensure the consistency of the shared memory execute in parallel. This master's thesis presents the design and implementation of a shared virtual memory platform using mechanisms implemented exclusively in software. The proposed system aims at more efficient use of the resources of the cluster's multiprocessor units by supporting multithreaded execution of the application on each node. Both the distributed memory consistency protocol and the communication subsystem were redesigned to use multiple threads of execution. In addition, alternative hierarchical synchronization algorithms are presented and evaluated that allow more efficient use of the hybrid organization of the clusters.
/ Software Distributed Shared Memory (SDSM) systems provide an abstraction layer of shared memory semantics on top of a distributed set of computational nodes. The use of small-scale Symmetric Multiprocessor (SMP) nodes has the potential for bridging the performance-cost gap between the low-end SMPs and high-end Distributed Shared Memory (DSM) systems, using a hybrid software and hardware coherency model presented in this thesis. We present the design and discuss the main architectural choices involved in our implementation of a multithreaded SDSM system. Our implementation was developed on top of Pthreads and the TCP/IP network protocol, employing a simple yet efficient design. Finally, we evaluate and analyze the performance of the multithreading SDSM platform, using a wide set of benchmark applications.
83

[en] A FACE RECOGNITION SYSTEM FOR VIDEO SEQUENCES BASED ON A MULTITHREAD IMPLEMENTATION OF TLD

CIZENANDO MORELLO BONFA 04 October 2018
[en] Face recognition in video is an application of great interest to the scientific community and the surveillance industry, driving the search for more robust and efficient techniques. Currently, in the field of facial recognition, frontal identification techniques have the best hit rate compared with non-frontal techniques. The main objective of this work is to find methods for evaluating video images in search of people (faces), assessing whether the image quality is within an acceptable range that allows a frontal facial recognition algorithm to identify the individuals. Ways are proposed to reduce the processing load so that the maximum number of individuals in an image can be assessed without affecting real-time performance. This is achieved through an analysis of most of the techniques used in recent years and of the state of the art, compiling all the information to be applied in a project that uses the strengths of each technique and offsets its shortcomings. The outcome is a multithreaded platform. Performance was evaluated through computational load tests using publicly available video from AVSS (Advanced Video and Signal based Surveillance). The results show that the architecture makes better use of computational resources, allowing a wider range of algorithms to be used in each segment of the architecture; these can be selected according to image quality and the environment in which the video is captured.
84

Parallel Simulation: Parallel computing for high performance LTE radio network simulations

Andersson, Håkan January 2010
Radio access technologies for cellular mobile networks are continuously being evolved to meet the future demands for higher data rates, and lower end‐to‐end delays. In the research and development of LTE, radio network simulations play an essential role. The evolution of parallel processing hardware makes it desirable to exploit the potential gains of parallelizing LTE radio network simulations using multithreading techniques in contrast to distributing experiments over processors as independent simulation job processes. There is a hypothesis that parallel speedup gain diminishes when running many parallel simulation jobs concurrently on the same machine due to the increased memory requirements. A proposed multithreaded prototype of the Ericsson LTE simulator has been constructed, encapsulating scheduling, execution and synchronization of asynchronous physical layer computations. In order to provide implementation transparency, an algorithm has been proposed to sort and synchronize log events enabling a sequential logging model on top of non‐deterministic execution. In order to evaluate and compare multithreading techniques to parallel simulation job distribution, a large number of experiments have been carried out for four very diverse simulation scenarios. The evaluation of the results from these experiments involved analysis of average measured execution times and comparison with ideal estimates derived from Amdahl’s law in order to analyze overhead. It has been shown that the proposed multithreaded task‐oriented framework provides a convenient way to execute LTE physical layer models asynchronously on multi‐core processors, still providing deterministic results that are equivalent to the results of a sequential simulator. However, it has been indicated that distributing parallel independent jobs over processors is currently more efficient than multithreading techniques, even though the achieved speedup is far from ideal. 
This conclusion is based on the observation that the overhead caused by increased memory requirements, memory access and system bus congestion is currently smaller than the thread management and synchronization overhead of the proposed multithreaded Java prototype.
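The comparison of measured execution times with ideal estimates derived from Amdahl's law can be illustrated with a short calculation. This is a sketch of the general technique only; the function names and the example numbers are assumptions made here and are not taken from the thesis.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def overhead_fraction(t_sequential: float, t_measured: float,
                      parallel_fraction: float, workers: int) -> float:
    """Share of the measured parallel time not explained by Amdahl's law.

    The ideal parallel time is t_sequential / speedup; whatever the
    measurement adds on top of it is attributed to overhead (thread
    management, synchronization, memory contention, and so on).
    """
    ideal_time = t_sequential / amdahl_speedup(parallel_fraction, workers)
    return (t_measured - ideal_time) / t_measured


if __name__ == "__main__":
    # Made-up numbers: a 100 s sequential run, 90% parallelizable, 4 threads.
    print(f"ideal speedup: {amdahl_speedup(0.9, 4):.3f}")
    print(f"overhead share: {overhead_fraction(100.0, 40.0, 0.9, 4):.3f}")
```

The same overhead calculation applies to both strategies the abstract compares, multithreading within one simulation and independent simulation jobs, as long as each is measured against its own sequential baseline.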
85

Dynamic Task Prediction for an SpMT Architecture Based on Control Independence

Jothi, Komal 01 January 2009
Extracting better performance from computer programs translates to finding more instructions to execute in parallel. Since most general-purpose programs are written in an imperative, sequential manner, instructions that lie close together tend to be data dependent, forcing the designer to look far ahead in the program for parallelism. This necessitates wider superscalar processors with larger instruction windows. But superscalars suffer from three key limitations: their inability to scale, the sequential fetch bottleneck, and a high branch misprediction penalty. Recent studies indicate that current superscalars have reached the end of the road and that designers will have to look for newer ideas to build computer processors. Speculative Multithreading (SpMT) is one of the most recent techniques for exploiting parallelism in applications. Most SpMT architectures partition a sequential program into multiple threads (or tasks) that can be executed concurrently on multiple processing units. It is desirable that these tasks be sufficiently distant from each other to facilitate parallelism. It is also desirable that these tasks be control independent of each other, so that execution of a future task is guaranteed in case of local control flow misspeculations. Some task prediction mechanisms rely on the compiler, requiring recompilation of programs. Current dynamic mechanisms either rely on program constructs such as loop iterations and function and loop boundaries, resulting in unbalanced loads, or predict tasks that are too short to be of use in an SpMT architecture. This thesis is the first proposal of a predictor that dynamically predicts control independent tasks that are consistently wide apart, and executes them on a novel SpMT architecture.
86

Parallel Solution of the Subset-sum Problem: An Empirical Study

Bokhari, Saniyah S. 21 July 2011
No description available.
87

Out-of-Order Retirement of Instructions in Superscalar, Multithreaded, and Multicore Processors

Ubal Tena, Rafael 01 September 2010
Current superscalar processors use a reorder buffer (ROB) to keep track of in-flight instructions. The ROB is implemented as a first-in, first-out (FIFO) queue into which instructions are inserted in program order after being decoded, and from which they are also removed in program order at the commit stage. This structure provides simple support for speculation, precise exceptions, and register reclamation. However, retiring instructions in order can degrade performance if a long-latency operation is blocking the head of the ROB. Several proposals have been published to address this problem. Most of them retire instructions out of order speculatively, which requires storing checkpoints to restore a valid processor state after a misspeculation. Checkpoints usually need to be implemented with costly hardware structures and also require enlarging other processor structures, which in turn can affect the clock cycle time. This problem affects many kinds of current processors, regardless of the number of hardware threads and the number of cores they include. This thesis studies non-speculative out-of-order retirement of instructions in superscalar, multithreaded, and multicore processors. / Ubal Tena, R. (2010). Out-of-Order Retirement of Instructions in Superscalar, Multithreaded, and Multicore Processors [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8535
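The head-of-line blocking that motivates this thesis can be seen in a toy model of in-order retirement from a FIFO reorder buffer. The sketch below models only the baseline behavior the abstract describes, not the thesis's out-of-order retirement scheme; the `Entry` type, the `width` parameter, and the assumption that every instruction is already in the ROB at cycle 0 are simplifications introduced here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    """Hypothetical ROB entry: a label and the cycle its execution completes."""
    name: str
    done_cycle: int


def retire_in_order(program: List[Entry], width: int = 2) -> Dict[str, int]:
    """Return the cycle at which each instruction retires from a FIFO ROB.

    Only the head may retire, and only once it has completed, so a slow
    head delays every younger instruction behind it, even finished ones.
    Assumes all entries are already in the ROB at cycle 0.
    """
    rob = deque(program)  # program order: oldest instruction at the left
    retired: Dict[str, int] = {}
    cycle = 0
    while rob:
        cycle += 1
        for _ in range(width):  # at most `width` retirements per cycle
            if rob and rob[0].done_cycle <= cycle:
                retired[rob[0].name] = cycle
                rob.popleft()
            else:
                break
    return retired


if __name__ == "__main__":
    # A long-latency load at the head blocks two short ALU operations.
    trace = [Entry("load", 10), Entry("add", 2), Entry("sub", 3)]
    print(retire_in_order(trace))
```

In this model the `add` finishes at cycle 2 but cannot leave the buffer until the `load` does, which is the performance loss that out-of-order retirement aims to remove.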
88

Design and performance evaluation of a massively parallel, service-oriented application bus

Benosman, Mohammed Ridha 12 December 2013
The Enterprise Service Bus (ESB) is currently the most promising approach for implementing a service-oriented architecture (SOA) by integrating separate, isolated applications into a centralized platform. Many ESB-based integration solutions have been proposed; they are either open source, such as Mule, Petals, or Fuse, or proprietary, such as Sonic ESB, IBM WebSphere Message Broker, or Oracle ESB. However, to the best of our knowledge, none of them can handle both integration and massively parallel processing. Integrating parallelism into processing is a way to take advantage of multicore/multiprocessor technologies, which considerably improve the performance of ESBs. However, this integration is a complex undertaking and raises problems at several levels: communication, synchronization, data sharing, etc. In this thesis, we present the study of a new massively parallel ESB architecture.
89

Athapascan-0: exploiting multithreading on clusters of multiprocessors

Carissimi, Alexandre da Silva January 1999
The increasing efficiency of interconnection networks and the widespread availability of multiprocessor machines make it possible to build low-cost distributed-memory parallel machines: clusters of multiprocessors. These require exploiting both the fine-grained parallelism inside a multiprocessor, offered by multithreading, and the coarse-grained parallelism among the different multiprocessors. Exploiting these two kinds of parallelism simultaneously requires a method of communication between threads that do not share the same address space. This thesis addresses the problem of integrating multithreading and communication on clusters of symmetric multiprocessors (SMP). More precisely, it deals with the evaluation and tuning of the ATHAPASCAN-0 runtime kernel on this type of architecture. ATHAPASCAN-0 is a portable runtime kernel, developed within the APACHE project (CNRS-INPG-INRIA-UJF), that combines multithreading and message-passing communication. Portability is ensured by a layered organization based on the widely used POSIX threads and MPI standards. ATHAPASCAN-0 extends the model of a static network of communicating heavyweight processes, as in MPI, PVM, etc., to that of a dynamic network of communicating threads. The basic technique is the multithreading of communication and computation. Communication progress requires polling the state of the network and chaining transfer operations. Efficiency depends on minimizing these operations. In addition, the use of multiprocessors adds specific problems due to the appearance of real parallelism between computation and communication. These problems are presented and solutions are proposed for the ATHAPASCAN-0 environment. These solutions are evaluated on clusters of multiprocessors.
/ The continuous price reduction of commodity PC multiprocessors and the availability of fast network interfaces have made clusters of multiprocessors an attractive low-price alternative for building parallel systems. Multiprocessor clusters offer two levels of parallelism: fine-grained parallelism inside a single multiprocessor and coarse-grained parallelism among them. A mechanism must be provided to exploit both levels of parallelism simultaneously. This requires communication between threads belonging to different address spaces. This dissertation addresses the problem of integrating threads and communication in the ATHAPASCAN-0 runtime system. ATHAPASCAN-0 is a portable runtime for clusters of multiprocessors developed as part of the APACHE project (CNRS-INPG-INRIA-UJF). Portability is achieved by a layered organization based on standards such as POSIX threads and MPI. The ATHAPASCAN-0 runtime system extends the heavyweight process communication model of message-passing libraries such as MPI, PVM, etc., into a lighter dynamic network of communicating threads. Multiprogramming is the key concept used. Communication progress relies on polling the network to handle incoming messages and to deliver outgoing communication requests. Performance depends strongly on the way these operations are implemented. Additionally, multiprocessors introduce some programming problems, such as the overhead of cache coherency mechanisms, the management of concurrent accesses, and efficient mutex locking to avoid unnecessary context switching. These problems are analyzed and solutions are implemented in the ATHAPASCAN-0 runtime system. These solutions are evaluated on a cluster of multiprocessors.
