Global ETD Search

131	Microarchitecture and FPGA Implementation of the Multi-level Computing Architecture Capalija, Davor 30 July 2008 We design the microarchitecture of the Multi-Level Computing Architecture (MLCA), focusing on its Control Processor (CP). Designing the CP microarchitecture presents both opportunities and challenges, which stem from the coarse granularity of the tasks and the large number of inputs and outputs of each task instruction. We therefore explore changes to standard superscalar microarchitectural techniques. We design the entire CP microarchitecture and implement it on an FPGA using SystemVerilog. We synthesize and evaluate the MLCA system based on a 4-processor shared-memory multiprocessor. Realistic applications show scalable speedups comparable to those obtained in simulation. We believe that our implementation achieves low complexity in terms of FPGA resource usage and operating frequency. In addition, we argue that our design methodology allows the CP to scale as the entire system grows. Computer architecture FPGA applications Microarchitecture Parallelism Embedded systems Multi-core systems 0984
133	Μελέτη συστήματος παροχής ηλεκτρικών τάσεων ελεγχομένων μέσω PLC σε εργαστηριακό χώρο : λειτουργία ζεύγους μηχανών στα τέσσερα τεταρτημόρια / Study of PLC controlled electrical voltage supply system used in laboratory : operation of electrical machine pair in four quadrants Λουκάκος, Παναγιώτης 19 August 2010 The objective of this diploma thesis was to build a fully automated system for producing and supplying electrical voltages, using a Programmable Logic Controller (PLC) and a Supervisory Control and Data Acquisition (SCADA) system. The system is located in the Electromechanical Energy Conversion Laboratory of the Department of Electrical and Computer Engineering. It was originally designed and implemented in 1987 by ASEA BROWN BOVERI (ABB). Over time, basic operational problems, combined with a lack of technical support, led to faulty operation and gradually to the collapse of the system. Some supplies (E1 and E3) were later operated through a conventional push-button automation console. Professor Dr.-Ing. Athanasios N. Safacas therefore took the initiative to bring the automated system back into operation using latest-generation PLCs, and assigned seven diploma theses to students of the Department of Electrical and Computer Engineering of the University of Patras for this purpose. The system was studied extensively, all of its components were recorded, and it was gradually brought back into operation in three phases. In total, it produces 10 different types of voltage at 12 laboratory positions. In the third phase, to which this thesis belongs, the following supplies were implemented: E6 = 0V…230V DC; E7 = 0V…230V DC; E8 = 0V…500V DC; E9 = 149V…450V DC; E10 = 150V…500V AC, 15…100Hz. These are variable voltages produced by pairs of electrical machines. The machines are located in the basement of the Electromechanical Energy Conversion Laboratory and are operated from a user-friendly computer environment (the SCADA system). The operator needs no special knowledge to run the system, because the operator never interacts directly with the PLC, which is responsible for the correct operation of the system, but only with the SCADA system, which relies on the PLC program. The system provides safeguards that automatically handle any fault that occurs (hardware or operator error), issues warning messages, and keeps a fault history. Two parts of the work are of particular interest. For the E9 supply, a synchronous generator is paralleled with the grid automatically by the PLC. In addition, the E10 supply was used to build a soft starter for controlling an asynchronous machine. All supplies can be delivered to the laboratory benches for both teaching and research purposes. The thesis describes in detail the supplies, the connections that were made, the PLC programming, and the SCADA system used by the operator. Automation 621.381 5 PLC SCADA Parallelism of synchronous generator
134	Programmer le parallélisme avec des futures en Heptagon un langage synchrone flot de données et étude des réseaux de Kahn en vue d’une compilation synchrone / Programming parallelism with futures in Heptagon a synchronous functional language, and, study of Kahn networks aiming synchronous compilation Gérard, Léonard 25 September 2013 Synchronous languages were created to model and implement critical real-time reactive systems. With the ever-growing complexity of controlled systems, execution speed is becoming an important criterion, so we look for a parallel execution that combines efficiency and safety. Synchronous languages have always included a notion of parallelism, but for the expressiveness of modeling; their compilers mainly target circuits or sequential code generation. All of them have a formal semantics, which makes correct distribution of the code possible, but preserving this semantics can be costly and hinder the efficiency of the generated code, particularly when a notion of global instant must be preserved. The semantic model of interest here is that of Kahn networks. These networks model distributed computing nodes that communicate through unbounded FIFO queues. In this setting, distribution requires no additional communication or synchronization. By considering the history of the communication queues, Kahn semantics abstracts away from the actual execution while guaranteeing that the computation is deterministic; for this, each node of the network must have a continuous functional semantics. The language we develop is Heptagon, a first-order functional synchronous language descended from Lustre. Its compiler is an academic prototype related to the industrial tool Scade. Thanks to its Kahn semantics, distributing a Heptagon program raises no difficulty; doing so efficiently is much harder. Efficiency requires minimizing synchronizations, which has two interdependent aspects: sufficient decoupling of computations (delays in the dependencies between computations) and coarse computation granularity (a high ratio of computation time to communication frequency). The synchronous semantics and the clocks of a Heptagon program reflect exactly the opposite: they let the programmer settle for a decoupling of a single instant, at most one value is computed per instant, and instants are typically short to ensure that the system reacts quickly. We pursue two research directions. The first is to give the programmer control over parallelism directly in the source code, including the instants at which communication or synchronization occurs. To this end, we add futures to Heptagon. They provide control over the start and end of parallel computations while remaining annotations that can be removed without changing the denotational semantics of the program. We also provide a compilation scheme for embedded systems that uses statically allocated memory. The second direction is to study the synchronous semantics of Kahn networks in order to understand data dependencies and maximize computation granularity. This is a deep question that mixes data dependencies (causality), the choice of clocks (clock calculus), and modular compilation. Heptagon, like its predecessors, restricts the Kahn networks that can be written so that these three questions can be treated separately. Our main result is the definition of the subclass of reactive ordered Kahn networks. These are the only networks whose behavior can be described modularly with clocks without restricting their calling contexts. We show that they have a clock signature in normal form, which maximizes granularity. To express it, we introduce integer clocks, which describe the communication of several values in a single instant. Finally, we apply these results to shed new light on Heptagon, Signal, and the object policies of Lucid Synchrone, and to propose a fully modular analysis of Lucy-n, the synchronous language closest to Kahn networks. Synchronous languages Compilation Futures Kahn Parallelism Stability Sequentiality
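Illustration for record 134: the abstract describes futures as annotations that control when a parallel computation starts and when its result is needed, without changing what the program computes. The following is a minimal Python sketch of that idea only; it is not Heptagon syntax, and the names `slow_square`, `run_sequential`, and `run_with_futures` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for a coarse-grained computation (hypothetical)."""
    return x * x


def run_sequential(inputs):
    # Reference behavior: each step adds the value computed one instant earlier.
    out, prev = [], 0
    for x in inputs:
        out.append(prev + x)
        prev = slow_square(x)
    return out


def run_with_futures(inputs):
    # Same dataflow, but the delayed computation is started as a future and
    # only forced (result()) at the next instant, when its value is needed.
    out = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = None
        for x in inputs:
            prev = pending.result() if pending is not None else 0
            out.append(prev + x)
            pending = pool.submit(slow_square, x)
    return out
```

Removing the future (calling `slow_square` directly) gives back `run_sequential`, so both functions return the same list for the same inputs; only the scheduling differs.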
135	Squelettes algorithmiques méta-programmés : implantations, performances et sémantique / Metaprogrammed algorithmic skeletons : implementations, performances and semantics Javed, Noman 21 October 2011 Structured parallelism approaches are a trade-off between automatic parallelisation and concurrent and distributed programming as offered by Pthreads or MPI. Skeletal parallelism is one of these approaches. An algorithmic skeleton can be seen as a higher-order function that captures a classic parallel algorithm pattern, such as a pipeline or a parallel reduction. The sequential semantics of a skeleton is often quite simple and corresponds to the usual semantics of similar higher-order functions in functional programming languages. The user builds a parallel program by combining calls to the available skeletons. When designing a parallel program, parallel performance is of course important, so it is valuable for the programmer to be able to rely on a simple yet realistic performance model. Bulk Synchronous Parallelism (BSP) offers such a model. As parallelism can now be found everywhere, from smartphones to supercomputers, it becomes critical for parallel programming models to rest on formal semantics that support proofs of correctness of the programs developed with them. The outcome of this work is the Orléans Skeleton Library, or OSL. OSL provides a set of data-parallel skeletons that follow the BSP model of parallel computation. It is a C++ library, currently implemented on top of MPI, that uses advanced C++ techniques to achieve good efficiency. Because OSL is based on the BSP performance model, it is possible not only to predict the performance of OSL programs but also to provide performance portability. The programming model of OSL is formalized, using a big-step semantics, in the Coq proof assistant. Based on this formal model, the correctness of an example OSL program is proved. Algorithmic skeletons Bulk synchronous parallelism
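Illustration for record 135: the abstract describes skeletons as higher-order functions and BSP as a simple performance model. The sketch below shows both ideas in plain sequential Python under stated assumptions: it is not the OSL API (OSL is a C++ library), the names `distribute`, `map_skeleton`, `reduce_skeleton`, and `bsp_superstep_cost` are hypothetical, and the cost function is the textbook BSP superstep formula w + h·g + l, not a formula taken from the thesis.

```python
from functools import reduce


def distribute(data, p):
    """Split data into p nearly equal blocks, one per simulated processor."""
    q, r = divmod(len(data), p)
    blocks, start = [], 0
    for i in range(p):
        size = q + (1 if i < r else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def map_skeleton(f, blocks):
    # Each block would be handled by its own processor; no communication needed.
    return [[f(x) for x in block] for block in blocks]


def reduce_skeleton(op, blocks, identity):
    # Local reduction per block, then combination of the partial results
    # (the step that would require communication on a real machine).
    # op is assumed associative, with identity as its neutral element.
    partials = [reduce(op, block, identity) for block in blocks]
    return reduce(op, partials, identity)


def bsp_superstep_cost(w, h, g, l):
    """Textbook BSP cost of one superstep: max local work w, at most h words
    sent or received per processor, per-word gap g, barrier latency l."""
    return w + h * g + l
```

A program is then built by composing skeleton calls, for example `reduce_skeleton(lambda a, b: a + b, map_skeleton(lambda x: x * x, distribute(data, 3)), 0)` for a sum of squares.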
136	Extension paramétrée de compilateur certifié pour la programmation parallèle / Parameterised extension of certified compiler for parallel programming Dailler, Sylvain 17 December 2015 Computer applications are increasingly present in our lives. In critical applications (medicine, transport, ...), the consequences of a software error carry an unacceptable cost, whether human or financial. One method for avoiding errors in programs is deductive verification. It applies to programs written in high-level languages, which compilers transform into machine-language programs. These compilers must be correct so that they do not propagate errors to the machine code. Since 2005, multicore processors have spread to all kinds of computer systems, and these architectures need adapted compilers and proofs of correctness. Our contribution is the modular extension of a verified compiler for a parallel language targeting parallel multicore architectures. The specifications of the languages (and their operational semantics) at the various levels of the compiler, as well as the proofs of correctness of the compiler, are parameterized by modules specifying elements of parallelism such as a relaxed memory model and notions of synchronization and scheduling between threads. This work is a first step towards the design of a certified compiler for high-level parallel languages such as algorithmic skeleton languages. Compilation Verification Parallelism Modularity Proof assistants 005.453
137	Providing adaptability to MPI applications on current parallel architectures / Provendo adaptabilidade em aplicações MPI nas arquiteturas paralelas atuais Cera, Marcia Cristina January 2012 Adaptability is currently a desired feature in parallel applications. For instance, the increasing number of users competing for the resources of parallel architectures causes dynamic changes in the set of available processors. Adaptive applications are able to execute on a volatile set of processors, providing better resource utilization. This adaptive behavior is known as malleability. Another example comes from the constant evolution of multi-core architectures, which increase the number of cores with each new generation of chips. Adaptability is the key to making parallel programs portable from one multi-core machine to another: parallel programs can adapt the unfolding of their parallelism to the specific degree of parallelism of the target architecture. This adaptive behavior can be seen as a particular case of evolutivity. This thesis therefore focuses on (i) malleability, to adapt the execution of parallel applications to changes in processor availability, and (ii) evolutivity, to adapt the unfolding of the parallelism at runtime to the properties of the architecture and of the input data. The open question is "How to provide and support adaptive applications?". This thesis aims to answer it on the basis of MPI (Message-Passing Interface), the standard parallel API for HPC in distributed-memory environments. Our work relies on the MPI-2 features that allow processes to be spawned at runtime, which adds some flexibility to MPI applications. Malleable MPI applications use dynamic process creation to expand in growth actions (to use additional processors). Shrinkage actions (to release processors) terminate the MPI processes running on the requested processors in such a way that the application's data are preserved. Malleable applications require support from the runtime environment, since they must be notified about processor availability. Evolving MPI applications follow the explicit task parallelism paradigm to allow adaptation at runtime; dynamic process creation is used to unfold the parallelism, i.e., to create new MPI tasks on demand. To provide these applications, we defined abstract MPI tasks, implemented the synchronization among them through message exchanges, and proposed an approach to adjust the granularity of MPI tasks, aiming at efficiency in distributed-memory environments. Experimental results validated our hypothesis that adaptive applications can be provided using the MPI-2 features. Additionally, this thesis identifies the runtime-environment requirements for supporting these applications on clusters. Malleable MPI applications improved cluster resource utilization, and explicit-task applications adapted the unfolding of the parallelism to the target architecture, showing that this programming paradigm can also be efficient in distributed-memory contexts. MPI Parallel processing High-performance processing Adaptability Malleability Explicit task parallelism
138	Um mecanismo de busca especulativa de múltiplos fluxos de instruções / A multistreamed speculative instruction fetch mechanism Santos, Rafael Ramos dos January 1997 This work presents a new model for speculatively fetching instructions along multiple streams in superscalar architectures. A performance evaluation of a superscalar architecture with this feature is also presented, in order to validate the model and to compare its performance with that of a real superscalar architecture. The proposed technique intends to eliminate the instruction fetch latency introduced by branch instructions in superscalar pipelines. The performance delivered by a superscalar architecture that incorporates dynamic instruction scheduling, branch prediction and speculative execution falls well short of the expected performance, which should be at least proportional to the number of functional units. Related work has shown that this is due to constant stream breaks, caused by branch instructions disrupting the sequential flow of control, and to the resulting draining of the instruction queue. The technique allows instructions from different logical streams to be fetched as soon as a branch instruction is detected during fetch. To schedule efficiently, the scheduler needs an instruction window that holds as many instructions as possible. The proposed scheme makes more instructions available to the dynamic scheduling mechanism and reduces the number of cycles with no dispatch caused by stream breaks. Some considerations about the implementation of the model are presented at the end, together with suggestions for future work. Computer architecture Superscalar architectures Pipelining Instruction-level parallelism
139	Uma interface de programação distribuída para aplicações em otimização combinatória / A Programming interface for distributed applications in combinatorial optimization Dantas, Allberson Bruno de Oliveira January 2011 (has links) DANTAS, Allberson Bruno de Oliveira. Uma interface de programação distribuída para aplicações em otimização combinatória. 2011. 86 f. Dissertação (Mestrado em ciência da computação)- Universidade Federal do Ceará, Fortaleza-CE, 2011. / Submitted by Elineudson Ribeiro (elineudsonr@gmail.com) on 2016-07-08T17:57:51Z No. of bitstreams: 1 2011_dis_abodantas.pdf: 805347 bytes, checksum: c9671608a7d738f843239856e546e201 (MD5) / Approved for entry into archive by Rocilda Sales (rocilda@ufc.br) on 2016-07-13T12:23:03Z (GMT) No. of bitstreams: 1 2011_dis_abodantas.pdf: 805347 bytes, checksum: c9671608a7d738f843239856e546e201 (MD5) / Made available in DSpace on 2016-07-13T12:23:03Z (GMT). No. of bitstreams: 1 2011_dis_abodantas.pdf: 805347 bytes, checksum: c9671608a7d738f843239856e546e201 (MD5) Previous issue date: 2011 / This work was motivated by the need of exploiting the potential of distributed paralelism in combinatorial optimization applications. propose a distributed programming interface, To achieve this goal, we in which we cherish two main requirements: e ciency and reuse. The rst stems from the need of HPC (High applications require maximum possible performance. Performance Computing) Therefore, we specify our interface as an extension of the MPI library, which is assumed to be e cient for distributed applications. The reuse requirement must make compatible two important features: asynchronism and collective operations. Asynchronism must be present at our interface, once most of combinatorial optimization applications have an asynchronous nature. Collective operations are features that should be available in the interface, so that they can be used by applications in their execution. 
To meet the reuse requirement, we based this interface on the Event- and Pulse-driven Models of Distributed Computing, since they are asynchronous and allow the incorporation of collective operations. We partially implemented the interface defined in this work. To validate its use by combinatorial optimization applications, we selected two applications and implemented them using our interface: the Branch-and-Bound technique and the Maximum Stable Set Problem (MSSP). We also provide some experimental results. Ciência da computação Paralelismo Algoritmos Distribuídos Otimização Combinatória Parallelism Distributed Algorithms Combinatorial Optimization
140	A transparent and energy aware reconfigurable multiprocessor platform for efficient ILP and TLP exploitation Rutzig, Mateus Beck January 2012 (has links) As the number of embedded applications increases, the current strategy of several companies is to launch new platforms at short intervals so as to execute the application set more efficiently and with low energy consumption. However, each new platform deployment requires new tool chains, with additional libraries, debuggers and compilers. This strategy implies high hardware redesign costs, breaks binary compatibility and results in a high overhead in the software development process. Therefore, focusing on area savings, low energy consumption, binary compatibility and, above all, improved software productivity, we propose Custom Reconfigurable Arrays for Multiprocessor System (CReAMS). CReAMS is composed of multiple adaptive reconfigurable systems that efficiently exploit Instruction and Thread Level Parallelism (ILP and TLP) at the hardware level in a totally transparent fashion. Conceived as a homogeneous organization, CReAMS shows a 37% reduction in energy-delay product (EDP) compared to an ordinary multiprocessing platform with the same chip area. When processors with different capabilities for exploiting ILP are coupled on a single die, making CReAMS a heterogeneous organization, performance improvements of up to 57% and energy savings of up to 36% are shown in comparison with the homogeneous platform. In addition, the efficiency of the adaptability provided by CReAMS is demonstrated in a comparison with a multiprocessing system composed of 4-issue Out-of-Order SparcV8 processors: performance improvements of 28% are shown under a power budget scenario. Multiprocessadores Microeletrônica Sistemas embarcados Multiprocessors Reconfigurable architectures Instruction and thread level parallelism
