Global ETD Search

1	Modèle de calcul et d'exécution pour des applications flots de données dynamiques avec contraintes temps réel / A model of programming languages for dynamic real-time streaming applications Do, Xuan Khanh 17 October 2016 (has links) Il y a un intérêt croissant pour le développement d'applications sur les plates-formes multiprocesseurs homo- et hétérogènes en raison de l'extension de leur champ d'application et de l'apparition des puces many-core, telles que Kalray MPPA-256 (256 cœurs) ou TEGRA X1 de NVIDIA (256 GPU et 8 cœurs 64 bits CPU). Étant donné l'ampleur de ces nouveaux systèmes massivement parallèles, la mise en œuvre des applications sur ces plates-formes est difficile à cause de leur complexité, qui tend à augmenter, et de leurs exigences strictes à la fois qualitatives (robustesse, fiabilité) et quantitatives (débit, consommation d’énergie). Dans ce contexte, les Modèles de Calcul (MdC) flot de données ont été développés pour faciliter la conception de ces applications. Ces MdC sont par définition composées de filtres qui échangent des flux de données via des liens de communication. Ces modèles fournissent une représentation intuitive des applications flot de données, tout en exposant le parallélisme de tâches de l’application. En outre, ils fournissent des capacités d'analyse statique pour la vivacité et l’exécution en mémoire bornée. Cependant, de nouvelles applications de signalisation et de traitement des médias complexes présentent souvent plusieurs défis majeurs qui ne correspondent pas aux restrictions des modèles flot de données statiques classiques: 1) Comment fournir des services garantis contre des interférences inévitables qui peuvent affecter des performances temps réel ?, et 2) Comment ces langages flot de données qui sont souvent trop statiques pourraient répondre aux besoins des applications embarquées émergentes, qui nécessitent une exécution plus dynamique et plus dépendante du contexte ? Pour faire face au premier défi, nous proposons un ordonnancement hybride, nommé Self-Timed Periodic (STP), qui relie des MdC flot de données classiques et des modèles de tâches temps réel. Cet ordonnancement peut aussi être considéré comme un modèle d'exécution combinant l'ordonnancement classique dirigé seulement par les contraintes de dépendance d'exécution appelé Self-Timed Scheduling (STS), évalué comme le plus approprié pour des applications modélisées sous forme de graphes flot de données, avec l'ordonnancement périodique: STS améliore les indicateurs de performance des programmes, tandis que le modèle périodique capture les aspects de synchronisation. Nous avons évalué la performance de notre ordonnancement sur un ensemble de 10 applications et nous avons constaté que dans la plupart des cas, notre approche donne une amélioration significative de la latence par rapport à un ordonnancement purement périodique ou Strictly Periodic Scheduling (SPS), et rivalise bien avec STS. Les expériences montrent également que, pour presque tous les cas de test, STP donne un débit optimal. Sur la base de ces résultats, nous avons évalué la latence entre le temps d'initiation de tous les deux acteurs dépendants, et nous avons introduit une approche basée sur la latence pour le traitement des flux à tolérance de pannes modélisée comme un graphe Cyclo-Static Dataflow (CSDF), dans le but d'aborder des problèmes de défaillance de nœud ou de réseau… / There is an increasing interest in developing applications on homo- and heterogeneous multiprocessor platforms due to their broad availability and the appearance of many-core chips, such as the MPPA-256 chip from Kalray (256 cores) or TEGRA X1 from NVIDIA (256 GPU and 8 64-bit CPU cores). Given the scale of these new massively parallel systems, programming languages based on the dataflow model of computation have strong assets in the race for productivity and scalability, meeting the requirements in terms of parallelism, functional determinism, temporal and spatial data reuse in these systems. However, new complex signal and media processing applications often display several major challenges that do not fit the classical static restrictions: 1) How to provide guaranteed services against unavoidable interferences which can affect real-time performance?, and 2) How these streaming languages which are often too static could meet the needs of emerging embedded applications, such as context- and data-dependent dynamic adaptation? To tackle the first challenge, we propose and evaluate an analytical scheduling framework that bridges classical dataflow MoCs and real-time task models. In this framework, we introduce a new scheduling policy noted Self-Timed Periodic (STP), which is an execution model combining Self-Timed scheduling (STS), considered as the most appropriate for streaming applications modeled as data-flow graphs, with periodic scheduling: STS improves the performance metrics of the programs, while the periodic model captures the timing aspects. We evaluate the performance of our scheduling policy for a set of 10 real-life streaming applications and find that in most of the cases, our approach gives a significant improvement in latency compared to the Strictly Periodic Schedule (SPS), and competes well with STS. The experiments also show that, for more than 90% of the benchmarks, STP scheduling results in optimal throughput. Based on these results, we evaluate the latency between initiation times of any two dependent actors, and we introduce a latency-based approach for fault-tolerant stream processing modeled as a Cyclo-Static Dataflow (CSDF) graph, addressing the problem of node or network failures. For the second challenge, we introduce a new dynamic Model of Computation (MoC), called Transaction Parameterized Dataflow (TPDF), extending CSDF with parametric rates and a new type of control actor, channel and port to express dynamic changes of the graph topology and time-triggered semantics. TPDF is designed to be statically analyzable regarding the essential deadlock and boundedness properties, while avoiding the aforementioned restrictions of decidable dataflow models. Moreover, we demonstrate that TPDF can be used to accurately model task timing requirements in a great variety of situations and introduce a static scheduling heuristic to map TPDF to massively parallel embedded platforms. We validate the model and associated methods using a set of realistic applications and random graphs, demonstrating significant buffer size and performance improvements (e.g., throughput) compared to state of the art models including Cyclo-Static Dataflow (CSDF) and Scenario-Aware Dataflow (SADF). Many-Core Parallélisme Dataflow Systèmes embarqués Applications dynamiques Ordonnancement Dataflow Many-Core Scheduling 004
2	Оптимизација CFD симулације на групама вишејезгарних хетерогених архитектура / Optimizacija CFD simulacije na grupama višejezgarnih heterogenih arhitektura / Optimization of CFD simulations on groups of many-core heterogeneous architectures Tekić Jelena 07 October 2019 (has links) <p>Предмет  истраживања  тезе  је  из области  паралелног  програмирања,<br />имплементација  CFD  (Computational Fluid  Dynamics)  методе  на  више<br />хетерогених  вишејезгарних  уређаја истовремено.  У  раду  је  приказано<br />неколико  алгоритама  чији  је  циљ убрзање  CFD  симулације  на персоналним  рачунарима.  Показано је  да  описано  решење  постиже задовољавајуће  перформансе  и  на HPC  уређајима  (Тесла  графичким картицама).  Направљена  је симулација  у  микросервис архитектури  која  је  портабилна  и флексибилна и додатно олакшава рад на персоналним рачунарима.</p> / <p>Predmet  istraživanja  teze  je  iz oblasti  paralelnog  programiranja,<br />implementacija  CFD  (Computational Fluid  Dynamics)  metode  na  više<br />heterogenih  višejezgarnih  uređaja istovremeno.  U  radu  je  prikazano<br />nekoliko  algoritama  čiji  je  cilj ubrzanje  CFD  simulacije  na personalnim  računarima.  Pokazano je  da  opisano  rešenje  postiže zadovoljavajuće  performanse  i  na HPC  uređajima  (Tesla  grafičkim karticama).  Napravljena  je simulacija  u  mikroservis arhitekturi  koja  je  portabilna  i fleksibilna i dodatno olakšava rad na personalnim računarima.</p> / <p>The  case  study  of  this  dissertation belongs  to  the  field  of  parallel programming,  the  implementation  of CFD  (Computational  Fluid  Dynamics) method  on  several  heterogeneous multiple  core  devices  simultaneously. The  paper  presents  several  algorithms aimed  at  accelerating  CFD  simulation on common computers. Also it has been shown  that  the  described  solution achieves  satisfactory  performance  on<br />HPC  devices  (Tesla  graphic  cards). Simulation is  created    in  micro-service architecture that is portable and flexible and  makes  it  easy  to  test  CFD<br />simulations on common computers.</p>
3	Architecture-aware Task-scheduling : A thermal approach Podobas, Artur, Brorsson, Mats January 2011 (has links) Current task-centric many-core schedulers share a “naive” view of processor architecture; a view that does not care about its thermal, architectural or power consuming properties. Future processor will be more heterogeneous than what we see today, and following Moore’s law of transistor doubling, we foresee an increase in power consumption and thus temperature. Thermal stress can induce errors in processors, and so a common way to counter this is by slowing the processor down; something task-centric schedulers should strive to avoid. The Thermal-Task-Interleaving scheduling algorithm proposed in this paper takes both the application temperature behavior and architecture into account when making decisions. We show that for a mixed workload, our scheduler outperforms some of the standard, architecture-unaware scheduling solutions existing today. / QC 20120215 OpenMP Tasks Power Thermal Temperature Scheduling Many-core Tilera
4	Acceleration of Parallel Applications by Moving Code Instead of Data Farahaninia, Farzad January 2014 (has links) After the performance improvement rate in single-core processors decreased in 2000s, most CPU manufacturers have steered towards parallel computing. Parallel computing has been in the spotlight for a while now. Several hardware and software innovations are being examined and developed, in order to improve the efficiency of parallel computing. Signal processing is an important application area of parallel computing, and this makes parallel computing interesting for Ericsson AB, a company that among other business areas, is mainly focusing on communication technologies. The Ericsson baseband research team at Lindholmen, has been developing a small, experimental basic operating system (BOS) for research purposes within the area of parallel computing. One major overhead in parallel applications, which increases the latency in applications, is the communication overhead between the cores. It had been observed that in some signal processing applications, it is common for some tasks of the parallel application to have a large data size but a small code size. The question was risen then, could it be beneficial to move code instead of data in such cases, to reduce the communication overhead. In this thesis work the gain and practical difficulties of moving code are investigated through implementation. A method has been successfully developed and integrated into BOS to move the code between the cores on a multi-core architecture. While it can be a very specific class of applications in which it is useful to move code, it is shown that it is possible to move the code between the cores with zero extra overhead. Many Core Parallel Applications Epiphany LTE Signal Processing Embedded
5	Scalably Verifiable Cache Coherence Zhang, Meng January 2013 (has links) <p>The correctness of a cache coherence protocol is crucial to the system since a subtle bug in the protocol may lead to disastrous consequences. However, the verification of a cache coherence protocol is never an easy task due to the complexity of the protocol. Moreover, as more and more cores are compressed into a single chip, there is an urge for the cache coherence protocol to have higher performance, lower power consumption, and less storage overhead. People perform various optimizations to meet these goals, which unfortunately, further exacerbate the verification problem. The current situation is that there are no efficient and universal methods for verifying a realistic cache coherence protocol for a many-core system. </p><p>We, as architects, believe that we can alleviate the verification problem by changing the traditional design paradigm. We suggest taking verifiability as a first-class design constraint, just as we do with other traditional metrics, such as performance, power consumption, and area overhead. To do this, we need to incorporate verification effort in the early design stage of a cache coherence protocol and make wise design decisions regarding the verifiability. Such a protocol will be amenable to verification and easier to be verified in a later stage. Specifically, we propose two methods in this thesis for designing scalably verifiable cache coherence protocols. </p><p>The first method is Fractal Coherence, targeting verifiable hierarchical protocols. Fractal Coherence leverages the fractal idea to design a cache coherence protocol. The self-similarity of the fractal enables the inductive verification of the protocol. Such a verification process is independent of the number of nodes and thus is scalable. We also design example protocols to show that Fractal Coherence protocols can attain comparable performance compared to a traditional snooping or directory protocol. </p><p>As a system scales hierarchically, Fractal Coherence can perfectly solve the verification problem of the implemented cache coherence protocol. However, Fractal Coherence cannot help if the system scales horizontally. Therefore, we propose the second method, PVCoherence, targeting verifiable flat protocols. PVCoherence is based on parametric verification, a widely used method for verifying the coherence of a flat protocol with infinite number of nodes. PVCoherence captures the fundamental requirements and limitations of parametric verification and proposes a set of guidelines for designing cache coherence protocols that are compatible with parametric verification. As long as designers follow these guidelines, their protocols can be easily verified. </p><p>We further show that Fractal Coherence and PVCoherence can also facilitate the verification of memory consistency, another extremely challenging problem. One piece of previous work proves that the verification of memory consistency can be decomposed into three steps. The most complex and non-scalable step is the verification of the cache coherence protocol. If we design the protocol following the design methodology of Fractal Coherence or PVCoherence, we can easily verify the cache coherence protocol and overcome the biggest obstacle in the verification of memory consistency. </p><p>As system expands and cache coherence protocols get more complex, the verification problem of the protocol becomes more prominent. We believe it is time to reconsider the traditional design flow in which verification is totally separated from the design stage. We show that by incorporating the verifiability in the early design stage and designing protocols to be scalably verifiable in the first place, we can greatly reduce the burden of verification. Meanwhile, we perform various experiments and show that we do not lose benefits in performance as well as in other metrics when we obtain the correctness guarantee.</p> / Dissertation Computer engineering Cache Coherence Many-core Scalability Verification
6	PERFORMANCE-AWARE RESOURCE MANAGEMENT OF MULTI-THREADED APPLICATIONS FOR MANY-CORE SYSTEMS Olsen, Daniel 01 August 2016 (has links) Future integrated systems will contain billions of transistors, composing tens to hundreds of IP cores. Modern computing platforms take advantage of this manufacturing technology advancement and are moving from Multi-Processor Systems-on-Chip (MPSoC) towards Many-Core architectures employing high numbers of processing cores. These hardware changes are also driven by application changes. The main characteristic of modern applications is the increased parallelism and the need for data storage and transfer. Resource management is a key technology for the successful use of such many-core platforms. The thread to core mapping can deal with the run-time dynamics of applications and platforms. Thus, the efficient resource management enables the efficient usage of the platform resources. maximizing platform utilization, minimizing interconnection network communication load and energy budget. In this thesis, we present a performance-aware resource management scheme for many- core architectures. Particular, the developed framework takes as input parallel applications and performs an application profiling. Based on that profile information, a thread to core mapping algorithm finds (i) the appropriate number of threads that this application will have in order to maximize the utilization of the system and (ii) the best mapping for maximizing the performance of the application under the selected number of threads. In order to validate the proposed algorithm, we used and extended the Sniper, state-of-art, many-core simulator. Last, we developed a discrete event simulator, on top of Sniper simulator, in order to test and validate multiple scenarios faster. The results show that the the proposed methodology, achieves on average a gain of 23% compared to a performance oriented mapping presented and each application completes its workload 18% faster on average. Many-core Systems Mapping Multi-Threaded Applications Resource management
7	SIMPLE POOL ARCHITECTURE FOR APPLICATION RESOURCE ALLOCATION IN MANY-CORE SYSTEMS Koduri, Jayasimha sai 01 December 2017 (has links) The technology push by Moore's law brings a paradigm shift in the adaption of many core systems which replace high frequency superscalar processors with many simpler ones. On the software side, in order to utilize the available computational power, applications are following the high performance parallel/multi-threading model. Thus, many-core systems raise the challenges of resource allocation and fragmentation making necessary ecient run-time resource management techniques. In this thesis, we propose SPA, a Simple Pool Architecture for managing resource allocation in many-core systems. The proposed framework follows a distributed approach in which cores are organized into clusters and multiple clusters form a pool. Clusters are created based on system's characteristics and the allocation of cores is performed in a distributed manner so as to increase resource utilization and reduce fragmentation. Specifically, SPA is responsible (i) to generate the pool-based structure and organize cores into clusters depending on the NoC architecture; (ii) to serve, at run-time, the needs of multithreaded applications, in terms or processing cores; and (iii) to allocate resources in order to take advantage of spatial features, shared resources and reduce fragmentation. Experimental results show that SPA produces on average 15% better application response time while waiting time is reduced by 45% on average compared to other state-of-art methodologies. abstraction architecture many-core systems NoC resource allocation resource utilization
8	Spatial Isolation against Logical Cache-based Side-Channel Attacks in Many-Core Architectures / Isolation physique contre les attaques logiques par canaux cachés basées sur le cache dans des architectures many-core Méndez Real, Maria 20 September 2017 (has links) L’évolution technologique ainsi que l’augmentation incessante de la puissance de calcul requise par les applications font des architectures ”many-core” la nouvelle tendance dans la conception des processeurs. Ces architectures sont composées d’un grand nombre de ressources de calcul (des centaines ou davantage) ce qui offre du parallélisme massif et un niveau de performance très élevé. En effet, les architectures many-core permettent d’exécuter en parallèle un grand nombre d’applications, venant d’origines diverses et de niveaux de sensibilité et de confiance différents, tout en partageant des ressources physiques telles que des ressources de calcul, de mémoire et de communication. Cependant, ce partage de ressources introduit également des vulnérabilités importantes en termes de sécurité. En particulier, les applications sensibles partageant des mémoires cache avec d’autres applications, potentiellement malveillantes, sont vulnérables à des attaques logiques de type canaux cachés basées sur le cache. Ces attaques, permettent à des applications non privilégiées d’accéder à des informations secrètes sensibles appartenant à d’autres applications et cela malgré des méthodes de partitionnement existantes telles que la protection de la mémoire et la virtualisation. Alors que d’importants efforts ont été faits afin de développer des contremesures à ces attaques sur des architectures multicoeurs, ces solutions n’ont pas été originellement conçues pour des architectures many-core récemment apparues et nécessitent d’être évaluées et/ou revisitées afin d’être applicables et efficaces pour ces nouvelles technologies. Dans ce travail de thèse, nous proposons d’étendre les services du système d’exploitation avec des mécanismes de déploiement d’applications et d’allocation de ressources afin de protéger les applications s’exécutant sur des architectures many-core contre les attaques logiques basées sur le cache. Plusieurs stratégies de déploiement sont proposées et comparées à travers différents indicateurs de performance. Ces contributions ont été implémentées et évaluées par prototypage virtuel basé sur SystemC et sur la technologie ”Open Virtual Platforms” (OVP). / The technological evolution and the always increasing application performance demand have made of many-core architectures the necessary new trend in processor design. These architectures are composed of a large number of processing resources (hundreds or more) providing massive parallelism and high performance. Indeed, many-core architectures allow a wide number of applications coming from different sources, with a different level of sensitivity and trust, to be executed in parallel sharing physical resources such as computation, memory and communication infrastructure. However, this resource sharing introduces important security vulnerabilities. In particular, sensitive applications sharing cache memory with potentially malicious applications are vulnerable to logical cache-based side-channel attacks. These attacks allow an unprivileged application to access sensitive information manipulated by other applications despite partitioning methods such as memory protection and virtualization. While a lot of efforts on countering these attacks on multi-core architectures have been done, these have not been designed for recently emerged many-core architectures and require to be evaluated, and/or revisited in order to be practical for these new technologies. In this thesis work, we propose to enhance the operating system services with security-aware application deployment and resource allocation mechanisms in order to protect sensitive applications against cached-based attacks. Different application deployment strategies allowing spatial isolation are proposed and compared in terms of several performance indicators. Our proposal is evaluated through virtual prototyping based on SystemC and Open Virtual Platforms(OVP) technology. Architectures many-core Multi-core architectures Open Virtual Platforms 005.8
9	Threaded Dynamic Memory Management in Many-Core Processors Herrmann, Edward C. 03 August 2010 (has links) No description available. Electrical Engineering dynamic memory threaded library many-core
10	Performance and Power Optimization of Parallel Discrete Event Simulations Using DVFS Child, Ryan 08 October 2012 (has links) No description available. Computer Engineering DVFS Time Warp parallel simulation many-core processors

Search results