Global ETD Search

11	Modelo de migração de tarefas para MPSoCs baseados em redes-em-chip / Task migration model for NoC-based MPSoCs Barcelos, Daniel January 2008 (has links) Em relação a sistemas multiprocessados integrados em uma única pastilha (MPSoC), tanto a alocação dinâmica quanto a migração de tarefas são áreas de pesquisa recentes e abertas. Este artigo propõe uma organização de memória híbrida para sistemas com comunicação baseados em redes-em-chip, como maneira de minimizar a energia gasta durante a transferência de código decorrente de uma alocação ou migração de tarefa. É também introduzido um novo mecanismo de migração de tarefas, que, por sua vez, pode utilizar check-pointing ou outra técnica mais transparente. O aumento do uso de sistemas multiprocessados na computação embarcada torna importante a avaliação de diferentes organizações de memória. Enquanto memórias distribuídas proporcionam acessos mais rápidos, memórias compartilhadas tornam possível o compartilhamento de dados sem a interferência dos processadores. Nos experimentos realizados, foi focada a redução da energia gasta na comunicação em um contexto onde uma migração de tarefas ou uma alocação dinâmica fosse necessária. Os resultados indicam que, considerando a migração do código, a solução proposta apresenta melhor eficiência do que soluções unicamente distribuídas ou compartilhadas. Foi também verificado que, em alguns casos, a estratégia híbrida reduz os tempos de migração. Na solução apresentada, o código pode ser transferido do nó onde a tarefa era originalmente executada ou de uma memória posicionada no centro da rede. A escolha entre as duas opções é feita em tempo de execução de uma maneira intuitiva, sendo a escolha baseada na distância entre os nós envolvidos na transferência. Os resultados indicam que a organização proposta reduz a energia de transferência de código em 24% e 10% em média, se comparada, respectivamente, a soluções utilizando somente memória global ou distribuída. O modelo de migração de tarefas proposto é baseado na linguagem Java e na comunicação por troca de mensagens. Todo seu desenvolvimento se deu em software, não requerendo nenhuma modificação no sistema. O custo energético da migração foi então avaliado. Entende-se por custo energético a energia gasta nos processadores para envio e recebimento das mensagens e na estrutura de comunicação, uma rede-em-chip. Trabalhos já existentes não consideram o custo de migração, comparando apenas o arranjo inicial e final das tarefas no sistema. Este trabalho, entretanto, avalia todo o processo de migração. Através de experimentos, é estimado o tempo mínimo de execução da plataforma, como função do tamanho da tarefa e da distância entre os nós da rede, necessário para amortizar a energia gasta no processo de migração, considerando que os processadores utilizam a técnica de DVS para reduzir o consumo de acordo com suas cargas de processamento. / Regarding embedded Multi-processor Systems-on-Chip (MPSoCs), dynamic task allocation and task migration are still open research areas. This work proposes a hybrid memory organization for NoC-based systems as the way to minimize the energy spent during the code transfer when task migration or dynamic task allocation needs to be performed. It is also introduced a new flexible task migration mechanism, which can use check-pointing or a more transparent technique. The increasing use of multi-processor architectures in embedded computing makes it important to evaluate different options for memory organization. While distributed memory allows faster accesses, a global memory makes possible the sharing of data without processor interference. In the experiments, it is targeted the communication energy reduction in a context where task migration or dynamic task allocation is required. Results indicate that the proposed hybrid memory organization presents better efficiency than distributed- or global-only organizations regarding code migration. It is also noticed that, in some cases, the hybrid strategy reduces the task migration times. In the hybrid approach, the code can be transferred from the node where the task was originally running or from a memory positioned at the center of the system. The choice between the two options is done at runtime in a very intuitive way, based on the distance between the nodes involved on the transfer. Results are very encouraging and indicate that the proposed hybrid organization reduces the code transfer energy by 24% and 10% on average, as compared to global- and distributed-only memory organizations, respectively. The proposed migration model is based on the Java language and on message passing communication method. It is mainly software-based, and does not require any system modification. The energy cost of the migration process is then evaluated, i.e., the energy spent on the sending and receiving cores and on the communication structure, a wormhole-based Network-on-Chip (NoC). Previous works have compared system figures before and after task migration, while this study evaluates the whole migration process. Finally, it is derived the minimum execution time of the embedded system, as a function of the task size and of the distance between the cores on the NoC, that is required to amortize the energy spent on the migration process, considering that processors use Dynamic Voltage Scaling to reduce power consumption according to their current workloads. Microeletrônica Task migration Embedded systems Network-on-chip Multi-processor systems Distributed systems
12	Power Constrained Performance Optimization in Chip Multi-processors Ma, Kai 03 September 2013 (has links) No description available. Electrical Engineering Computer Engineering
13	A Scalable Approach to Multi-core Prototyping Newcomb, Jamie David 22 April 2008 (has links) In recent years, multi-core processors and multi-processor networks have grown in popularity as a solution to the limits on increasing clock speed, rising power consumption, and the nanometer manufacturing processes. Multi-core processors and multi-processor networks are seen as the next step in the advancement of computational capabilities by way of concurrent processing. However, parallel software design is difficult due to the immaturity of scalable architectures and software development environments for multi-core hardware. How should processors effectively and quickly pass information, with as little overhead as possible? What kind of communication architecture is best suited for parallelism? How can large-scale architectures be quickly produced, verified and properly utilized by software? Using commercially available FPGA development boards, Xilinx tools and components, this thesis offers a light-weight solution to these questions for effective, low-overhead, low-latency multi-core communication and fast prototyping of multi-processor networks for scalable processor arrays. / Master of Science multi-gigabit Aurora Xilinx Field programmable gate arrays multi-processor array multi-core
14	Automatic methods for distribution of data-parallel programs on multi-device heterogeneous platforms Moreń, Konrad 07 February 2024 (has links) This thesis deals with the problem of finding effective methods for programming and distributing data-parallel applications for heterogeneous multiprocessor systems. These systems are ubiquitous today. They range from embedded devices with low power consumption to high performance distributed systems. The demand for these systems is growing steadily. This is due to the growing number of data-intensive applications and the general growth of digital applications. Systems with multiple devices offer higher performance but unfortunately add complexity to the software development for such systems. Programming heterogeneous multiprocessor systems present several unique challenges compared to single device systems. The first challenge is the programmability of such systems. Despite constant innovations in programming languages and frameworks, they are still limited. They are either platform specific, like CUDA which supports only NVIDIA GPUs, or applied at a low level of abstraction, such as OpenCL. Application developers that design OpenCL programs must manually distribute data to the different devices and synchronize the distributed computations. These capabilities have an impact on the productivity of the developers. To reduce the programming complexity and the development time, this thesis introduces two approaches that automatically distribute and synchronize the data-parallel workloads. Another challenge is the multi-device hardware utilization. In contrast to single-device platforms, the application optimization process for a multi-device system is even more complicated. The application designers need to apply not only optimization strategies specific for a single-device architecture. They need also focus on the careful workload balancing between all the platform processors. For the balancing problem, this thesis proposes a method based on the platform model. The platform model is created with machine learning techniques. Using machine learning, this thesis builds automatically a reliable platform model, which is portable and adaptable to different platform setups, with a minimum manual involvement of the programmers. info:eu-repo/classification/ddc/006 ddc:006
15	Design and Analysis of Modular Architectures for an RNS to Mixed Radix Conversion Multi-processor Shivashankar, Nithin 27 October 2014 (has links) No description available. Computer Engineering Residue Number System Mixed Radix RNS to Mixed Radix Conversion Multi-processor FPGA parallelization pipelining Modular Inverse
16	Resource Optimized Scheduling For Enhanced Power Efficiency And Throughput On Chip Multi Processor Platforms Kundan, Shivam 01 May 2024 (has links) (PDF) The parallel nature of process execution on Chip Multi-Processors (CMPs) has boosted levels of application performance far beyond the capabilities of erstwhile single-core designs. Generally, CMPs offer improved performance by integrating multiple simpler cores onto a single die that share certain computing resources among them such as last-level caches, data buses, and main memory. This ensures architectural simplicity while also boosting performance for multi-threaded applications. However, a major trade-off associated with this approach is that concurrently executing applications incur performance degradation if their collective resource requirements exceed the total amount of resources available to the system. If dynamic resource allocation is not carefully considered, the potential performance gain from having multiple cores may be outweighed by the losses due to contention for allocation of shared resources. Additionally, CMPs with inbuilt dynamic voltage-frequency scaling (DVFS) mechanisms may try to compensate for the performance bottleneck by scaling to higher clock frequencies. For performance degradation due to shared-resource contention, this does not necessarily improve performance but does ensure a significant penalty on power consumption due to the quadratic relation of electrical power and voltage (P_dynamic ∝ V^2 * f).This dissertation presents novel methodologies for balancing the competing requirements of high performance, fairness of execution, and enforcement of priority, while also ensuring overall power efficiency of CMPs. Specifically, we (1) Analyze the problem of resource interference during concurrent process execution and propose two fine-grained scheduling methodologies for improving overall performance and fairness, (2) Develop an approach for enforcement of priority (i.e., minimum performance) for specific processes while avoiding resource starvation for others, and (3) Present a machine-learning approach for maximizing the power efficiency (performance-per-Watt) of CMPs through estimation of a workload's performance and power consumption limits at different clock frequencies.As modern computing workloads become increasingly dynamic, and computers themselves become increasingly ubiquitous, the problem of finding the ideal balance between performance and power consumption of CMPs is of particular relevance today, especially given the unprecedented proliferation of embedded devices for use in Internet-of-Things, edge computing, smart wearables, and even exotic experiments such as space probes comprised entirely of a CMP, sensors, and an antenna ("space chips"). Additionally, reducing power consumption while maintaining constant performance can contribute to addressing the growing problem of dark silicon. chip multi processor embedded systems performance per watt power-aware scheduling resource-aware scheduling shared resource contention
17	Optimisation multicritères et applications aux systèmes multi-processeurs embarqués / Multi-Criteria Optimization and its Application to Multi-Processor Embedded Systems Legriel, Julien 04 October 2011 (has links) Dans cette thèse nous développons de nouvelles techniques pour résoudre les problèmes d'optimisation multi-critère. Ces problèmes se posent naturellement dans de nombreux domaines d'application (sinon tous) où les choix sont évalués selon différents critères conflictuels (coûts et performance par exemple). Contrairement au cas de l'optimisation classique, de tels problèmes n'admettent pas en général un optimum unique mais un ensemble de solutions incomparables, aussi connu comme le front de Pareto, qui représente les meilleurs compromis possibles entre les objectifs conflictuels. La contribution majeure de la thèse est le développement d'algorithmes pour trouver ou approximer ces solutions de Pareto pour les problèmes combinatoires difficiles. Plusieurs problèmes de ce type se posent naturellement lors du processus de placement et d'ordonnancement d'une application logicielle sur une architecture multi-coeur comme P2012, qui est actuellement développé par STMicroelectronics. / In this thesis we develop new techniques for solving multi-criteria optimization problems. Such problems arise naturally in many (if not all) application domains where choices are evaluated according to two or more conflicting criteria such as price vs. performance. Unlike ordinary optimization, such problems typically do not admit a unique optimum but a set of incomparable solutions, also known as the Pareto Front, which represent the best possible trade-offs between the conflicting goals. The major contribution of the thesis is the development of algorithms for finding or approximating these Pareto solutions for hard combinatorial problems that arise naturally in the process of mapping and scheduling application software on multi-core architectures such as P2012 which is currently being developed by ST Microelectronics. Optimisation multi-critère Systèmes multi-processeur SAT Ordonnancement Recherche stochastique locale Multi-criteria optimization Multi-processor systems SAT Scheduling Stochastic local search 004
18	Réseau sur puce sécurisé pour applications cryptographiques sur FPGA / Secure Network-on-Chip for cryptographic applications on FPGA Druyer, Rémy 26 October 2017 (has links) Que ce soit au travers des smartphones, des consoles de jeux portables ou bientôt des supercalculateurs, les systèmes sur puce (System-on-chip (SoC)) ont vu leur utilisation largement se répandre durant ces deux dernières décennies. Ce phénomène s’explique notamment par leur faible consommation de puissance au regard des performances qu’ils sont capables de délivrer, et du large panel de fonctions qu’ils peuvent intégrer. Les SoC s’améliorant de jour en jour, ils requièrent de la part des systèmes d’interconnexions qui supportent leurs communications, des performances de plus en plus élevées. Pour répondre à cette problématique les réseaux sur puce (Network-on-Chip (NoC)) ont fait leur apparition.En plus des ASIC, les circuit reconfigurables FPGA sont un des choix possibles lors de la réalisation d’un SoC. Notre première contribution a donc été de réaliser et d’étudier les performances du portage du réseau sur puce générique Hermes initialement conçu pour ASIC, sur circuit reconfigurable. Cela nous a permis de confirmer que l’architecture du système d’interconnexions doit être adaptée à celle du circuit pour pouvoir atteindre les meilleures performances possibles. Par conséquent, notre deuxième contribution a été la conception de l’architecture de TrustNoC, un réseau sur puce optimisé pour FPGA à hautes performances en latence, en fréquence de fonctionnement, et en quantité de ressources logiques occupées.Un autre aspect primordial qui concerne les systèmes sur puce, et plus généralement de tous les systèmes numériques est la sécurité. Notre dernière principale contribution a été d’étudier les menaces qui s’exercent sur les SoC durant toutes les phases de leur vie, puis de développer à partir d’un modèle de menaces, des mécanismes matériels de sécurité permettant de lutter contre des détournements d’IP, et des attaques logicielles. Nous avons également veillé à limiter au maximum le surcoût qu’engendre les mécanismes de sécurité sur les performances sur réseau sur puce. / Whether through smartphones, portable game consoles, or high performances computing, Systems-on-Chip (SoC) have seen their use widely spread over the last two decades. This can be explained by the low power consumption of these circuits with the regard of the performances they are able to deliver, and the numerous function they can integrate. Since SoC are improving every day, they require better performances from interconnects that support their communications. In order to address this issue Network-on-Chip have emerged.In addition to ASICs, FPGA circuits are one of the possible choices when conceiving a SoC. Our first contribution was therefore to perform and study the performance of Hermes NoC initially designed for ASIC, on reconfigurable circuit. This allowed us to confirm that the architecture of the interconnection system must be adapted to that of the circuit in order to achieve the best possible performances. Thus, our second contribution was to design TrustNoC, an optimized NoC for FPGA platform, with low latency, high operating frequency, and a moderate quantity of logical resources required for implementation.Security is also a primordial aspect of systems-on-chip, and more generally, of all digital systems. Our latest contribution was to study the threats that target SoCs during all their life cycle, then to develop and integrate hardware security mechanisms to TrustNoC in order to counter IP hijacking, and software attacks. During the design of security mechanisms, we tried to limit as much as possible the overhead on NoC performances. Fpga Réseau sur puce Sécurité Fpga Network-On-Chip Security
19	Hiérarchie mémoire dans les systèmes intégrés multiprocesseurs construits autour de réseaux sur puce / Memory hierarchy in embedded multiprocessor system built around networks on chip Belhadj Amor, Hela 05 October 2017 (has links) Les systèmes parallèles de type multi/pluri-cœurs permettant d'obtenir une grande puissance de calcul à bas coût énergétique sont de nos jours une réalité. Néanmoins, l'exploitation des performances de ces architectures dépend de l'efficacité du système à gérer les accès aux données. Le but de nos travaux est d'améliorer l'efficacité de ces accès en exploitant les caractéristiques de l'architecture matérielle.Dans une première partie, nous proposons une nouvelle organisation de la hiérarchie des mémoires caches qui maximise l'utilisation de l'espace de stockage disponible à chaque niveau. Cette solution, basée sur les architectures à accès non uniforme au cache (NUCA), supporte les transferts inter et intra-niveau de la hiérarchie. Elle requiert un protocole de cohérence de cache qui s'adapte à ses spécifications.Certes, le transfert des données au niveau de la hiérarchie est aussi un déterminant de la performance du système. Dans une seconde partie, nous prenons en compte les besoins de communication spécifiques du protocole. Nous proposons un réseau virtualisé comme support de communication ad-hoc afin de gérer le trafic de cohérence à moindre coût. Ce dernier relie les caches d'un même niveau pour supporter les transferts intra-niveaux, qui sont une spécificité de notre protocole, en vue de réduire la latence moyenne d'accès. / Multi/many-cores parallel systems for high-power computing at low energy costs are nowadays a reality. However, exploiting the performance of these architectures depends on the efficiency of the system in managing data accesses. The aim of our work is to improve the efficiency of these accesses by exploiting the hardware architecture characteristics.In a first part, we propose a new cache hierarchy organization that aims at maximizing the use of the available storage space at each level. This solution, based on non-uniform cache access architectures (NUCA), supports inter and intra-level transfers of the hierarchy. It requires a cache coherency protocol that suits its specifications.Obviously, the transfer of data in the hierarchy is also a determinant of the system performance. In a second part, we consider the specific communication needs of the protocol. We suggest the use of a virtualized network as an ad-hoc communication medium to manage consistency traffic at a lower cost. It links the caches of the same level to support intra-level transfers, which are a specificity of our protocol, in order to reduce the average access latency. Hiérarchie mémoire Cohérence Réseau sur Puce (NoC) Non Uniform Cache Architecture (NUCA) Multi-Processor System-On-Chip (MPSoC Memory hierarchy Consistency Network-On-Chip (NoC) Non Uniform Cache Architecture (NUCA) 004
20	Design and Development of a CubeSat Hardware Architecture with COTS MPSoC using Radiation Mitigation Techniques Vasudevan, Siddarth January 2020 (has links) CubeSat missions needs components that are tolerant against the radiation in space. The hardware components must be reliable, and it must not compromise the functionality on-board during the mission. At the same time, the cost of hardware and its development should not be high. Hence, this thesis discusses the design and development of a CubeSat architecture using a Commercial Off-The- Shelf (COTS) Multi-Processor System on Chip (MPSoC). The architecture employs an affordable Rad-Hard Micro-Controller Unit as a Supervisor for the MPSoC. Also, it uses several radiation mitigation techniques such as the Latch-up protection circuit to protect it against Single-Event Latch-ups (SELs), Readback scrubbing for Non- Volatile Memories (NVMs) such as NOR Flash and Configuration scrubbing for the FPGA present in the MPSoC to protect it against Single-Event Upset (SEU)s, reliable communication using Cyclic Redundancy Check (CRC) and Space packet protocol. Apart from such functionalities, the Supervisor executes tasks such as Watchdog that monitors the liveliness of the applications running in the MPSoC, data logging, performing Over-The-Air Software/Firmware update. The thesis work implements functionalities such as Communication, Readback memory scrubbing, Configuration scrubbing using SEM-IP, Watchdog, and Software/Firmware update. The execution times of the functionalities are presented for the application done in the Supervisor. As for the Configuration scrubbing that was implemented in Programmable Logic (PL)/FPGA, results of area and latency are reported. / CubeSat-uppdrag behöver komponenter som är toleranta mot strålningen i rymden. Maskinvarukomponenterna måste vara pålitliga och funktionaliteten ombord får inte äventyras under uppdraget. Samtidigt bör kostnaden för hårdvara och dess utveckling inte vara hög. Därför diskuterar denna avhandling design och utveckling av en CubeSatarkitektur med hjälp av COTS (eng. Custom-off-The-Shelf) MPSoC (eng. Multi Processor System-on-Chip). Arkitekturen använder en prisvärd strålningshärdad (eng. Rad-Hard) Micro-Controller Unit(MCU) som Övervakare för MPSoC:en och använder också flera tekniker för att begränsa strålningens effekter såsom kretser för att skydda kretsen från s.k. Single Event Latch-Ups (SELs), återläsningsskrubbning för icke-volatila minnen (eng. Non-Volatile Memories) NVMs som NOR Flash och skrubbning av konfigurationsminnet skrubbning för FPGA:er i MPSoC:en för att skydda dem mot Single-Event Upsets (SEUs), och tillhandahålla pålitlig kommunikation mha CRC och Space Packet Protocol. Bortsett från sådana funktioner utför Övervakaren uppgifter som Watchdog för att övervaka att applikationerna som körs i MPSoC:en fortfarande är vid liv, dataloggning, och Over- the-Air-uppdateringar av programvaran/Firmware. Examensarbetet implementerar funktioner såsom kommunikation, återläsningsskrubbning av minnet, konfigurationsminnesskrubbning mha SEM- IP, Watchdog och uppdatering av programvara/firmware. Exekveringstiderna för utförandet av funktionerna presenteras för den applikationen som körs i Övervakaren. När det gäller konfigurationsminnesskrubbningen som implementerats i den programmerbara logiken i FPGA:n, rapporteras area och latens. Single-Event Upset Single-Event Latch-up Cyclic Redundancy Check Scrubbing Radiation hardened Micro-Controller Unit Error Injection Space packet protocol NOR Flash Magnetoresistive RAM Computer and Information Sciences Data- och informationsvetenskap

Search results