Global ETD Search

1	Customer Oriented Design And Resource Utilisation (CODARU) Mousavi Khalkhali, Alireza January 2000 No description available. 670.285
2	Design of an Asynchronous Ring Bus Architecture for Multi-Core Systems Lei, Kin-fong 18 August 2010 In multi-core systems, data transfer between cores becomes a major challenge. The on-chip interconnect network should offer low latency, high throughput, scalability, a good routing or arbitration strategy, and low power consumption. This thesis proposes a 33-bit-wide asynchronous ring bus that adopts a dual-rail single-track data protocol, which yields asynchronous circuits that are both robust and high-speed. Because the circuits are asynchronous, the transfer time varies with the hop count: the shorter the distance, the faster the data can be transferred. In a synchronous ring bus, by contrast, the bus frequency is limited by the latency of the longest hop count. The transmission time of the asynchronous circuits is not held up by the longest distance, even as the number of cores increases. To provide higher throughput, multiple cores can access the bus simultaneously and connect directly to one another. For bus arbitration, a distributed arbiter grants the right to use the bus and resolves collisions. Finally, the system performance under different arbitration strategies is estimated in a TSMC 0.18 µm process. The transmission time over the shortest distance is approximately 1.5 ns, and the longest-distance-first strategy performs best among the arbitration strategies evaluated. On-Chip Interconnect Networks Asynchronous Ring Bus Multi-Core Systems
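The contrast between hop-dependent asynchronous transfer time and worst-case synchronous timing can be illustrated with a small model. This is only a sketch under assumptions that are not in the abstract: the ring is treated as unidirectional, latency is assumed to grow linearly with hop count, and the 1.5 ns shortest-distance figure is reused as a hypothetical per-hop delay. The function names are illustrative.

```python
def ring_hops(src: int, dst: int, n_cores: int) -> int:
    """Hop count from src to dst on a unidirectional ring (assumed topology)."""
    return (dst - src) % n_cores


def async_transfer_ns(src: int, dst: int, n_cores: int, per_hop_ns: float) -> float:
    """Asynchronous bus: the transfer takes only as long as the hops it crosses."""
    return ring_hops(src, dst, n_cores) * per_hop_ns


def sync_transfer_ns(n_cores: int, per_hop_ns: float) -> float:
    """Synchronous bus: the clock period must cover the longest path,
    so every transfer pays the worst-case (n_cores - 1 hop) latency."""
    return (n_cores - 1) * per_hop_ns


if __name__ == "__main__":
    PER_HOP_NS = 1.5  # hypothetical per-hop delay, borrowed from the 1.5 ns figure
    for dst in (1, 4, 7):
        print(dst, async_transfer_ns(0, dst, 8, PER_HOP_NS), sync_transfer_ns(8, PER_HOP_NS))
```

Under this model a neighbor-to-neighbor transfer on an 8-core ring costs one hop on the asynchronous bus, while the synchronous bus charges the full seven-hop period regardless of distance.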
3	Asynchronous Ring Network Mechanism with A Fair Arbitration Strategy for Network on Chip Wong, Chen-Ang 14 August 2012 Multi-core systems are usually built from homogeneous or heterogeneous cores. Designing a better network on chip (NoC) requires considering the performance, scalability, hardware simplicity, and arbitration strategy of the on-chip network. The routers use a circuit-switched network implemented with asynchronous circuits and have no queuing (buffering), so they are simple and efficient to implement. A synchronous circuit relies on a clock source, but distributing a global clock causes problems such as power consumption, increased area, and clock skew. The design uses a ring topology with a multi-transaction bus architecture, which lets multiple packets access the bus at the same time and thus achieves higher throughput. As the number of cores increases, a central arbiter circuit becomes more complex. This thesis therefore presents a self-adjusting priority (SAP) schedule that fairly adjusts the priority of each component by exchanging weights at the distributed arbiters. When several requests contend for the network, the winner, which holds the highest priority, exchanges its priority with the lowest priority among those requests. This principle guarantees that a winner has a reduced chance of winning the network at the next contention. Conversely, the losers can obtain a higher priority than they originally held. The proposed scheme therefore not only offers a fair strategy but also simplifies hardware design. switch circuit multi-core systems arbitration strategy Arbiter distributed system
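The priority-exchange rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the thesis's hardware design: the function name `sap_arbitrate`, the dictionary of integer priorities, and the assumption that priorities are distinct and that a higher number wins are all hypothetical choices made for the example.

```python
from typing import Dict, Iterable, Optional


def sap_arbitrate(priorities: Dict[int, int], requesters: Iterable[int]) -> Optional[int]:
    """Grant the bus to the highest-priority requester, then swap its priority
    with that of the lowest-priority requester in the same contention.

    priorities maps core id -> priority (assumed distinct, higher wins)
    and is updated in place. Returns the winning core id, or None.
    """
    contenders = list(requesters)
    if not contenders:
        return None
    winner = max(contenders, key=lambda core: priorities[core])
    lowest = min(contenders, key=lambda core: priorities[core])
    # Exchange priorities: the winner drops to the lowest contending priority.
    priorities[winner], priorities[lowest] = priorities[lowest], priorities[winner]
    return winner


if __name__ == "__main__":
    prio = {0: 3, 1: 2, 2: 1, 3: 0}
    print(sap_arbitrate(prio, [0, 2, 3]), prio)  # core 0 wins, then swaps with core 3
    print(sap_arbitrate(prio, [0, 2, 3]), prio)  # core 3 now holds the top priority
```

In the first round core 0 wins and hands its priority to core 3, so core 3 wins the next contention among the same requesters. A single requester swaps with itself, which leaves the priorities unchanged.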
4	Performance Evaluation of Node.js on Multi-core Computing Systems Azmat, Janty January 2018 JavaScript code executed by the Node.js run-time environment runs in a single thread and does not fully utilize the power of multi-core systems, and several fairly new approaches attempt to address this. Some of these approaches are considered well tested in public use and are widely used at the time of writing. The objectives of this study are to determine which of these approaches achieve better scalability with respect to the number of handled requests, and to what extent they utilize multi-core power compared to the raw Node.js environment with normal CPU scheduling. Node.js parallel computing multi-core systems Engineering and Technology Teknik och teknologier
5	PERFORMANCE-AWARE RESOURCE MANAGEMENT OF MULTI-THREADED APPLICATIONS FOR MANY-CORE SYSTEMS Olsen, Daniel 01 August 2016 Future integrated systems will contain billions of transistors, composing tens to hundreds of IP cores. Modern computing platforms take advantage of this advance in manufacturing technology and are moving from Multi-Processor Systems-on-Chip (MPSoC) towards many-core architectures employing high numbers of processing cores. These hardware changes are also driven by application changes. The main characteristics of modern applications are increased parallelism and the need for data storage and transfer. Resource management is a key technology for the successful use of such many-core platforms. The thread-to-core mapping can deal with the run-time dynamics of applications and platforms. Thus, efficient resource management enables efficient usage of the platform resources, maximizing platform utilization while minimizing interconnection network communication load and energy budget. In this thesis, we present a performance-aware resource management scheme for many-core architectures. In particular, the developed framework takes parallel applications as input and profiles them. Based on that profile information, a thread-to-core mapping algorithm finds (i) the appropriate number of threads for the application in order to maximize the utilization of the system and (ii) the best mapping for maximizing the performance of the application under the selected number of threads. To validate the proposed algorithm, we used and extended Sniper, a state-of-the-art many-core simulator. Finally, we developed a discrete event simulator on top of Sniper in order to test and validate multiple scenarios faster. 
The results show that the proposed methodology achieves an average gain of 23% compared to a performance-oriented mapping, and each application completes its workload 18% faster on average. Many-core Systems Mapping Multi-Threaded Applications Resource management
6	SIMPLE POOL ARCHITECTURE FOR APPLICATION RESOURCE ALLOCATION IN MANY-CORE SYSTEMS Koduri, Jayasimha sai 01 December 2017 The technology push of Moore's law has brought a paradigm shift toward the adoption of many-core systems, which replace high-frequency superscalar processors with many simpler ones. On the software side, in order to utilize the available computational power, applications follow the high-performance parallel/multi-threading model. Thus, many-core systems raise the challenges of resource allocation and fragmentation, making efficient run-time resource management techniques necessary. In this thesis, we propose SPA, a Simple Pool Architecture for managing resource allocation in many-core systems. The proposed framework follows a distributed approach in which cores are organized into clusters and multiple clusters form a pool. Clusters are created based on the system's characteristics, and the allocation of cores is performed in a distributed manner so as to increase resource utilization and reduce fragmentation. Specifically, SPA is responsible for (i) generating the pool-based structure and organizing cores into clusters depending on the NoC architecture; (ii) serving, at run time, the needs of multithreaded applications in terms of processing cores; and (iii) allocating resources so as to take advantage of spatial features and shared resources and to reduce fragmentation. Experimental results show that SPA produces on average 15% better application response time, while waiting time is reduced by 45% on average, compared to other state-of-the-art methodologies. abstraction architecture many-core systems NoC resource allocation resource utilization
7	Design of multi-core dataflow cryptprocessor Alzahrani, Ali Saeed 28 August 2018 Embedded multi-core systems are implemented as systems-on-chip that rely on packet store-and-forward networks-on-chip for communications. These systems use neither buses nor a global clock. Instead, routers move data between the cores, and each core uses its own local clock. This implies concurrent asynchronous computing. Implementing algorithms in such systems is greatly facilitated by dataflow concepts. In this work, we propose a methodology for implementing algorithms on dataflow platforms. The methodology can be applied to multi-threaded platforms, multi-core platforms, or a combination of the two. It is based on a novel dataflow graph representation of the algorithm. We applied the proposed methodology to obtain a novel dataflow multi-core computing model for the secure hash algorithm-3 (SHA-3). The resulting hardware was implemented on an FPGA to verify the performance parameters. The proposed model of computation has advantages such as flexible I/O timing in terms of scheduling policy, execution of tasks as soon as possible, and a self-timed, event-driven system. In other words, I/O timing and correctness of algorithm evaluation are dissociated in this work. The main advantage of this proposal is the ability to dynamically obfuscate algorithm evaluation to thwart side-channel attacks without having to redesign the system. This has important implications for cryptographic applications. The dissertation also proposes four countermeasure techniques against side-channel attacks for SHA-3 hashing. The countermeasure techniques are based on choosing stochastic or deterministic input data scheduling strategies. Extensive simulations of the SHA-3 algorithm and the proposed countermeasure approaches were performed using object-oriented MATLAB models to verify and validate the effectiveness of the techniques. 
The design immunity of the proposed countermeasures is assessed. / Graduate / 2020-11-19 Embedded multi-core systems object-oriented MATLAB models FPGA
8	Roko: Balancing Performance and Usability in Coarse-grain Parallelization Segulja, Cedomir 06 April 2010 We present Roko, a system that allows parallelization of sequential C code with modest user intervention. The user exposes parallelism at the function level by annotating the code with pragmas. Roko defines only two pragmas: the parallel pragma denotes function calls that will be executed asynchronously, and the exposed pragma describes the data usage of the marked function calls. Architecturally, Roko consists of three components: a compiler that analyzes the pragmas, a software environment that spreads the execution over multiple processors, and hardware support that implements a novel synchronization scheme, versioning. We have designed, implemented, and evaluated an FPGA-based prototype of Roko. Our experimental evaluation shows (i) that a few simple pragmas are all that is needed to expose parallelism in benchmark applications and (ii) that Roko can deliver good performance in terms of application speedup. Programming Model Parallelization Synchronization Concurrency Control Multi-core Systems FPGA Applications 0984
10	Microarchitecture and FPGA Implementation of the Multi-level Computing Architecture Capalija, Davor 30 July 2008 We design the microarchitecture of the Multi-Level Computing Architecture (MLCA), focusing on its Control Processor (CP). The design of the CP microarchitecture presents both opportunities and challenges that stem from the coarse granularity of the tasks and the large number of inputs and outputs for each task instruction. Thus, we explore changes to standard superscalar microarchitectural techniques. We design the entire CP microarchitecture and implement it on an FPGA using SystemVerilog. We synthesize and evaluate the MLCA system based on a 4-processor shared-memory multiprocessor. The performance of realistic applications shows scalable speedups that are comparable to those obtained in simulation. We believe that our implementation achieves low complexity in terms of FPGA resource usage and operating frequency. In addition, we argue that our design methodology allows the CP to scale as the entire system grows. Computer architecture FPGA applications Microarchitecture Parallelism Embedded systems Multi-core systems 0984
