Global ETD Search

1	Design of The Rendezvous Mechanism In The Multi-Core AMBA System Chang, Mu-Chi 06 August 2008 In current chip multi-processors (CMPs), the on-chip network is a major factor affecting overall system performance. Communication protocols differ across the communication architectures of current SoC designs. For example, AMBA is a master-slave architecture in which two cores (masters) exchange data through memory (a slave), so every transfer pays the cost of a store to and a load from memory. Hence, this paper designs and implements a rendezvous protocol on the AMBA architecture, called Rendezvous of Advanced High-performance Bus (RAHB), which lets two processors communicate with each other without memory-reference overhead. RAHB is compatible with the AHB architecture and adds a rendezvous communication protocol to AMBA to transmit data directly. By avoiding memory references, RAHB improves the efficiency of multi-core communication. In the experimental evaluation comparing RAHB with AHB, the RAHB throughput speedup (B/s) averages up to 50% across different data lengths, and performance improves by 30% to 40% when executing a test program. Inter-processor communication AMBA multi-core Rendezvous
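The rendezvous idea in entry 1 can be illustrated in software. The sketch below is not the RAHB hardware protocol; it is a minimal thread-level analogue in which a sender hands a word directly to a receiver and does not proceed until the receiver has taken it, with no intermediate queue. The class name `RendezvousChannel` and the helper `demo` are hypothetical names chosen for illustration.

```python
import threading


class RendezvousChannel:
    """Unbuffered hand-off: send() returns only after a receiver took the item."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False   # True while an item is waiting to be taken
        self._sent = 0       # number of items offered so far
        self._taken = 0      # number of items consumed so far

    def send(self, item):
        with self._cond:
            # Wait until the slot is free (a previous hand-off may be pending).
            while self._full:
                self._cond.wait()
            self._item, self._full = item, True
            self._sent += 1
            ticket = self._sent
            self._cond.notify_all()
            # The rendezvous: block until this particular item has been taken.
            while self._taken < ticket:
                self._cond.wait()

    def receive(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._taken += 1
            self._cond.notify_all()
            return item


def demo():
    """Send three words from the main thread to a consumer thread."""
    chan = RendezvousChannel()
    received = []

    def consumer():
        for _ in range(3):
            received.append(chan.receive())

    worker = threading.Thread(target=consumer)
    worker.start()
    for word in (0x10, 0x20, 0x30):
        chan.send(word)
    worker.join()
    return received
```

Calling `demo()` returns the three words in the order they were sent. The contrast with a memory-mediated transfer is that the sender never writes to a shared buffer that the receiver polls later; both sides meet at the hand-off.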
2	On the simulation and design of manycore CMPs Thompson, Christopher Callum January 2015 The progression of Moore’s Law has resulted in both embedded and performance computing systems which use an ever increasing number of processing cores integrated in a single chip. Commercial systems are now available which provide hundreds of cores, and academics have proposed architectures for up to 1024 cores. Embedded multicores are increasingly popular as it is easier to guarantee hard-realtime constraints using individual cores dedicated to tasks than to use traditional time-multiplexed processing. However, finding the optimal hardware configuration to meet these requirements at minimum cost requires extensive trial and error to investigate the design space. This thesis tackles the problems encountered in the design of these large scale multicore systems by first addressing the problem of fast, detailed micro-architectural simulation. Initially addressing embedded systems, this work exploits the lack of hardware cache-coherence support in many deeply embedded systems to increase the available parallelism in the simulation. Then, by partitioning the NoC and using packet counting and cycle skipping, it reduces the amount of computation required to accurately model the NoC interconnect. In combination, this enables simulation speeds significantly higher than the state of the art, while maintaining less error, when compared to real hardware, than any similar simulator. Simulation speeds reach up to 370 MIPS (million target instructions per second), or 110 MHz, which is better than typical FPGA prototypes and approaches final ASIC production speeds. This is achieved while maintaining an error of only 2.1%, significantly lower than other similar simulators. 
The thesis continues by scaling the simulator past large embedded systems up to 64-1024 core processors, adding support for coherent architectures using the same packet counting techniques along with low overhead context switching to enable the simulation of such large systems with stricter synchronisation requirements. The new interconnect model was partitioned to enable parallel simulation to further improve simulation speeds in a manner which did not sacrifice any accuracy. These innovations were leveraged to investigate significant novel energy saving optimisations to the coherency protocol, processor ISA, and processor micro-architecture. By introducing a new instruction named wait-on-address, the energy spent during spin-wait style synchronisation events can be significantly reduced. The instruction puts the core into a low-power idle state while the cache line of the indicated address is monitored for coherency actions. Upon an update or invalidation (or a traditional timer or external interrupt) the core resumes execution, so the active energy of running the core pipeline and repeatedly accessing the data and instruction caches is effectively reduced to static idle power. The thesis also shows that existing combined software-hardware schemes to track data regions which do not require coherency can adequately address the directory-associativity problem, and introduces a new coherency sharer encoding which reduces the energy consumed by sharer invalidations when sharers are grouped closely together, such as would be the case with a system running many tasks with a small degree of parallelism in each. The research concludes by using the extremely fast simulation speeds developed to produce a large set of training data, collecting various runtime and energy statistics for a wide range of embedded applications on a large and diverse range of potential MPSoC designs. 
This data was used to train a series of machine learning based models which were then evaluated on their capacity to predict performance characteristics of unseen workload combinations across the explored MPSoC design space, using only two sample simulations, with promising results from some of the machine learning techniques. The models were then used to produce a ranking of predicted performance across the design space, and on average Random Forest was able to predict the best design within 89% of the runtime performance of the actual best tested design, and better than 93% of the alternative design space. When predicting for a weighted metric of energy, delay and area, Random Forest on average produced results within 93% of the optimum result. In summary, this thesis improves upon the state of the art for cycle accurate multicore simulation, introduces novel energy saving changes to the ISA and microarchitecture of future multicore processors, and demonstrates the viability of machine learning techniques to significantly accelerate the design space exploration required to bring a new manycore design to market. 004
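The wait-on-address behaviour described in entry 2 (sleep until the monitored location is written, or until a timer fires) has a rough software analogue. The sketch below is only that analogue, built on a condition variable rather than cache coherency; `WatchedWord`, `wait_on_address` and `demo` are hypothetical names, and the `timeout` argument stands in for the timer interrupt mentioned in the abstract.

```python
import threading


class WatchedWord:
    """A word whose waiters sleep until it is written (stand-in for a monitored cache line)."""

    def __init__(self, value=0):
        self._cond = threading.Condition()
        self._value = value
        self._version = 0   # bumped on every store, like a coherency event on the line

    def load(self):
        with self._cond:
            return self._value

    def store(self, value):
        with self._cond:
            self._value = value
            self._version += 1
            self._cond.notify_all()   # wake every waiter monitoring this word

    def wait_on_address(self, expected, timeout=None):
        """Sleep while the word equals `expected`; wake on any store or on timeout."""
        with self._cond:
            if self._value != expected:
                return self._value    # already changed, no need to sleep
            seen = self._version
            self._cond.wait_for(lambda: self._version != seen, timeout=timeout)
            return self._value


def demo():
    """Replace a busy `while flag == 0` loop with a sleeping wait."""
    flag = WatchedWord(0)
    result = []

    def waiter():
        while flag.load() == 0:
            flag.wait_on_address(0, timeout=1.0)
        result.append(flag.load())

    worker = threading.Thread(target=waiter)
    worker.start()
    flag.store(1)
    worker.join()
    return result
```

The outer `while` loop in `waiter` is kept on purpose: a wake-up only says that the word was touched or that the timer expired, so the condition is re-checked before proceeding, as it would be after a spin-wait.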
3	Resource Optimization of MPSoC for Industrial Use-cases Kågesson, Filip, Cederbom, Simon January 2019 Today’s embedded systems require more and more performance, but they must still meet power constraints. Single-processor systems can deliver high performance, but at the cost of high power consumption. One solution is to use a multiprocessor system instead, which can provide high performance while meeting the power constraints because it can run at a lower clock frequency than a comparable single-processor system. The focus of the project is to explore the possibilities available when developing new multiprocessor systems. The project compares asymmetric multiprocessing (AMP) and symmetric multiprocessing (SMP) systems in terms of task management and communication between the processors. It also compares the Advanced High-performance Bus (AHB) interface with the Advanced eXtensible Interface (AXI), and the fixed-priority arbitration algorithm with the round-robin arbitration algorithm. The project also contains a practical part in which a demo is developed to show that inter-processor communication using exclusive access to the shared bus can be implemented. The theoretical part of the project yields comparisons that give an overview of what to use when developing new Multiprocessor System on Chip (MPSoC) designs. The demo developed in this project failed to meet the requirement of having a fully functional spinlock. This problem can be solved in the future if new hardware is developed. MPSoC inter-processor communication message passing Computer Systems Embedded Systems
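The two arbitration algorithms compared in entry 3 can be contrasted with a small model. This is a generic sketch of fixed-priority and round-robin bus arbitration, not the arbiters evaluated in the thesis; the function names and the convention that index 0 has the highest fixed priority are assumptions made for illustration.

```python
def fixed_priority_grant(requests):
    """Grant the lowest-numbered requesting master (index 0 = highest priority)."""
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


def round_robin_grant(requests, last_granted):
    """Grant the first requesting master after `last_granted`, wrapping around."""
    n = len(requests)
    for offset in range(1, n + 1):
        master = (last_granted + offset) % n
        if requests[master]:
            return master
    return None


def simulate(requests, cycles, policy):
    """Count grants per master when the same requests are asserted every cycle."""
    grants = [0] * len(requests)
    last = len(requests) - 1   # so that round-robin starts its search at master 0
    for _ in range(cycles):
        if policy == "fixed":
            winner = fixed_priority_grant(requests)
        else:
            winner = round_robin_grant(requests, last)
        if winner is not None:
            grants[winner] += 1
            last = winner
    return grants
```

With three masters that all request on every cycle, `simulate([True, True, True], 6, "fixed")` gives every grant to master 0, while the round-robin policy shares the six grants equally. This is the starvation-versus-fairness trade-off that usually drives the choice between the two schemes.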
4	Komunikace na čipu ADSP-SC58x / Communication on the ADSP-SC58x Chip Havran, Jan January 2018 This project describes the design of communication between the SHARC and ARM cores on the ADSP-SC58x platform, specifically between bare-metal and Linux applications on ADSP-SC589 chips. Several available technologies for data transfer are outlined, such as MCAPI, MDMA and shared memory. New communication principles based on current implementations of these technologies are also designed and implemented.
