1 |
Does the Halting Necessary for Hardware Trace Collection Inordinately Perturb the Results?Watson, Myles G. 16 November 2004 (has links) (PDF)
Processor address traces are invaluable for characterizing workloads and testing proposed memory hierarchies. Long traces are needed to exercise modern cache designs and produce meaningful results, but are difficult to collect with hardware monitors because microprocessors access memory too frequently for disks or other large storage to keep up. The small, fast buffers of the monitors fill quickly; in order to obtain long contiguous traces, the processor must be stopped while the buffer is emptied. This halting may perturb the traces collected, but this cannot be measured directly, since long uninterrupted traces cannot be collected. We make the case that hardware performance counters, which collect runtime statistics without influencing execution, can be used to measure halting effects. We use the performance counters of the Pentium 4 processor to collect statistics while halting the processor as if traces were being collected. We then compare these results to the statistics obtained from unhalted runs. We present our results in terms of which counters are affected, why, and what this means for trace-collection systems.
|
2 |
Knihovna procesorů pro návrh vestavěných systémů / Processors Library for the Embedded System DesignZvonček, Radovan January 2011 (has links)
This work deals with designing a library of processor models used in embedded systems. Processor architectures are described using the ISAC language. The ISAC language is one of several outcomes of the Lissom project that is taking place at the Faculty of Information Technology, BUT, Brno. The beginning of this work is aimed to provide the introduction to processor architectures used in today's embedded systems. Remaining sections are devoted to presentations of exemplary processor architectures and the description of their implementation. This work is finalized by concluding the gathered experience with emphasis on the suitability of the ISAC language for architecture description and the efficiency of its simulation.
|
3 |
Proceedings of the 4th Many-core Applications Research Community (MARC) SymposiumJanuary 2012 (has links)
In continuation of a successful series of events, the 4th Many-core Applications Research Community (MARC) symposium took place at the HPI in Potsdam on December 8th and 9th 2011. Over 60 researchers from different fields presented their work on many-core hardware architectures, their programming models, and the resulting research questions for the upcoming generation of heterogeneous parallel systems.
|
4 |
Programmable Address Generation Unit for Deep Neural Network AcceleratorsKhan, Muhammad Jazib January 2020 (has links)
The Convolutional Neural Networks are getting more and more popular due to their applications in revolutionary technologies like Autonomous Driving, Biomedical Imaging, and Natural Language Processing. With this increase in adoption, the complexity of underlying algorithms is also increasing. This trend entails implications for the computation platforms as well, i.e. GPUs, FPGA, or ASIC based accelerators, especially for the Address Generation Unit (AGU), which is responsible for the memory access. Existing accelerators typically have Parametrizable Datapath AGUs, which have minimal adaptability towards evolution in algorithms. Hence new hardware is required for new algorithms, which is a very inefficient approach in terms of time, resources, and reusability. In this research, six algorithms with different implications for hardware are evaluated for address generation, and a fully Programmable AGU (PAGU) is presented, which can adapt to these algorithms. These algorithms are Standard, Strided, Dilated, Upsampled and Padded convolution, and MaxPooling. The proposed AGU architecture is a Very Long Instruction Word based Application Specific Instruction Processor which has specialized components like hardware counters and zero-overhead loops and a powerful Instruction Set Architecture (ISA), which can model static and dynamic constraints and affine and non-affine Address Equations. The target has been to minimize the flexibility vs. area, power, and performance trade-off. For a working test network of Semantic Segmentation, results have shown that PAGU shows close to the ideal performance, one cycle per address, for all the algorithms under consideration excepts Upsampled Convolution for which it is 1.7 cycles per address. The area of PAGU is approx. 4.6 times larger than the Parametrizable Datapath approach, which is still reasonable considering the high flexibility benefits. The potential of PAGU is not just limited to neural network applications but also in more general digital signal processing areas, which can be explored in the future. / Convolutional Neural Networks blir mer och mer populära på grund av deras applikationer inom revolutionerande tekniker som autonom körning, biomedicinsk bildbehandling och naturligt språkbearbetning. Med denna ökning av antagandet ökar också komplexiteten hos underliggande algoritmer. Detta medför implikationer för beräkningsplattformarna såväl som GPU: er, FPGAeller ASIC-baserade acceleratorer, särskilt för Adressgenerationsenheten (AGU) som är ansvarig för minnesåtkomst. Befintliga acceleratorer har normalt Parametrizable Datapath AGU: er som har mycket begränsad anpassningsförmåga till utveckling i algoritmer. Därför krävs ny hårdvara för nya algoritmer, vilket är en mycket ineffektiv metod när det gäller tid, resurser och återanvändbarhet. I denna forskning utvärderas sex algoritmer med olika implikationer för hårdvara för adressgenerering och en helt programmerbar AGU (PAGU) presenteras som kan anpassa sig till dessa algoritmer. Dessa algoritmer är Standard, Strided, Dilated, Upsampled och Padded convolution och MaxPooling. Den föreslagna AGU-arkitekturen är en Very Long Instruction Word-baserad applikationsspecifik instruktionsprocessor som har specialiserade komponenter som hårdvara räknare och noll-overhead-slingor och en kraftfull Instruktionsuppsättning Arkitektur (ISA) som kan modellera statiska och dynamiska begränsningar och affinera och icke-affinerad adress ekvationer. Målet har varit att minimera flexibiliteten kontra avvägning av område, kraft och prestanda. För ett fungerande testnätverk av semantisk segmentering har resultaten visat att PAGU visar nära den perfekta prestanda, 1 cykel per adress, för alla algoritmer som beaktas undantar Upsampled Convolution för vilken det är 1,7 cykler per adress. Området för PAGU är ungefär 4,6 gånger större än Parametrizable Datapath-metoden, vilket fortfarande är rimligt med tanke på de stora flexibilitetsfördelarna. Potentialen för PAGU är inte bara begränsad till neurala nätverksapplikationer utan också i mer allmänna digitala signalbehandlingsområden som kan utforskas i framtiden.
|
Page generated in 0.0652 seconds