Spelling suggestions: "subject:"reconfigurable architectures"" "subject:"econfigurable architectures""
1 |
Reconfigurable architectures for the next generation of mobile device telecommunications systemsEl-Rayis, Ahmed Osman January 2014 (has links)
Mobile devices have become a dominant tool in our daily lives. Business and personal usage has escalated tremendously since the emergence of smartphones and tablets. The combination of powerful processing in mobile devices, such as smartphones and the Internet, have established a new era for communications systems. This has put further pressure on the performance and efficiency of telecommunications systems in delivering the aspirations of users. Mobile device users no longer want devices that merely perform phone calls and messaging. Rather, they look for further interactive applications such as video streaming, navigation and real time social interaction. Such applications require a new set of hardware and standards. The WiFi (IEEE 802.11) standard has been at the forefront of reliable and high-speed internet access telecommunications. This is due to its high signal quality (quality of service) and speed (throughput). However, its limited availability and short range highlights the need for further protocols, in particular when far away from access points or base stations. This led to the emergence of 3G followed by 4G and the upcoming 5G standard that, if fully realised, will provide another dimension in “anywhere, anytime internet connectivity.” On the other hand, the WiMAX (IEEE 802.16) standard promises to exceed the WiFi signal coverage range. The coverage range could be extended to kilometres at least with a better or similar WiFi signal level. This thesis considers a dynamically reconfigurable architecture that is capable of processing various modules within telecommunications systems. Forward error correction, coder and navigation modules are deployed in a unified low power communication platform. These modules have been selected since they are among those with the highest demand in terms of processing power, strict processing time or throughput. The modules are mainly realised within WiFi and WiMAX systems in addition to global positioning systems (GPS). The idea behind the selection of these modules is to investigate the possibility of designing an architecture capable of processing various systems and dynamically reconfiguring between them. The GPS system is a power-hungry application and, at the same time, it is not needed all of the time. Hence, one key idea presented in this thesis is to effectively exploit the dynamic reconfiguration capability so as to reconfigure the architecture (GPS) when it is not needed in order to process another needed application or function such as WiFi or WiMAX. This will allow lower energy consumption and the optimum usage of the hardware available on the device. This work investigates the major current coarse-grain reconfigurable architectures. A novel multi-rate convolution encoder is then designed and realised as a reconfigurable fabric. This demonstrates the ability to adapt the algorithms involved to meet various requirements. A throughput of between 200 and 800 Mbps has been achieved for the rates 1/2 to 7/8, which is a great achievement for the proposed novel architecture. A reconfigurable interleaver is designed as a standalone fabric and on a dynamically reconfigurable processor. High throughputs exceeding 90 Mbps are achieved for the various supported block sizes. The Reed Solomon coder is the next challenging system to be designed into a dynamically reconfigurable processor. A novel Galois Field multiplier is designed and integrated into the developed Reed Solomon reconfigurable processor. As a result of this work, throughputs of 200Mbps and 93Mbps respectively for RS encoding and decoding are achieved. A GPS correlation module is also investigated in this work. This is the main part of the GPS receiver responsible for continuously tracking GPS satellites and extracting messages from them. The challenging aspect of this part is its real-time nature and the associated critical time constraints. This work resulted in a novel dynamically reconfigurable multi-channel GPS correlator with up to 72 simultaneous channels. This work is a contribution towards a global unified processing platform that is capable of processing communication-related operations efficiently and dynamically with minimum energy consumption.
|
2 |
Gamification to Solve a Mapping Problem in Electrical EngineeringBalavendran Joseph, Rani Deepika 05 1900 (has links)
Coarse-Grained Reconfigurable Architectures (CGRAs) are promising in developing high performance low-power portable applications. In this research, we crowdsource a mapping problem using gamification to harnass human intelligence. A scientific puzzle game, Untangled, was developed to solve a mapping problem by encapsulating architectural characteristics. The primary motive of this research is to draw insights from the mapping solutions of players who possess innate abilities like decision-making, creative problem-solving, recognizing patterns, and learning from experience. In this dissertation, an extensive analysis was conducted to investigate how players' computational skills help to solve an open-ended problem with different constraints. From this analysis, we discovered a few common strategies among players, and subsequently, a library of dictionaries containing identified patterns from players' solutions was developed. The findings help to propose a better version of the game that incorporates these techniques recognized from the experience of players. In the future, an updated version of the game that can be developed may help low-performance players to provide better solutions for a mapping problem. Eventually, these solutions may help to develop efficient mapping algorithms, In addition, this research can be an exemplar for future researchers who want to crowdsource such electrical engineering problems and this approach can also be applied to other domains.
|
3 |
Sistema de tradução binária de dois níveis para execução multi-ISA / Tow-level binary translation system for multiple-isa executionFajardo Junior, Jair January 2011 (has links)
Atualmente, a adição de uma nova função implementada em hardware em um processador não deve impor nenhuma mudança no conjunto de instruções (ISA – Instruction Set Architecture) suportado para atingir melhorias em seu desempenho. O objetivo é manter a compatibilidade retroativa e futura de programas já compilados. Todavia, este fato se torna, muitas vezes, um fator impeditivo para o aprimoramento ou desenvolvimento de uma nova arquitetura. Desta maneira, a utilização de mecanismos de Tradução Binária abre novas oportunidades aos projetistas, já que estes mecanismos permitem a execução de programas já compilados em arquiteturas que suportam conjuntos de instruções diferentes do previsto inicialmente. Assim, para eliminar o custo adicional apresentado por estes sistemas de tradução, será proposto um novo mecanismo de tradução binária dinâmico de dois níveis. Enquanto o primeiro nível é responsável pela tradução de facto das instruções do conjunto nativo para instruções de uma linguagem de máquina intermediária, o segundo nível otimiza estas instruções já traduzidas para serem executadas na arquitetura alvo. O sistema é totalmente flexível, pois pode suportar a tradução de conjuntos de instruções completamente diferentes; assim como a utilização de arquiteturas de hardware com as mais diversas características. Este trabalho apresenta o primeiro esforço nesta direção: um estudo de caso onde ocorre a tradução de código x86 para MIPS (linguagem intermediária), que será otimizado para ser executado em uma arquitetura que realiza reconfiguração dinâmica. Resta demonstrado que é possível manter a compatibilidade binária, com melhoria no desempenho em torno de 45% em média e consumo de energia semelhante ao da execução nativa. / In these days, every new added hardware feature must not change the underlying instruction set architecture (ISA), in order to avoid adaptation or recompilation of existing code. Therefore, Binary Translation (BT) opens new possibilities for designers, previously tied to a specific ISA and all its legacy hardware issues, since it allows the execution of already compiled applications on different architectures. To overcome the BT inherent performance penalty, we propose a new mechanism based on a dynamic two-level binary translation system. While the first level is responsible for the BT de facto to an intermediate machine language, the second level optimizes the already translated instructions to be executed on the target architecture. The system is totally flexible, supporting the porting of radically different ISAs and the employment of different target architectures. This work presents the first effort towards this direction: it translates code implemented in the x86 ISA to MIPS assembly (the intermediate language), which will be optimized by the target architecture: a dynamically reconfigurable architecture. In this work is showed that is possible to maintain binary compatibility with performance improvements on average 45% and similar energy consumption when compared to native execution.
|
4 |
Sistema de tradução binária de dois níveis para execução multi-ISA / Tow-level binary translation system for multiple-isa executionFajardo Junior, Jair January 2011 (has links)
Atualmente, a adição de uma nova função implementada em hardware em um processador não deve impor nenhuma mudança no conjunto de instruções (ISA – Instruction Set Architecture) suportado para atingir melhorias em seu desempenho. O objetivo é manter a compatibilidade retroativa e futura de programas já compilados. Todavia, este fato se torna, muitas vezes, um fator impeditivo para o aprimoramento ou desenvolvimento de uma nova arquitetura. Desta maneira, a utilização de mecanismos de Tradução Binária abre novas oportunidades aos projetistas, já que estes mecanismos permitem a execução de programas já compilados em arquiteturas que suportam conjuntos de instruções diferentes do previsto inicialmente. Assim, para eliminar o custo adicional apresentado por estes sistemas de tradução, será proposto um novo mecanismo de tradução binária dinâmico de dois níveis. Enquanto o primeiro nível é responsável pela tradução de facto das instruções do conjunto nativo para instruções de uma linguagem de máquina intermediária, o segundo nível otimiza estas instruções já traduzidas para serem executadas na arquitetura alvo. O sistema é totalmente flexível, pois pode suportar a tradução de conjuntos de instruções completamente diferentes; assim como a utilização de arquiteturas de hardware com as mais diversas características. Este trabalho apresenta o primeiro esforço nesta direção: um estudo de caso onde ocorre a tradução de código x86 para MIPS (linguagem intermediária), que será otimizado para ser executado em uma arquitetura que realiza reconfiguração dinâmica. Resta demonstrado que é possível manter a compatibilidade binária, com melhoria no desempenho em torno de 45% em média e consumo de energia semelhante ao da execução nativa. / In these days, every new added hardware feature must not change the underlying instruction set architecture (ISA), in order to avoid adaptation or recompilation of existing code. Therefore, Binary Translation (BT) opens new possibilities for designers, previously tied to a specific ISA and all its legacy hardware issues, since it allows the execution of already compiled applications on different architectures. To overcome the BT inherent performance penalty, we propose a new mechanism based on a dynamic two-level binary translation system. While the first level is responsible for the BT de facto to an intermediate machine language, the second level optimizes the already translated instructions to be executed on the target architecture. The system is totally flexible, supporting the porting of radically different ISAs and the employment of different target architectures. This work presents the first effort towards this direction: it translates code implemented in the x86 ISA to MIPS assembly (the intermediate language), which will be optimized by the target architecture: a dynamically reconfigurable architecture. In this work is showed that is possible to maintain binary compatibility with performance improvements on average 45% and similar energy consumption when compared to native execution.
|
5 |
Sistema de tradução binária de dois níveis para execução multi-ISA / Tow-level binary translation system for multiple-isa executionFajardo Junior, Jair January 2011 (has links)
Atualmente, a adição de uma nova função implementada em hardware em um processador não deve impor nenhuma mudança no conjunto de instruções (ISA – Instruction Set Architecture) suportado para atingir melhorias em seu desempenho. O objetivo é manter a compatibilidade retroativa e futura de programas já compilados. Todavia, este fato se torna, muitas vezes, um fator impeditivo para o aprimoramento ou desenvolvimento de uma nova arquitetura. Desta maneira, a utilização de mecanismos de Tradução Binária abre novas oportunidades aos projetistas, já que estes mecanismos permitem a execução de programas já compilados em arquiteturas que suportam conjuntos de instruções diferentes do previsto inicialmente. Assim, para eliminar o custo adicional apresentado por estes sistemas de tradução, será proposto um novo mecanismo de tradução binária dinâmico de dois níveis. Enquanto o primeiro nível é responsável pela tradução de facto das instruções do conjunto nativo para instruções de uma linguagem de máquina intermediária, o segundo nível otimiza estas instruções já traduzidas para serem executadas na arquitetura alvo. O sistema é totalmente flexível, pois pode suportar a tradução de conjuntos de instruções completamente diferentes; assim como a utilização de arquiteturas de hardware com as mais diversas características. Este trabalho apresenta o primeiro esforço nesta direção: um estudo de caso onde ocorre a tradução de código x86 para MIPS (linguagem intermediária), que será otimizado para ser executado em uma arquitetura que realiza reconfiguração dinâmica. Resta demonstrado que é possível manter a compatibilidade binária, com melhoria no desempenho em torno de 45% em média e consumo de energia semelhante ao da execução nativa. / In these days, every new added hardware feature must not change the underlying instruction set architecture (ISA), in order to avoid adaptation or recompilation of existing code. Therefore, Binary Translation (BT) opens new possibilities for designers, previously tied to a specific ISA and all its legacy hardware issues, since it allows the execution of already compiled applications on different architectures. To overcome the BT inherent performance penalty, we propose a new mechanism based on a dynamic two-level binary translation system. While the first level is responsible for the BT de facto to an intermediate machine language, the second level optimizes the already translated instructions to be executed on the target architecture. The system is totally flexible, supporting the porting of radically different ISAs and the employment of different target architectures. This work presents the first effort towards this direction: it translates code implemented in the x86 ISA to MIPS assembly (the intermediate language), which will be optimized by the target architecture: a dynamically reconfigurable architecture. In this work is showed that is possible to maintain binary compatibility with performance improvements on average 45% and similar energy consumption when compared to native execution.
|
6 |
Réalisation d'un système d'exploitation pour l'architecture reconfigurable dynamiquement OLLAF / Operating system realization for dynamically reconfigurable architecture OLLAFKtata, Ismail 21 June 2013 (has links)
Actuellement on assiste à une émergence des applications des systèmes embarqués destinées à un large public d'utilisateurs. Ces applications sont de plus en plus complexes et diversifiées. Elles nécessitent une capacité de calcul accrue et doivent satisfaire, dans leurs exécutions, la prise en compte du temps réel. De plus, ces systèmes sur puce fonctionnent dans des conditions souvent difficiles et perturbantes. Ainsi, certaines contraintes temporelles, contraintes de ressources, contraintes de précédence ainsi que d'autres caractéristiques des systèmes généraux peuvent changer au cours d'exécution. Pour respecter leurs contraintes, ces systèmes doivent être capables de supporter la nature dynamique du monde réel depuis la modélisation de l'application jusqu'à son implémentation sur la plateforme d'exécution. Dans cette thèse une nouvelle approche combinant la modélisation haut niveau et l'ordonnancement sur une architecture reconfigurable dynamiquement de nouveau type, a été proposée. Cette approche est originale depuis ça conception en ciblant des applications fortement dynamiques et flexibles. De plus, l'ordonnanceur ainsi développé intègre un nouveau service qui est responsable de la prédiction des variables dynamiques afin d'aboutir à une meilleure exploitation de l'architecture et meilleure performance d'exécution. Des expérimentations ont été présentées sur des applications temps réel. / Embedded systems have important requirements such as reducing complexity and saving development effort. They have also to take account of applications constraints related to timing, resources, tasks precedence relations and other characteristics of general systems that may change during execution. To meet their constraints, these systems must be capable of supporting the dynamic nature of the real world at an early phase of their design. Dynamically reconfigurable architecture (DRA) is presented as the ideal solution to satisfy the highly dynamic and non-deterministic behaviour of current applications since it provides both high performance and run-time flexibility. In this thesis a new approach combining the high level modeling and scheduling on a dynamically reconfigurable architecture of a new type, has been proposed. Based on an original task graph model, the scheduling is performed by a predictive approach. The proposed method aims to better manage the reconfiguration process and minimize its latency. Experimental results based on the original DRA named OLLAF demonstrate the benefits and efficiency of our scheduling technique.
|
7 |
A reconfigurable heterogeneous multicore system with homogeneous ISA / Um sistema multinucleo, heterogeneo e reconfiguravel de ISA homogêneaSouza, Jeckson Dellagostin January 2016 (has links)
Dada a grande diversidade de aplicações embarcadas presentes nos atuais dispositivos portáveis, ambos os paralelismos em nível de threads e de instruções devem ser explorados para obter ganhos de desempenho e energia. Enquanto MPSoCs (sistemas em chip de múltiplos núcleos) são amplamente usados para esse propósito, estes falham quando consideramos produtividade de software, já que eles são compostos de chips com diferentes arquiteturas que precisam ser programados separadamente. Por outro lado, processadores multi núcleos de propósito geral implementam a mesma arquitetura, mas são compostos de núcleos homogêneos de processadores superescalares que consomem muita potência. Nesta dissertação, propõe-se um novo sistema, que tira proveito de circuitos reconfiguráveis para criar diferentes organizações que implementam a mesma arquitetura, capazes de apresentar alto desempenho com baixo custo energético. Para garantir a compatibilidade binária, usa-se um mecanismo de tradução binária que transforma o código a ser executado no circuito reconfigurável durante a execução. Usando aplicações representativas, mostra-se que uma versão do sistema heterogêneo pode ganhar da sua versão homogênea em média de 59% em desempenho e 10% em energia, com melhoras em EDP (Energy-Delay Product – Produto da energia pelo tempo de execução) em quase todos os cenários. Além disso, este trabalho também propõe e avalia seis escalonadores para este sistema heterogêneo: dois algoritmos estáticos, os quais alocam as threads no primeiro núcleo livre, onde elas permanecerão durante toda a execução; um escalonador direcionado por contagem de instruções, o qual realoca as threads durante pontos de sincronização de acordo com a sua contagem de instruções; um escalonador de Feedback, que usa dados de dentro da unidade reconfigurável para realocar threads; o PC-Feedback, que adiciona um mecanismo de reuso de dados ao último escalonador; e um escalonador Oráculo, que é capaz de decidir a melhor alocação de threads possível. Mostra-se que o algoritmo estático pode ter alto desempenho em aplicações com alto paralelismo, contudo para um desempenho mais uniforme em todas as aplicações os algoritmos de Feedback e PC-Feedback são mais indicados. / Given the large diversity of embedded applications one can find in current portable devices, for energy and performance reasons one must exploit both Thread- and Instruction Level Parallelism. While MPSoCs (Multiprocessor system-on-chip) are largely used for this purpose, they fail when one considers software productivity, since it comprises different ISAs (Instruction Set Architecture) that must be programmed separately. On the other hand, general purpose multicores implement the same ISA, but are composed of a homogeneous set of very power consuming superscalar processors. In this dissertation, we show how one can effectively use a reconfigurable unit to provide a number of different possible heterogeneous configurations while still sustaining the same ISA, capable of reaching high performance with low energy cost. To ensure ISA compatibility, we use a binary translation mechanism that transforms code to be executed on the fabric at run-time. Using representative benchmarks, we show that one version of the heterogeneous system can outperform its homogenous counterpart in average by 59% in performance and 10% in energy, with EDP (Energy-Delay Product) improvements in almost every scenario. Furthermore, this work also proposes and evaluates six schedulers for the heterogeneous system: two static algorithms, which allocate the threads on the first free core, where they will run during the entire execution; an Instruction Count (IC) Driven scheduler, which reallocates threads during synchronization points accordingly to their instruction count; a Feedback scheduler, which uses data from inside the reconfigurable unit to reallocate threads; the PCFeedback scheduler, that adds a reuse mechanism to the last one; and an Oracle scheduler, which is capable of deciding the best thread allocation possible. We show that the static algorithm can reach high performance in applications with high parallelism, however for uniform performance in all applications, the Feedback and PC-Feedback algorithms are better designated.
|
8 |
Fault mitigation strategies for reliable FPGA architectures / Stratégies de tolérance aux fautes pour des architecture fiables de circuits reconfigurablesBasheer Ahmed, Chagun Basha 31 March 2016 (has links)
Les circuits reconfigurables (Field Programmable Gate Arrays - FPGAs) sont largement utilisés dans divers domaines d'application en raison de leur flexibilité, de leur haute densité d'intégration, de leur niveau de performance et du faible coût de développement associé. Toutefois, leur grande sensibilité aux défauts dus aux rayonnements électromagnétiques tels que les "Single Event Effets" (SEE), est un défi qui doit être abordée pendant la conception du système. Ces SEE sont une préoccupation majeure dans la sécurité et pour les systèmes critiques tels que les systèmes de l'automobile et de l'avionique. En général, la plupart des FPGA d'aujourd'hui ne sont pas conçus pour fonctionner dans ces environnements difficiles, sauf pour les circuits spécifiques qui ont été durcies par construction au niveau du processus de fabrication. Ces circuits ont un surcoût très élevé et des performances moindres, ce qui les rend moins intéressant que leur équivalent non protégé. Le projet ARDyT vise à développer une architecture FPGA fiable à faible coût avec une suite d'outils de conception, offrant un environnement complet pour la conception d'un système tolérant aux fautes. Ce travail de thèse présente l'architecture du FPGA ARDyT, qui intègre des stratégies de prises en charge des fautes adaptées aux différents éléments de l'architecture. L'un des principaux objectifs du projet ARDyT est de gérer les changements de valeurs multiples (multi bit upsets (MBUs)) dans le flux binaire de configuration du FPGA. Ces stratégies de tolérance aux fautes pour protéger les ressources logiques et le flux binaire de configuration sont discutées en détail. Une architecture spécifique du bloc logique élémentaire configurable est proposée afin de simplifier la stratégie de prise en compte des fautes dans les ressources logiques. Un nouveau système de correction d'erreur intégrée (3-Dimensional Hamming - 3DH) est proposé pour gérer les MBU dans le flux binaire de configuration. L'ensemble de la stratégie de gestion des fautes est implémenté dans l'architecture au travers d'un manager de la fiabilité centralisée nommée R3M (Run-time Reconfigurable Resource Manager), et d'une suite d'outils adaptée. / Reconfigurable Field Programmable Gate Arrays (FPGAs) are extensively employed in various application domains due to their flexibility, high-density functionality, high performance and low-cost development compared to ASICs (Application Specific Integrated Circuits). However, the challenge that must be tackled during system design is their high susceptibility to the radiation induced faults such as Single Event Effects (SEEs). These radiation induced faults are a major concern in safety and mission critical systems such as automotive and avionics systems. In general, most of today’s COTS FPGAs are not designed to work under these harsh environments, except for specific circuits that have been radiation-hardened at the fabrication process level, but at a very high cost overhead, which makes them less interesting from an economic and performance point of view. The project ARDyT is aimed to develop a low-cost reliable FPGA architecture with supporting EDA tool-suite that offers a complete environment for a fault tolerant system design. This thesis work presents the proposed ARDyT FPGA architecture, which incorporates appropriate fault mitigation strategies at different levels. One of the main objectives of ARDyT project is to handle multi-bit upsets (MBUs) in the configuration bistream. Fault mitigation strategies to protect logic resources and configuration bitstream are discussed in detail. A fault-aware customized configurable logic block architecture is proposed to support logic resource fault mitigation strategy. A new built-in 3-Dimensional Hamming (3DH) error correcting scheme is proposed to handle MBUs in the configuration bitstream. The additional features introduced in this architecture ensure complete reliability with the help of centralized reliability manager named R3M (Run-time Reconfigurable Resource Manager), corresponding tool-suite and increased flexibility in the design.
|
9 |
A transparent and energy aware reconfigurable multiprocessor platform for efficient ILP and TLP exploitationRutzig, Mateus Beck January 2012 (has links)
As the number of embedded applications is increasing, the current strategy of several companies is to launch a new platform within short periods, to execute the application set more efficiently, with low energy consumption. However, for each new platform deployment, new tool chains must come along, with additional libraries, debuggers and compilers. This strategy implies in high hardware redesign costs, breaks binary compatibility and results in a high overhead in the software development process. Therefore, focusing on area savings, low energy consumption, binary compatibility maintenance and mainly software productivity improvement, we propose the exploitation of Custom Reconfigurable Arrays for Multiprocessor System (CReAMS). CReAMS is composed of multiple adaptive reconfigurable systems to efficiently explore Instruction and Thread Level Parallelism (ILP and TLP) at hardware level, in a totally transparent fashion. Conceived as homogeneous organization, CReAMS shows a reduction of 37% in energy-delay product (EDP) compared to an ordinary multiprocessing platform when assuming the same chip area. When a variety of processor with different capabilities on exploiting ILP are coupled in a single die, conceiving CReAMS as a heterogeneous organization, performance improvements of up to 57% and energy savings of up to 36% are showed in comparison with the homogenous platform. In addition, the efficiency of the adaptability provided by CReAMS is demonstrated in a comparison to a multiprocessing system composed of 4- issue Out-of-Order SparcV8 processors, 28% of performance improvements are shown considering a power budget scenario.
|
10 |
Μεθοδολογίες σχεδίασης υψηλής απόδοσης για ενσωματωμένες πλατφόρμες / High-performance design methodologiesΓαλάνης, Μιχαήλ 06 November 2007 (has links)
Στην παρούσα διδακτορική διατριβή προτείνονται μεθοδολογίες σχεδίασης εφαρμογών σε ενσωματωμένες πλατφόρμες ειδικού σκοπού για την βελτίωση της απόδοσης εφαρμογών που εκτελούνται σε αυτές. Τα θεωρούμενα συστήματα στοχεύουν σε αριθμητικά απαιτητικές εφαρμογές, όπως είναι εφαρμογές Ψηφιακής Επεξεργασίας Σήματος και πολυμέσων. Οι περιγραφές των εφαρμογών γίνεται σε γλώσσα υψηλού επιπέδου γεγονός που διευκολύνει την υλοποίηση των εφαρμογών στις θεωρούμενες επεξεργαστικές πλατφόρμες. Οι μεθοδολογίες έχουν αυτοματοποιηθεί, με την χρήση πρωτότυπων και εμπορικά διαθέσιμων εργαλείων, για την αποτελεσματική και γρήγορη αποτίμηση των λύσεων σχεδίασης και απεικόνισης.
Αρχικά, προτείνεται μια μέθοδος για την αποτελεσματική υλοποίηση εφαρμογών Ψηφιακής Επεξεργασίας Σήματος σε ένα σύστημα μικροεπεξεργαστή που περιέχει σαν επιταχυντή κρίσιμων τμημάτων ένα ευέλικτο χειριστή δεδομένων (data-path). Η υπεροχή του προτεινόμενου data-path σε σχέση με υπάρχοντες χειριστές δεδομένων δείχνεται για ένα σύνολο χαρακτηριστικών αριθμητικών υπολογιστικών πυρήνων (kernels). Παρουσιάζεται μια αυτοματοποιημένη μέθοδος σύνθεσης πυρήνων για το χειριστή δεδομένων. Αυτή η διαδικασία σύνθεσης ενσωματώνεται σε ένα γενικό περιβάλλον σχεδίασης εφαρμογών για το θεωρούμενο σύστημα που έχει σαν στόχο την βελτίωση της απόδοσης και την μείωση κατανάλωση ενέργειας.
Στην συνέχεια, παρουσιάζεται ένα περιβάλλον λογισμικού που υλοποιεί μια φορμαλισμένη μεθοδολογία για τον διαχωρισμό εφαρμογών Ψηφιακής Επεξεργασίας Σήματος μεταξύ επαναπροσδιοριζόμενων τμημάτων μικτής υφής για πρώτη φορά στην βιβλιογραφία. Κρίσιμα τμήματα επιταχύνονται στο επαναπροσδιοριζόμενο υλικό χονδροειδούς υφής για να ικανοποιηθούν οι χρονικοί περιορισμοί του κώδικα της εφαρμογής που απεικονίζεται στην επαναπροσδιοριζόμενη λογική του συστήματος. Η επαναπροσδιοριζόμενη λογική λεπτής υφής υλοποιείται από ένα ενσωματωμένο Field Programmable Gate Array (FPGA), ενώ η επαναπροσδιοριζόμενη λογική χονδροειδούς υφής από ένα δικό μας αναπτυγμένο χειριστή +δεδομένων υψηλής απόδοσης. Η αποτελεσματικότητα του πρωτότυπου λογισμικού επιβεβαιώνεται χρησιμοποιώντας ρεαλιστικές εφαρμογές. Αναλυτικά πειράματα δείχνουν σημαντικές βελτιώσεις στην απόδοση, ενώ καθορισμένοι χρονικοί περιορισμοί ικανοποιούνται για όλες τις δοκιμασμένες εφαρμογές.
Παρουσιάζεται η ενσωμάτωση ενός προτεινόμενου ευέλικτου προτύπου Επαναπροσδιοριζόμενης Αρχιτεκτονικής Πίνακα (ΕΑΠ) χονδροειδούς υφής σε δύο διαφορετικά συστήματα σε ολοκληρωμένα κυκλώματα. Για την αποτελεσματική εκτέλεση υπολογιστικά απαιτητικών τμημάτων στην ΕΑΠ αναπτύχθηκε μια πρωτότυπη αυτοματοποιημένη διαδικασία απεικόνισης, που βασίζεται σε έναν νέο αλγόριθμο διοχέτευσης βρόχου. Η αποτελεσματικότητα της ΕΑΠ και της αντίστοιχης διαδικασίας απεικόνισης διαπιστώνονται με εκτέλεση ρεαλιστικών εφαρμογών. Στο πρώτο σύστημα η ΕΑΠ μαζί με ένα FPGA σχηματίζουν την επαναπροσδιοριζόμενη λογική μιας υβριδικής πλατφόρμας. Στο δεύτερο σύστημα σε ολοκληρωμένο κύκλωμα, η ΕΑΠ συνδέεται άμεσα με έναν μικροεπεξεργαστή γενικού σκοπού ενεργώντας σαν συνεπεξεργαστής για την εκτέλεση κρίσιμων βρόχων. Πρωτότυπα αυτοματοποιημένα περιβάλλοντα σχεδίασης προτείνονται για την αποτελεσματική και εύκολη υλοποίηση ολόκληρων εφαρμογών στα συστήματα.
Τέλος, προτείνεται μια πρωτότυπη μεθοδολογία διαχωρισμού υλικού/λογισμικού για την βελτίωση της απόδοσης ρεαλιστικών εφαρμογών σε ένα ενσωματωμένο σύστημα σε ολοκληρωμένο κύκλωμα που αποτελείται από έναν προγραμματιζόμενο μικροεπεξεργαστή και FPGA επαναπροσδιοριζόμενη λογική. Η μεθοδολογία έχει αυτοματοποιηθεί σε μεγάλο βαθμό με την χρήση ακαδημαϊκών και εμπορικών εργαλείων. Το FPGA ενεργεί σαν επιταχυντής κρίσιμων τμημάτων κώδικα βελτιώνοντας την απόδοση των εφαρμογών κοντά σε θεωρητικά μέγιστα όρια επιταχύνσεων. Αναλυτικά πειράματα με διαφορετικού τύπου μικροεπεξεργαστές και FPGA δείχνουν την αποτελεσματικότητα της μεθοδολογίας. / In this Ph.D. dissertation, design methodologies for embedded platforms with the aim of improving the performance of realistic applications executed on them are proposed. The considered system platforms target on arithmetic intensive applications, as in the case of Digital Signal Processing and multimedia applications. The applications are coded in a high-level language, fact that eases the implementation of applications in the considered processing platforms. The methodologies have been automated, with the usage of prototype and commercial tools, for the efficient and rapid evaluation of the design and mapping solutions.
Initially, a method is proposed for the effective implementation of Digital Signal Processing applications on a microprocessor system that includes as an accelerator of critical application parts a flexible data-path. The effectiveness of the proposed data-path relative to existing ones is illustrated for a set of characteristic arithmetic intensive kernels. An automated synthesis methodology for kernels is presented. This synthesis method is incorporated on a design flow for the considered system that aims in improving application performance and reducing energy consumption.
Afterwards, a software framework that implements a formalized methodology for partitioning Digital Signal Processing and multimedia applications between mixed granularity reconfigurable hardware parts is presented. Critical application parts are accelerated on the coarse-grained reconfigurable hardware for satisfying timing constraints of application code mapped on the reconfigurable logic of the platform. The fine-grained reconfigurable hardware is implemented by an embedded Field Programmable Gate Array (FPGA), whereas the coarse-grained reconfigurable logic by an our-developed high-performance reconfigurable data-path. The efficiency of the prototype software is justified using realistic applications. Analytical experiments illustrate that important performance improvements are achieved, while the targeted timing constraints are satisfied for all the tested applications.
The incorporation of a proposed flexible template of a Coarse Grained Reconfigurable Array (CGRA) in two different system on chip is presented. For the efficient execution of computational intensive parts on the CGRA an automated mapping process, that it is based on a software loop pipelining algorithm, is developed. The efficiency of the CGRA and of its respective mapping procedure are realized with the execution of real-life applications. In the first system, the CGRA together with an FPGA form the reconfigurable logic of a hybrid platform. In the second system on chip, the CGRA is directly attached to a general purposed microprocessor acting as a co-processor for the execution of critical loops. Automated design frameworks are proposed for the efficient and straightforward implementation of complete applications on the systems.
Finally, a hardware/software partitioning methodology is proposed for the performance improvements of realistic applications in an embedded system on chip that it is composed by a programmable microprocessor and an FPGA reconfigurable hardware. The methodology has been automated in a large extend with the usage of academic and commercial tools. The FPGA acts as an accelerator for critical code segments improving by this way the performance of applications close to maximum theoretical bounds. Extensive experiments with different types of microprocessors and FPGAs show the effectiveness of the methodology.
|
Page generated in 0.108 seconds