Global ETD Search

41	Energy efficient instruction decoding in application: Specific instruction - set processors / Αποκωδικοποίηση εντολών για χαμηλή κατανάλωση ενέργειας σε επεξεργαστές συνόλου εντολών ειδικού σκοπού Κάργας, Χρήστος 04 September 2013 (has links) With commercial processor design tools, a designer can quickly design a C- programmable ASIP for a specific application domain. There are several such ASIPs available for both wireless (UWB baseband processing), encryption, and biomedical processing (particularly for ECG beat detection). In traditional CPUs and DSPs the impact of the instruction-set definition and the complexity of the instruction decoder can be substantial, especially in terms of power consumption. Fully orthogonal VLIW processors, do not incur the cost of an instruction decoder that severely. Instead the instruction word becomes very large, thereby shifting the (power-)cost to the program memory or instruction cache. For the purposes of this thesis a SIMD processor is developed and is compared to a soft-SIMD to observe its area, performance and energy efficiency for a bioimaging benchmark and how the processor description in the ASIP language nML, defines the generated HDL. This SIMD processor is turned into orthogonal and using iterative experiments it is investigated, what is the impact on power while manipulating the instruction-set architecture in combination with the program memory size. It is also investigated how instruction-set re-configuration can be exploited to improve power efficiency. Using this investigation guidelines for low-power ASIP design can be produced. / Με τη σύγχρονη τεχνολογία σχεδιασμού επεξεργαστών, ο σχεδιαστής μπορεί με ευκολία να σχεδιάσει ένα προγραμματιζόμενο Επεξεργαστή Συνόλου Εντολών Ειδικού Σκοπού (ASIP - Application-Specific Instruction-set Processor) για ένα συγκεκριμένο εύρος εφαρμογών. Υπάρχουν διάφοροι τέτοιοι επεξεργαστές διαθέσιμοι για ασύρματες εφαρμογές, κρυπτογράφηση και βιοϊατρικές εφαρμογές (π.χ. στον αλγόριθμο εντοπισμού χτύπου ηλεκτροκαρδιογραφήματος). Στους παραδοσιακούς επεξεργαστές και επεξεργαστές σήματος (DSP - Digital Signal Processor) ο ορισμός του συνόλου εντολών και η πολυπλοκότητα έχουν μεγάλη επίδραση, ειδικά στην κατανάλωση ισχύος. Μία πιθανή λύση σε αυτό το πρόβλημα είναι οι ορθογώνιοι επεξεργαστές μεγάλου μεγέθους λέξης εντολής (VLIW - Very Large Instruction Word). Με τον όρο ορθογώνιο επεξεργαστή, ορίζεται ένας επεξεργαστής οριζόντιου σύνολου εντολών, άρα ένας επεξεργαστής στον οποίο μπορεί να υπάρξει κάθε διαθέσιμος συνδυασμός μεταξύ των διαθέσιμων εντολών και των μεθόδων διευθυνσιοδότησης για πρόσβαση στη μνήμη και το αρχείο καταχωρητών. Οι ορθογώνιοι επεξεργαστές δεν επιβαρύνουν τόσο τον αποκωδικοποιητή εντολών. Αντί αυτού το μέγεθος της λέξης της εντολής γίνεται πολύ μεγάλο, και έτσι μετατίθεται το ενεργειακό κόστος στην μνήμη εντολών προγράμματος (program memory )ή την κρυφή μνήμη εντολών προγράμματος (instruction cache). Για τους σκοπούς αυτής της διπλωματικής εργασίας, αναπτύχθηκε ένας επεξεργαστής SIMD, ο οποίος συγκρίνεται με έναν soft-SIMD για να μελετηθούν η απαιτούμενη περιοχή στο ενσωματωμένο, επιδόσεις και κατανάλωση ενέργειας για μία βιοϊατρική εφαρμογή, καθώς και το πως η περιγραφή ενός επεξεργαστή στη γλώσσα περιγραφής επεξεργαστών ASIP nML ορίζει την παραγούμενη γλώσσα περιγραφής υλικού (HDL - Hardware Description Language). Ο επεξεργαστής αυτός μετατρέπεται σε ορθογώνιο, και με τη χρήση επαναληπτικών πειραμάτων μελετάται η επίδραση στην κατανάλωση ενέργειας κατά τη διάρκεια αλλαγών στην αρχιτεκτονική του συνόλου εντολών και του μεγέθους της μνήμης εντολών προγράμματος. Ακόμη μελετάται πως μπορεί να εκμεταλλευτεί ο σχεδιαστής την αναδιάρθρωση του συνόλου εντολών για να βελτιώσει την κατανάλωση ενέργειας. Processors Low energy consumption Orthogonal instruction-set VLIW ASIP 621.391 Επεξεργαστές
42	Geração de processador para aplicacao especifica / Application specific processor generation Kreutz, Marcio Eduardo January 1997 (has links) Este trabalho propõe a geração de uma arquitetura dedicada a aplicações específicas, baseadas no microcontrolador MCS8051. Por ser utilizado na solução de problemas em indústrias locais, este processador foi escolhido para servir como base em um sistema dedicado. O 8051 dedicado gerado deverá permitir a integração completa do sistema, proporcionando um aumento do valor agregado e, conseqüentemente, a diminuição do custo. Busca-se com a otimização da arquitetura obter um conjunto de instruções reduzido, construído com as instruções mais utilizadas em cada aplicação. O objetivo principal da otimização do conjunto de instruções está relacionado ao fato de que os circuitos decodificadores e geradores de microcódigo da parte de controle ocupam uma área significativa do processador. Uma otimização no sentido de reduzir-se o conjunto de instruções, portanto, resulta numa economia de área, o que vem de encontro com a idéia da integração completa do sistema com o processador. Um processador dedicado a aplicações específicas (ASIP) irá possuir um custo maior do que a sua versão original, devido as otimizações realizadas. Para compensar este custo, uma alternativa a seguir é a integração completa do sistema. Um Sistema Integrado para Aplicações Específicas (SIAE) torna-se desejável, pois aumentando o valor agregado do circuito possibilita-se a redução do custo pela eliminação de conexões da placa, do encapsulamento de outros circuitos, entre outros motivos. Todavia, para que um SIAE possa ser construído com um custo aceitável, é necessário que seja construído em uma área que não exceda muito a área original do processador. Tenta-se fazer isto neste trabalho, através da implementação de aplicações com poucas instruções diferentes. Por ser uma arquitetura comercial, o 8051 possui um grande parque de software desenvolvido e resolvendo problemas. Isto pode ser considerado uma vantagem pois, software básicos como por exemplo, compiladores, já estão desenvolvidos. Outra vantagem é o grande número de engenheiros treinados na sua utilização. Desse modo, torna-se necessária a criação de uma compatibilidade de software, para preservar o que já está desenvolvido. Uma vez que a programação em nível de linguagem montadora tende a constituir-se em uma tarefa cansativa e sujeita a erros, é desejável que se tenha uma compatibilidade em alto nível, ou seja, através de um compilador. Para criar a compatibilidade de SW necessária é realizada a otimização de um compilador C desenvolvido para o 8051. A escolha pela linguagem C deve-se ao fato de sua grande utilização. O compilador C otimizado procura utilizar um conjunto de instruções reduzido para obter a economia de área. Quando uma instrução necessita ser utilizada e não está presente no conjunto de instruções desejado, o compilador tenta substituí-la por outra(s). Um conjunto de instruções é utilizado para cada aplicação, sendo constituído pelas instruções mais utilizadas por esta. Para determinar as instruções mais utilizadas de cada aplicação é realizada uma análise estática sobre um código em linguagem montadora previamente compilado. As instruções implementadas serão sempre parte do conjunto de instruções original do 8051, de modo que novas instruções não serão criadas.Um programa em linguagem montadora gerado com um conjunto de instruções reduzido (RISC) normalmente terá um número maior de instruções do que o seu 10 equivalente com o conjunto de instruções completo (CISC). Isto ocorre porque possivelmente algumas substituições de uma instrução por outras, terão que ser realizadas. Como as instruções que serão utilizadas nas substituições pertencem ao conjunto de instruções original, o programa gerado com o compilador otimizado poderá executar em um tempo maior do que se fosse compilado com o código CISC. Para compensar esse atraso foi implementado um pipeline de instruções para o 8051. Este trabalho apresenta resultados da Síntese Lógica em Standard Cell e FPGA da arquitetura otimizada. Além disso, resultados de programas em linguagem montadora gerados com o compilador otimizado, são também apresentados. / This work discusses a processor for specific applications architecture, based on the MCS8051 microcontroller. This processor is used as a solution for many local industry applications, being the base of dedicated systems. The dedicated 8051 generated should allow complete integration of the system, and with the added value to the chip, reduced costs. The architecture optimization will produce as result a reduced instruction set, made by the often used instructions for each application. The main instruction set optimization goal refers to the instrucions decoders and microcode generators in the control part, because a large area in the processor is needed to implement them. Thus, a reduced instruction set will allow area savings, making possible the complete system integration in a chip. An ASIP architecture will have a higher cost than the original one. An alternative to solve this problem is add value to the chip, creating an Application Specific Integrated System (ASIS). An ASIS can be made with a acceptable cost, if it’s possible to integrate other circuits to the chip without area increase. This can be done in the area saved by using fewer implemented instructions. Because the 8051 is a commercial architecture, there is a large amount of software developed for it. This can be considered an advantage because basic softwares like compilers are available, being not necessary to create them. Another advantage refers to the large number of engineers trained to use the 8051. To preserve the already developed applications it’s necessary to mantain software compatibility. Assembler level programming is very boring an error prone task, being desirable to have software compatibility at higher levels through the use of high level languages. To create the necessary SW compatibility, a C compiler developed for 8051 was optimized. The chose for C language refers to its large utilization. The optimized C compiler tries to use a reduced instruction set, formed with the most important instructions for each application, in order ro save area. When an instruction needs to be used in an application, and it’s not present in the instruction set, the compiler tries to replace it with other instructions. The compiler will not use instructions not present in the original 8051 instruction set. So, new instrucions will be not created. To create an instruction set formed with the most important instructions for each application, a static analysis is made on a precompiled assembler source. An assembler source generated with a reduced instruction set (RISC) will probably have more instructions than the same assembler generated with a full instruction set (CISC). This can be explained because of the replacements instruction. If one instruction is replaced by other two, and these are from the original instruction set, probably the time needed to execute them would be higher. In order to deal with this problem, an instruction pipeline was implemented to the 8051. This work presents Standard Cells and FPGA results of Logic Synthesis of the optimized architecture. Also, assembly programs generated by the optimized compiler are presented. Microeletrônica Microcontroladores Application specific microcontrollers Pipeline Application specific integrated system MCS8051 microcontroller
43	Geração de processador para aplicacao especifica / Application specific processor generation Kreutz, Marcio Eduardo January 1997 (has links) Este trabalho propõe a geração de uma arquitetura dedicada a aplicações específicas, baseadas no microcontrolador MCS8051. Por ser utilizado na solução de problemas em indústrias locais, este processador foi escolhido para servir como base em um sistema dedicado. O 8051 dedicado gerado deverá permitir a integração completa do sistema, proporcionando um aumento do valor agregado e, conseqüentemente, a diminuição do custo. Busca-se com a otimização da arquitetura obter um conjunto de instruções reduzido, construído com as instruções mais utilizadas em cada aplicação. O objetivo principal da otimização do conjunto de instruções está relacionado ao fato de que os circuitos decodificadores e geradores de microcódigo da parte de controle ocupam uma área significativa do processador. Uma otimização no sentido de reduzir-se o conjunto de instruções, portanto, resulta numa economia de área, o que vem de encontro com a idéia da integração completa do sistema com o processador. Um processador dedicado a aplicações específicas (ASIP) irá possuir um custo maior do que a sua versão original, devido as otimizações realizadas. Para compensar este custo, uma alternativa a seguir é a integração completa do sistema. Um Sistema Integrado para Aplicações Específicas (SIAE) torna-se desejável, pois aumentando o valor agregado do circuito possibilita-se a redução do custo pela eliminação de conexões da placa, do encapsulamento de outros circuitos, entre outros motivos. Todavia, para que um SIAE possa ser construído com um custo aceitável, é necessário que seja construído em uma área que não exceda muito a área original do processador. Tenta-se fazer isto neste trabalho, através da implementação de aplicações com poucas instruções diferentes. Por ser uma arquitetura comercial, o 8051 possui um grande parque de software desenvolvido e resolvendo problemas. Isto pode ser considerado uma vantagem pois, software básicos como por exemplo, compiladores, já estão desenvolvidos. Outra vantagem é o grande número de engenheiros treinados na sua utilização. Desse modo, torna-se necessária a criação de uma compatibilidade de software, para preservar o que já está desenvolvido. Uma vez que a programação em nível de linguagem montadora tende a constituir-se em uma tarefa cansativa e sujeita a erros, é desejável que se tenha uma compatibilidade em alto nível, ou seja, através de um compilador. Para criar a compatibilidade de SW necessária é realizada a otimização de um compilador C desenvolvido para o 8051. A escolha pela linguagem C deve-se ao fato de sua grande utilização. O compilador C otimizado procura utilizar um conjunto de instruções reduzido para obter a economia de área. Quando uma instrução necessita ser utilizada e não está presente no conjunto de instruções desejado, o compilador tenta substituí-la por outra(s). Um conjunto de instruções é utilizado para cada aplicação, sendo constituído pelas instruções mais utilizadas por esta. Para determinar as instruções mais utilizadas de cada aplicação é realizada uma análise estática sobre um código em linguagem montadora previamente compilado. As instruções implementadas serão sempre parte do conjunto de instruções original do 8051, de modo que novas instruções não serão criadas.Um programa em linguagem montadora gerado com um conjunto de instruções reduzido (RISC) normalmente terá um número maior de instruções do que o seu 10 equivalente com o conjunto de instruções completo (CISC). Isto ocorre porque possivelmente algumas substituições de uma instrução por outras, terão que ser realizadas. Como as instruções que serão utilizadas nas substituições pertencem ao conjunto de instruções original, o programa gerado com o compilador otimizado poderá executar em um tempo maior do que se fosse compilado com o código CISC. Para compensar esse atraso foi implementado um pipeline de instruções para o 8051. Este trabalho apresenta resultados da Síntese Lógica em Standard Cell e FPGA da arquitetura otimizada. Além disso, resultados de programas em linguagem montadora gerados com o compilador otimizado, são também apresentados. / This work discusses a processor for specific applications architecture, based on the MCS8051 microcontroller. This processor is used as a solution for many local industry applications, being the base of dedicated systems. The dedicated 8051 generated should allow complete integration of the system, and with the added value to the chip, reduced costs. The architecture optimization will produce as result a reduced instruction set, made by the often used instructions for each application. The main instruction set optimization goal refers to the instrucions decoders and microcode generators in the control part, because a large area in the processor is needed to implement them. Thus, a reduced instruction set will allow area savings, making possible the complete system integration in a chip. An ASIP architecture will have a higher cost than the original one. An alternative to solve this problem is add value to the chip, creating an Application Specific Integrated System (ASIS). An ASIS can be made with a acceptable cost, if it’s possible to integrate other circuits to the chip without area increase. This can be done in the area saved by using fewer implemented instructions. Because the 8051 is a commercial architecture, there is a large amount of software developed for it. This can be considered an advantage because basic softwares like compilers are available, being not necessary to create them. Another advantage refers to the large number of engineers trained to use the 8051. To preserve the already developed applications it’s necessary to mantain software compatibility. Assembler level programming is very boring an error prone task, being desirable to have software compatibility at higher levels through the use of high level languages. To create the necessary SW compatibility, a C compiler developed for 8051 was optimized. The chose for C language refers to its large utilization. The optimized C compiler tries to use a reduced instruction set, formed with the most important instructions for each application, in order ro save area. When an instruction needs to be used in an application, and it’s not present in the instruction set, the compiler tries to replace it with other instructions. The compiler will not use instructions not present in the original 8051 instruction set. So, new instrucions will be not created. To create an instruction set formed with the most important instructions for each application, a static analysis is made on a precompiled assembler source. An assembler source generated with a reduced instruction set (RISC) will probably have more instructions than the same assembler generated with a full instruction set (CISC). This can be explained because of the replacements instruction. If one instruction is replaced by other two, and these are from the original instruction set, probably the time needed to execute them would be higher. In order to deal with this problem, an instruction pipeline was implemented to the 8051. This work presents Standard Cells and FPGA results of Logic Synthesis of the optimized architecture. Also, assembly programs generated by the optimized compiler are presented. Microeletrônica Microcontroladores Application specific microcontrollers Pipeline Application specific integrated system MCS8051 microcontroller
44	Placement de tâches dynamique et flexible sur processeur multicoeur asymétrique en fonctionnalités / Dynamic and flexible task mapping on functionally asymmetric multi-core processor Aminot, Alexandre 01 October 2015 (has links) Pour répondre aux besoins de plus en plus hétérogènes des applications (puissance et efficacité énergétique), nous nous intéressons dans cette thèse aux architectures émergentes de type multi-cœur asymétrique en fonctionnalités (FAMP). Ces architectures sont caractérisées par une mise en œuvre non-uniforme des extensions matérielles dans les cœurs (ex. unitée de calculs à virgule flottante (FPU)). Les avantages en surface sont apparents, mais qu'en est-il de l'impact au niveau logiciel, énergétique et performance?Pour répondre à ces questions, la thèse explore la nature de l'utilisation des extensions dans des applications de l'état de l'art et compare différentes méthodes existantes. Pour optimiser le placement de tâches et ainsi augmenter l'efficacité, la thèse propose une solution dynamique au niveau ordonnanceur, appelée ordonnanceur relaxé.Les extensions matérielles sont intéressantes car elles permettent des accélérations difficilement atteignables par la parallélisation sur un multi-cœur. Néanmoins, leurs utilisations par les applications sont faibles et leur coût en termes de surface et consommation énergétique sont importants.En se basant sur ces observations, les points suivants ont été développés:Nous présentons une étude approfondie sur l'utilisation de l'extension vectorielle et FPU dans des applications de l'état de l'artNous comparons plusieurs solutions de gestion des extensions à différent niveaux de granularité temporelle d'action pour comprendre les limites de ces solutions et ainsi définir à quel niveau il faut agir. Peu d'études traitent la question de la granularité d'action pour gérer les extensions.Nous proposons une solution pour estimer en ligne la dégradation de performance à exécuter une tâche sur un cœur sans extension. Afin de permettre la mise à l'échelle des multi-cœurs, le système d'exploitation doit avoir de la flexibilité dans le placement de tâches. Placer une tâche sur un cœur sans extension peut avoir d'importantes conséquences en énergie et en performance. Or à ce jour, il n'existe pas de solution pour estimer cette potentielle dégradation.Nous proposons un ordonnanceur relaxé, basé notre modèle d'estimation de dégradation, qui place les tâches sur un ensemble de cœurs hétérogènes de manière efficace. Nous étudions la flexibilité gagnée ainsi que les conséquences aux niveaux performances et énergie.Les solutions existantes proposent des méthodes pour placer les tâches sur un ensemble de cœurs hétérogènes, or, celles-ci n'étudient pas le compromis entre qualité de service et gain en consommation pour les architectures FAMP.Nos expériences sur simulateur ont montré que l'ordonnanceur peut atteindre une flexibilité de placement significative avec une dégradation en performance de moins de 2%. Comparé à un multi-cœur symétrique, notre solution permet un gain énergétique moyen au niveau cœur de 11 %. Ces résultats sont très encourageant et contribuent au développement d'une plateforme complète FAMP. Cette thèse a fait l'objet d'un dépôt de brevet, de trois communications scientifiques internationales (plus une en soumission), et a contribué à deux projets européens. / To meet the increasingly heterogeneous needs of applications (in terms of power and efficiency), this thesis focus on the emerging functionally asymmetric multi-core processor (FAMP) architectures. These architectures are characterized by non-uniform implementation of hardware extensions in the cores (ex. Floating Point Unit (FPU)). The area savings are apparent, but what about the impact in software, energy and performance?To answer these questions, the thesis investigates the nature of the use of extensions in state-of-the-art's applications and compares various existing methods. To optimize the tasks mapping and increase efficiency, the thesis proposes a dynamic solution at scheduler level, called relaxed scheduler.Hardware extensions are valuable because they speed up part of code where the parallelization on multi-core isn't efficient. However, the hardware extensions are under-exploited by applications and their cost in terms of area and power consumption are important.Based on these observations, the following contributions have been proposed:We present a detailed study on the use of vector and FPU extensions in state-of-the-art's applicationsWe compare multiple extension management solutions at different levels of temporal granularity of action, to understand the limitations of these solutions and thus define at which level we must act. Few studies address the issue of the granularity of action to manage extensions.We offer a solution for estimating online performance degradation to run a task on a core without a given extension. To allow the scalability of multi-core, the operating system must have flexibility in the placement of tasks. Placing a task on a core with no extension can have important consequences for energy and performance. But to date, there is no way to estimate this potential degradation.We offer a relaxed scheduler, based on our degradation estimation model, which maps the tasks on a set of heterogeneous cores effectively. We study the flexibility gained and the implications for performance and energy levels. Existing solutions propose methods to map tasks on a heterogeneous set of cores, but they do not study the tradeoff between quality of service and consumption gain for FAMP architectures.Our experiments with simulators have shown that the scheduler can achieve a significantly higher mapping flexibility with a performance degradation of less than 2 %. Compared to a symmetrical multi-core, our solution enables an average energy gain at core level of 11 %. These results are very encouraging and contribute to the development of a comprehensive FAMP platform . This thesis has been the subject of a patent application, three international scientific communications (plus one submission), and contributes to two active european projects. Architecture hétérogènes Jeux d'instructions Efficacité énergétique Multi-Coeurs Ordonnanceur Heterogeneous Architecture Instruction set Power Efficiency Multicore Scheduler 621
45	Metodologia para adapta??o de microarquiteturas microprogramadas soft-core ? uma ISA padr?o: estudo do impacto sobre a complexidade de hardware para o padr?o MIPS Casillo, Leonardo Augusto 11 July 2013 (has links) Made available in DSpace on 2014-12-17T14:55:12Z (GMT). No. of bitstreams: 1 LeonardoAC_TESE.pdf: 2904145 bytes, checksum: 71e85899a3f895d52ccee194769fd506 (MD5) Previous issue date: 2013-07-11 / In academia, it is common to create didactic processors, facing practical disciplines in the area of Hardware Computer and can be used as subjects in software platforms, operating systems and compilers. Often, these processors are described without ISA standard, which requires the creation of compilers and other basic software to provide the hardware / software interface and hinder their integration with other processors and devices. Using reconfigurable devices described in a HDL language allows the creation or modification of any microarchitecture component, leading to alteration of the functional units of data path processor as well as the state machine that implements the control unit even as new needs arise. In particular, processors RISP enable modification of machine instructions, allowing entering or modifying instructions, and may even adapt to a new architecture. This work, as the object of study addressing educational soft-core processors described in VHDL, from a proposed methodology and its application on two processors with different complexity levels, shows that it s possible to tailor processors for a standard ISA without causing an increase in the level hardware complexity, ie without significant increase in chip area, while its level of performance in the application execution remains unchanged or is enhanced. The implementations also allow us to say that besides being possible to replace the architecture of a processor without changing its organization, RISP processor can switch between different instruction sets, which can be expanded to toggle between different ISAs, allowing a single processor become adaptive hybrid architecture, which can be used in embedded systems and heterogeneous multiprocessor environments / No meio acad?mico, ? comum a cria??o de processadores denominados did?ticos, voltados para pr?ticas de disciplinas de hardware na ?rea de Computa??o e que podem ser utilizados como plataformas em disciplinas de softwares, sistemas operacionais e compiladores. Muitas vezes, tais processadores s?o descritos sem uma ISA padr?o, o que exige a cria??o de compiladores e outros softwares b?sicos para prover a interface hardware/software dificultando sua integra??o com outros processadores e demais dispositivos. Utilizar dispositivos reconfigur?veis descritos em uma linguagem do tipo HDL permitem a cria??o ou modifica??o de qualquer componente da microarquitetura, ocasionando a altera??o das unidades funcionais do caminho de dados que representa a parte operativa de um processador, bem como da m?quina de estados que implementa a unidade de controle do mesmo conforme surgem novas necessidades. Em particular, os processadores RISP possibilitam a altera??o das instru??es da m?quina, permitindo inserir ou modificar instru??es, podendo at? mesmo se adaptar a uma nova arquitetura. Este trabalho aborda como objeto de estudo dois processadores did?ticos soft-core descritos em VHDL com diferentes n?veis de complexidade de hardware adaptados a uma ISA padr?o a partir de uma metodologia proposta sem provocar aumento no n?vel de complexidade do hardware, ou seja, sem o acr?scimo significativo da ?rea em chip, ao mesmo tempo em que o seu n?vel de desempenho na execu??o de aplica??es permanece inalterado ou ? aprimorado. As modifica??es tamb?m permitem afirmar que, al?m de ser poss?vel substituir a arquitetura de um processador sem alterar sua organiza??o, um processador RISP pode alternar entre diferentes conjuntos de instru??o, o que pode ser expandido para altern?ncia entre diferentes ISAs, permitindo a um mesmo processador se tornar uma arquitetura h?brida adaptativa, pass?vel de ser utilizada em sistemas embarcados e ambientes multiprocessados heterog?neos CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
46	Geração de processador para aplicacao especifica / Application specific processor generation Kreutz, Marcio Eduardo January 1997 (has links) Este trabalho propõe a geração de uma arquitetura dedicada a aplicações específicas, baseadas no microcontrolador MCS8051. Por ser utilizado na solução de problemas em indústrias locais, este processador foi escolhido para servir como base em um sistema dedicado. O 8051 dedicado gerado deverá permitir a integração completa do sistema, proporcionando um aumento do valor agregado e, conseqüentemente, a diminuição do custo. Busca-se com a otimização da arquitetura obter um conjunto de instruções reduzido, construído com as instruções mais utilizadas em cada aplicação. O objetivo principal da otimização do conjunto de instruções está relacionado ao fato de que os circuitos decodificadores e geradores de microcódigo da parte de controle ocupam uma área significativa do processador. Uma otimização no sentido de reduzir-se o conjunto de instruções, portanto, resulta numa economia de área, o que vem de encontro com a idéia da integração completa do sistema com o processador. Um processador dedicado a aplicações específicas (ASIP) irá possuir um custo maior do que a sua versão original, devido as otimizações realizadas. Para compensar este custo, uma alternativa a seguir é a integração completa do sistema. Um Sistema Integrado para Aplicações Específicas (SIAE) torna-se desejável, pois aumentando o valor agregado do circuito possibilita-se a redução do custo pela eliminação de conexões da placa, do encapsulamento de outros circuitos, entre outros motivos. Todavia, para que um SIAE possa ser construído com um custo aceitável, é necessário que seja construído em uma área que não exceda muito a área original do processador. Tenta-se fazer isto neste trabalho, através da implementação de aplicações com poucas instruções diferentes. Por ser uma arquitetura comercial, o 8051 possui um grande parque de software desenvolvido e resolvendo problemas. Isto pode ser considerado uma vantagem pois, software básicos como por exemplo, compiladores, já estão desenvolvidos. Outra vantagem é o grande número de engenheiros treinados na sua utilização. Desse modo, torna-se necessária a criação de uma compatibilidade de software, para preservar o que já está desenvolvido. Uma vez que a programação em nível de linguagem montadora tende a constituir-se em uma tarefa cansativa e sujeita a erros, é desejável que se tenha uma compatibilidade em alto nível, ou seja, através de um compilador. Para criar a compatibilidade de SW necessária é realizada a otimização de um compilador C desenvolvido para o 8051. A escolha pela linguagem C deve-se ao fato de sua grande utilização. O compilador C otimizado procura utilizar um conjunto de instruções reduzido para obter a economia de área. Quando uma instrução necessita ser utilizada e não está presente no conjunto de instruções desejado, o compilador tenta substituí-la por outra(s). Um conjunto de instruções é utilizado para cada aplicação, sendo constituído pelas instruções mais utilizadas por esta. Para determinar as instruções mais utilizadas de cada aplicação é realizada uma análise estática sobre um código em linguagem montadora previamente compilado. As instruções implementadas serão sempre parte do conjunto de instruções original do 8051, de modo que novas instruções não serão criadas.Um programa em linguagem montadora gerado com um conjunto de instruções reduzido (RISC) normalmente terá um número maior de instruções do que o seu 10 equivalente com o conjunto de instruções completo (CISC). Isto ocorre porque possivelmente algumas substituições de uma instrução por outras, terão que ser realizadas. Como as instruções que serão utilizadas nas substituições pertencem ao conjunto de instruções original, o programa gerado com o compilador otimizado poderá executar em um tempo maior do que se fosse compilado com o código CISC. Para compensar esse atraso foi implementado um pipeline de instruções para o 8051. Este trabalho apresenta resultados da Síntese Lógica em Standard Cell e FPGA da arquitetura otimizada. Além disso, resultados de programas em linguagem montadora gerados com o compilador otimizado, são também apresentados. / This work discusses a processor for specific applications architecture, based on the MCS8051 microcontroller. This processor is used as a solution for many local industry applications, being the base of dedicated systems. The dedicated 8051 generated should allow complete integration of the system, and with the added value to the chip, reduced costs. The architecture optimization will produce as result a reduced instruction set, made by the often used instructions for each application. The main instruction set optimization goal refers to the instrucions decoders and microcode generators in the control part, because a large area in the processor is needed to implement them. Thus, a reduced instruction set will allow area savings, making possible the complete system integration in a chip. An ASIP architecture will have a higher cost than the original one. An alternative to solve this problem is add value to the chip, creating an Application Specific Integrated System (ASIS). An ASIS can be made with a acceptable cost, if it’s possible to integrate other circuits to the chip without area increase. This can be done in the area saved by using fewer implemented instructions. Because the 8051 is a commercial architecture, there is a large amount of software developed for it. This can be considered an advantage because basic softwares like compilers are available, being not necessary to create them. Another advantage refers to the large number of engineers trained to use the 8051. To preserve the already developed applications it’s necessary to mantain software compatibility. Assembler level programming is very boring an error prone task, being desirable to have software compatibility at higher levels through the use of high level languages. To create the necessary SW compatibility, a C compiler developed for 8051 was optimized. The chose for C language refers to its large utilization. The optimized C compiler tries to use a reduced instruction set, formed with the most important instructions for each application, in order ro save area. When an instruction needs to be used in an application, and it’s not present in the instruction set, the compiler tries to replace it with other instructions. The compiler will not use instructions not present in the original 8051 instruction set. So, new instrucions will be not created. To create an instruction set formed with the most important instructions for each application, a static analysis is made on a precompiled assembler source. An assembler source generated with a reduced instruction set (RISC) will probably have more instructions than the same assembler generated with a full instruction set (CISC). This can be explained because of the replacements instruction. If one instruction is replaced by other two, and these are from the original instruction set, probably the time needed to execute them would be higher. In order to deal with this problem, an instruction pipeline was implemented to the 8051. This work presents Standard Cells and FPGA results of Logic Synthesis of the optimized architecture. Also, assembly programs generated by the optimized compiler are presented. Microeletrônica Microcontroladores Application specific microcontrollers Pipeline Application specific integrated system MCS8051 microcontroller
47	A virtual machine framework for domain-specific languages Fick, David 19 October 2007 (has links) Experts in a field regularly apply a defined set of rules or procedures to carry out a problem-solving task or analysis on a given problem. Often the problem can be represented as a computer model, be it mathematical, chemical, or physics based, and so on. It would certainly be advantageous for a domain expert who is not proficient in software development to express solutions to problems in a domain-specific notation that can be executed as a program. Many new ideas aim to make software development easier and shift the development role closer to the end-user. One such means of development is the use of a small, intuitive programming language called a Domain-Specific Language (DSL.) This dissertation examines a generic approach to constructing a Virtual Machine (VM) to provide the runtime semantics for a particular DSL. It proposes a generic, object-oriented framework, called a VM Framework, in which to build a VM by subtyping abstract instruction and environment classes that are part of the VM Framework. The subtyped classes constitute an environment and an interface called an instruction set architecture and the instructions can access and operate on the environment in a deterministic way to provide the runtime semantics of a DSL program. Both instruction classes and environment classes encapsulate functionality of an existing domain, represented programmatically as a namespace construct. The namespace is home to related classes that provide the various concepts inherent of a domain. These are concepts understood by a domain expert and in this dissertation it is shown how they are exposed as DSL constructs. With the use of compiler writing tools, a compiler can be created for a DSL that generates an appropriate instruction sequence that can be executed by the VM. The grammar of the DSL is shown to feature constructs that allow a domain expert to express concepts of the underlying domain in an intuitive manner. The dissertation details how a VM is configured for a specific set of instructions and an environment. Instruction sets and environments can be extended creating VMs with additional semantics for DSLs that are similar, or contain subsets of semantics of other DSLs. The languages are intended to be intuitive and it is shown using examples how a specific DSL program is mapped to an instruction sequence with the instruction set architecture and environment in mind. Comparative performance in relation to other DSL implementations, including a hard-coded approach of a VM and an interpreted approach are also provided. The VM Framework is proven to be most effective in rapidly prototyping a DSL for a particular problem domain. The dissertation also provides examples of DSLs such as a real-valued expression language and a scene description language that uses a ray-tracer for rendering geometric objects onto a canvas. It is shown how the scene description language is an extension to the real-valued expression language in terms of their underlying VMs. All DSL grammars are provided. / Dissertation (MSc (Computer Science))--University of Pretoria, 2007. / Computer Science / MSc / unrestricted Domain expert Language prototyping Instruction set architecture Environment Generic framework Syntax Comparative performance Semantics Virtual machine Domain-specific language UCTD
48	Scheduling Algorithms for Instruction Set Extended Symmetrical Homogeneous Multiprocessor Systems-on-Chip Montcalm, Michael R. January 2011 (has links) Embedded system designers face multiple challenges in fulfilling the runtime requirements of programs. Effective scheduling of programs is required to extract as much parallelism as possible. These scheduling algorithms must also improve speedup after instruction-set extensions have occurred. Scheduling of dynamic code at run time is made more difficult when the static components of the program are scheduled inefficiently. This research aims to optimize a program’s static code at compile time. This is achieved with four algorithms designed to schedule code at the task and instruction level. Additionally, the algorithms improve scheduling using instruction set extended code on symmetrical homogeneous multiprocessor systems. Using these algorithms, we achieve speedups up to 3.86X over sequential execution for a 4-issue 2-processor system, and show better performance than recent heuristic techniques for small programs. Finally, the algorithms generate speedup values for a 64-point FFT that are similar to the test runs. Scheduling ILP System on Chip SoC Instruction level parallelism Integer Linear Program Custom Instruction Instruction Set Extension Multiprocessor
49	Automatic Generation of Hardware for Custom Instructions Necsulescu, Philip I January 2011 (has links) The Software/Hardware Implementation and Research Architecture (SHIRA) is a C to hardware toolchain developed by the Computer Architecture Research Group (CARG) of the University of Ottawa. The framework and algorithms to generate the hardware from an Intermediate Representation (IR) of the C code is needed. This dissertation presents the conceiving, design, and development of a module that generates the hardware for custom instructions identified by specialized SHIRA components without the need for any user interaction. The module is programmed in Java and takes a Data Flow Graph (DFG) as an IR for input. It then generates VHDL code that targets the Altera FPGAs. It is possible to use separate components for each operation or to set a maximum number for each component which leads to component reuse and reduces chip area use. The performance improvement of the generated code is compared to using only the processor’s standard instruction set. FPGA Instruction Set Extension ISE Custom Instruction Automatic Hardware Generation Assisted Hardware Generation VHDL Embedded Systems Custom Hardware
50	An empirically derived system for high-speed shadow rendering Rautenbach, Helperus Ritzema 26 June 2009 (has links) Shadows have captivated humanity since the dawn of time; with the current age being no exception – shadows are core to realism and ambience, be it to invoke a classic Baroque interplay of lights, darks and colours as the case in Rembrandt van Rijn’s Militia Company of Captain Frans Banning Cocq or to create a sense of mystery as found in film noir and expressionist cinematography. Shadows, in this traditional sense, are regions of blocked light – the combined effect of placing an object between a light source and surface. This dissertation focuses on real-time shadow generation as a subset of 3D computer graphics. Its main focus is the critical analysis of numerous real-time shadow rendering algorithms and the construction of an empirically derived system for the high-speed rendering of shadows. This critical analysis allows us to assess the relationship between shadow rendering quality and performance. It also allows for the isolation of key algorithmic weaknesses and possible bottleneck areas. Focusing on these bottleneck areas, we investigate several possibilities of improving the performance and quality of shadow rendering; both on a hardware and software level. Primary performance benefits are seen through effective culling, clipping, the use of hardware extensions and by managing the polygonal complexity and silhouette detection of shadow casting meshes. Additional performance gains are achieved by combining the depth-fail stencil shadow volume algorithm with dynamic spatial subdivision. Using this performance data gathered during the analysis of various shadow rendering algorithms, we are able to define a fuzzy logic-based expert system to control the real-time selection of shadow rendering algorithms based on environmental conditions. This system ensures the following: nearby shadows are always of high-quality, distant shadows are, under certain conditions, rendered at a lower quality and the frames per second rendering performance is always maximised. / Dissertation (MSc)--University of Pretoria, 2009. / Computer Science / unrestricted Instruction set utilisation Shadow mapping Stencil shadow volumes Spatial subdivision Shadow algorithms Expert systems Fuzzy logic UCTD

Search results