Global ETD Search

1	Register File Size Reduction through Instruction Pre-Execution Incorporating Value Prediction ANDO, Hideki, TANAKA, Yusuke 01 December 2010 (has links) No description available. register file value prediction instruction pre-execution microprocessor microarchitecture
2	RST: Reuse through Speculation on Traces / RST: Reuso Especulativo de Traces Pilla, Mauricio Lima January 2004 (has links) Na presente tese, apresentamos uma nova abordagem para combinar reuso e prvisão de seqüências dinâmicas de instruções, chamada Reuso por Especulação em traces (RST). Esta técnica permite a identificação dinâmica de traces de instruções redundantes ou previsíveis e o reuso (especulativo ou não) desses traces. RST procura resolver a questão de traces que não são reusados por seus valores de entradas de Traces (DTM). Em estudo anteriores, esses traces foram contabilizados como sendo cerca de 69% de todos os traces reusáveis. Uma das maiores vantagens de RST sobre a combinação de um mecanismo de previsão com uma técnica de reuso de valores em que mecanismos não são relacionados é que RST não necessita de tabelas adicionais para o armazenamento dos valores a serem previstos. A aplciação de reuso e previsão de valores pela simples combinação de mecanismos pode necessitar de uma quantidade proibitiva de espaço de armazenamento. No mecanismo RST, os valores já estão presentes na Tabela de Memorização de Traces, não incorrendo em custos adicionais para lê-los se comparado com uma técnica não-especulativa de reuso de traces. O contexto de entrada de cada trace (os valores de entrada de todas as instruções contidas no trace) já armazenam os valores para o teste de reuso, os quais podem ser também utilizados para previsão de valores para o teste de reuso, os quais podem ser também utilizados para previsão de valores. As principais contribuições de nosso trabalho incluem: (i) um framework de reuso especulativo de traces que pode ser modificado para diferentes arquiteturas de processadores; (ii) definição das modificações necessárias em um processador superescalar e superpipeline para implementar nosso mecanismo; (iii) estudo de questões de implementação relacionadas à essa arquitetura; (iv) estudo dos limites de desempenho da nossa técnica; (v) estudo de uma implementação RST limitada por fatores realísticos; e (vi) ferramentas de simulação que podem ser utilizadas em outros estudos, representando um processador superescalar e superpipeline em detalhes. Salientamos que, em uma arquitetura utilizando mecanismos realistas de estimativa de confiança das previsões, nossa técnica RST consegue atingir speedups médios (médias harmônicas) de 1.29 sobre uma arquitetura sem reuso e 1.09 sobre uma técnica não-especulativa de reuso de traces (DTM). / In this thesis, we present a novel approach to combine both reuse and prediction of dynamic sequences of instructions called Reuse through Speculation on Traces (RST). Our technique allows the dynamic identification of instruction traces that are redundant or predictable, and the reuse (speculative or not) of these traces. RST addresses the issue, present on Dynamic Trace Memoization (DTM), of traces not being reused because some of their inputs are not ready for the reuse test. These traces were measured to be 69% of all reusable traces in previous studies. One of the main advantages of RST over just combining a value prediction technique with an unrelated reuse technique is that RST does not require extra tables to store the values to be predicted. Applying reuse and value prediction in unrelated mechanisms but at the same time may require a prohibitive amount of storage in tables. In RST, the values are already stored in the Trace Memoization Table, and there is no extra cost in reading them if compared with a non-speculative trace reuse technique. . The input context of each trace (the input values of all instructions in the trace) already stores the values for the reuse test, which may also be used for prediction. Our main contributions include: (i) a speculative trace reuse framework that can be adapted to different processor architectures; (ii) specification of the modifications in a superscalar, superpipelined processor in order to implement our mechanism; (iii) study of implementation issues related to this architecture; (iv) study of the performance limits of our technique; (v) a performance study of a realistic, constrained implementation of RST; and (vi) simulation tools that can be used in other studies which represent a superscalar, superpipelined processor in detail. In a constrained architecture with realistic confidence, our RST technique is able to achieve average speedups (harmonic means) of 1.29 over the baseline architecture without reuse and 1.09 over a non-speculative trace reuse technique (DTM). Arquiteturas super escalares Processamento paralelo Speculative Trace Reuse Superscalar architectures Parallel processing Value reuse Value prediction
3	Reuso especulativo de traços com instruções de acesso à memória / Speculative trace reuse with memory access instructions Laurino, Luiz Sequeira January 2007 (has links) Mesmo com o crescente esforço para a detecção e tratamento de instruções redundantes, as dependências verdadeiras ainda causam um grande atraso na execução dos programas. Mecanismos que utilizam técnicas de reuso e previsão de valores têm sido constantemente estudados como alternativa para estes problemas. Dentro desse contexto destaca-se a arquitetura RST (Reuse through Speculation on Traces), aliando essas duas técnicas e atingindo um aumento significativo no desempenho de microprocessadores superescalares. A arquitetura RST original, no entanto, não considera instruções de acesso à memória como candidatas ao reuso. Desse modo, esse trabalho introduz um novo mecanismo de reuso e previsão de valores chamado RSTm (Reuse through Speculation on Traces with Memory), que estende as funcionalidades do mecanismo original, com a adição de instruções de acesso à memória ao domínio de reuso da arquitetura. Dentre as soluções analisadas, optou-se pela utilização de uma tabela dedicada (Memo_Table_L) para o armazenamento das instruções de carga/escrita. Esta solução garante boa economia de hardware, não limita o número de instruções de acesso à memória por traço e, também, armazena tanto o endereço como seu respectivo valor. Os experimentos, realizados com benchmarks do SPEC2000 integer e floating-point, mostram um crescimento de 2,97% (média harmônica) no desempenho do RSTm sobre o mecanismo original e de17,42% sobre a arquitetura base. O ganho é resultado de uma combinação de diversos fatores: traços maiores (em média, 7,75 instruções por traço; o RST original apresenta 3,17 em média), embora com taxa de reuso de aproximadamente 10,88% (inferior ao RST, que apresenta taxa de 15,23%); entretanto, a latência das instruções presentes nos traços do RSTm é maior e compensa a taxa de reuso inferior. / Even with the growing efforts to detect and handle redundant instructions, the true dependencies are still one of the bottlenecks of the computations. Value reuse and value prediction techniques have been studied in order to become an alternative to these issues. Following this approach, RST (Reuse through Speculation on Traces) combines both reuse mechanisms and has achieved some good performance improvements for superscalar processors. However, the original RST mechanism does not consider load/store instructions as reuse candidates. Because of this, our work presents a new value reuse and value prediction technique named RSTm (Reuse through Speculation on Traces with Memory), that extends RST and adds memory-access instructions to the reuse domain of the architecture. Among all studied solutions, we chose the approach of using a dedicated table (Memo_Table_L) to take care of the load/store instructions. This solution guarantees low hardware overhead, does not limit the number of memory-access instructions that could be stored for each trace and stores both the address and its value. From our experiments, performed with SPEC2000 integer and floating-point benchmarks, RSTm can achieve average performance improvements (harmonic means) of 2,97% over the original RST and 17,42% over the baseline architecture. These performance improvements are due to several reasons: bigger traces (in average, 7,75 per trace; the original RST has 3,17 in average), with a reuse rate of around 10,88% (less than RST, that presents reuse rate of 15,23%) because the latency of the instructions in the RSTm traces is bigger and compensates the smaller reuse rate. Arquitetura super escalares Desempenho : Computadores Processor architectures Value reuse Value prediction
4	Reuso especulativo de traços com instruções de acesso à memória / Speculative trace reuse with memory access instructions Laurino, Luiz Sequeira January 2007 (has links) Mesmo com o crescente esforço para a detecção e tratamento de instruções redundantes, as dependências verdadeiras ainda causam um grande atraso na execução dos programas. Mecanismos que utilizam técnicas de reuso e previsão de valores têm sido constantemente estudados como alternativa para estes problemas. Dentro desse contexto destaca-se a arquitetura RST (Reuse through Speculation on Traces), aliando essas duas técnicas e atingindo um aumento significativo no desempenho de microprocessadores superescalares. A arquitetura RST original, no entanto, não considera instruções de acesso à memória como candidatas ao reuso. Desse modo, esse trabalho introduz um novo mecanismo de reuso e previsão de valores chamado RSTm (Reuse through Speculation on Traces with Memory), que estende as funcionalidades do mecanismo original, com a adição de instruções de acesso à memória ao domínio de reuso da arquitetura. Dentre as soluções analisadas, optou-se pela utilização de uma tabela dedicada (Memo_Table_L) para o armazenamento das instruções de carga/escrita. Esta solução garante boa economia de hardware, não limita o número de instruções de acesso à memória por traço e, também, armazena tanto o endereço como seu respectivo valor. Os experimentos, realizados com benchmarks do SPEC2000 integer e floating-point, mostram um crescimento de 2,97% (média harmônica) no desempenho do RSTm sobre o mecanismo original e de17,42% sobre a arquitetura base. O ganho é resultado de uma combinação de diversos fatores: traços maiores (em média, 7,75 instruções por traço; o RST original apresenta 3,17 em média), embora com taxa de reuso de aproximadamente 10,88% (inferior ao RST, que apresenta taxa de 15,23%); entretanto, a latência das instruções presentes nos traços do RSTm é maior e compensa a taxa de reuso inferior. / Even with the growing efforts to detect and handle redundant instructions, the true dependencies are still one of the bottlenecks of the computations. Value reuse and value prediction techniques have been studied in order to become an alternative to these issues. Following this approach, RST (Reuse through Speculation on Traces) combines both reuse mechanisms and has achieved some good performance improvements for superscalar processors. However, the original RST mechanism does not consider load/store instructions as reuse candidates. Because of this, our work presents a new value reuse and value prediction technique named RSTm (Reuse through Speculation on Traces with Memory), that extends RST and adds memory-access instructions to the reuse domain of the architecture. Among all studied solutions, we chose the approach of using a dedicated table (Memo_Table_L) to take care of the load/store instructions. This solution guarantees low hardware overhead, does not limit the number of memory-access instructions that could be stored for each trace and stores both the address and its value. From our experiments, performed with SPEC2000 integer and floating-point benchmarks, RSTm can achieve average performance improvements (harmonic means) of 2,97% over the original RST and 17,42% over the baseline architecture. These performance improvements are due to several reasons: bigger traces (in average, 7,75 per trace; the original RST has 3,17 in average), with a reuse rate of around 10,88% (less than RST, that presents reuse rate of 15,23%) because the latency of the instructions in the RSTm traces is bigger and compensates the smaller reuse rate. Arquitetura super escalares Desempenho : Computadores Processor architectures Value reuse Value prediction
5	RST: Reuse through Speculation on Traces / RST: Reuso Especulativo de Traces Pilla, Mauricio Lima January 2004 (has links) Na presente tese, apresentamos uma nova abordagem para combinar reuso e prvisão de seqüências dinâmicas de instruções, chamada Reuso por Especulação em traces (RST). Esta técnica permite a identificação dinâmica de traces de instruções redundantes ou previsíveis e o reuso (especulativo ou não) desses traces. RST procura resolver a questão de traces que não são reusados por seus valores de entradas de Traces (DTM). Em estudo anteriores, esses traces foram contabilizados como sendo cerca de 69% de todos os traces reusáveis. Uma das maiores vantagens de RST sobre a combinação de um mecanismo de previsão com uma técnica de reuso de valores em que mecanismos não são relacionados é que RST não necessita de tabelas adicionais para o armazenamento dos valores a serem previstos. A aplciação de reuso e previsão de valores pela simples combinação de mecanismos pode necessitar de uma quantidade proibitiva de espaço de armazenamento. No mecanismo RST, os valores já estão presentes na Tabela de Memorização de Traces, não incorrendo em custos adicionais para lê-los se comparado com uma técnica não-especulativa de reuso de traces. O contexto de entrada de cada trace (os valores de entrada de todas as instruções contidas no trace) já armazenam os valores para o teste de reuso, os quais podem ser também utilizados para previsão de valores para o teste de reuso, os quais podem ser também utilizados para previsão de valores. As principais contribuições de nosso trabalho incluem: (i) um framework de reuso especulativo de traces que pode ser modificado para diferentes arquiteturas de processadores; (ii) definição das modificações necessárias em um processador superescalar e superpipeline para implementar nosso mecanismo; (iii) estudo de questões de implementação relacionadas à essa arquitetura; (iv) estudo dos limites de desempenho da nossa técnica; (v) estudo de uma implementação RST limitada por fatores realísticos; e (vi) ferramentas de simulação que podem ser utilizadas em outros estudos, representando um processador superescalar e superpipeline em detalhes. Salientamos que, em uma arquitetura utilizando mecanismos realistas de estimativa de confiança das previsões, nossa técnica RST consegue atingir speedups médios (médias harmônicas) de 1.29 sobre uma arquitetura sem reuso e 1.09 sobre uma técnica não-especulativa de reuso de traces (DTM). / In this thesis, we present a novel approach to combine both reuse and prediction of dynamic sequences of instructions called Reuse through Speculation on Traces (RST). Our technique allows the dynamic identification of instruction traces that are redundant or predictable, and the reuse (speculative or not) of these traces. RST addresses the issue, present on Dynamic Trace Memoization (DTM), of traces not being reused because some of their inputs are not ready for the reuse test. These traces were measured to be 69% of all reusable traces in previous studies. One of the main advantages of RST over just combining a value prediction technique with an unrelated reuse technique is that RST does not require extra tables to store the values to be predicted. Applying reuse and value prediction in unrelated mechanisms but at the same time may require a prohibitive amount of storage in tables. In RST, the values are already stored in the Trace Memoization Table, and there is no extra cost in reading them if compared with a non-speculative trace reuse technique. . The input context of each trace (the input values of all instructions in the trace) already stores the values for the reuse test, which may also be used for prediction. Our main contributions include: (i) a speculative trace reuse framework that can be adapted to different processor architectures; (ii) specification of the modifications in a superscalar, superpipelined processor in order to implement our mechanism; (iii) study of implementation issues related to this architecture; (iv) study of the performance limits of our technique; (v) a performance study of a realistic, constrained implementation of RST; and (vi) simulation tools that can be used in other studies which represent a superscalar, superpipelined processor in detail. In a constrained architecture with realistic confidence, our RST technique is able to achieve average speedups (harmonic means) of 1.29 over the baseline architecture without reuse and 1.09 over a non-speculative trace reuse technique (DTM). Arquiteturas super escalares Processamento paralelo Speculative Trace Reuse Superscalar architectures Parallel processing Value reuse Value prediction
6	RST: Reuse through Speculation on Traces / RST: Reuso Especulativo de Traces Pilla, Mauricio Lima January 2004 (has links) Na presente tese, apresentamos uma nova abordagem para combinar reuso e prvisão de seqüências dinâmicas de instruções, chamada Reuso por Especulação em traces (RST). Esta técnica permite a identificação dinâmica de traces de instruções redundantes ou previsíveis e o reuso (especulativo ou não) desses traces. RST procura resolver a questão de traces que não são reusados por seus valores de entradas de Traces (DTM). Em estudo anteriores, esses traces foram contabilizados como sendo cerca de 69% de todos os traces reusáveis. Uma das maiores vantagens de RST sobre a combinação de um mecanismo de previsão com uma técnica de reuso de valores em que mecanismos não são relacionados é que RST não necessita de tabelas adicionais para o armazenamento dos valores a serem previstos. A aplciação de reuso e previsão de valores pela simples combinação de mecanismos pode necessitar de uma quantidade proibitiva de espaço de armazenamento. No mecanismo RST, os valores já estão presentes na Tabela de Memorização de Traces, não incorrendo em custos adicionais para lê-los se comparado com uma técnica não-especulativa de reuso de traces. O contexto de entrada de cada trace (os valores de entrada de todas as instruções contidas no trace) já armazenam os valores para o teste de reuso, os quais podem ser também utilizados para previsão de valores para o teste de reuso, os quais podem ser também utilizados para previsão de valores. As principais contribuições de nosso trabalho incluem: (i) um framework de reuso especulativo de traces que pode ser modificado para diferentes arquiteturas de processadores; (ii) definição das modificações necessárias em um processador superescalar e superpipeline para implementar nosso mecanismo; (iii) estudo de questões de implementação relacionadas à essa arquitetura; (iv) estudo dos limites de desempenho da nossa técnica; (v) estudo de uma implementação RST limitada por fatores realísticos; e (vi) ferramentas de simulação que podem ser utilizadas em outros estudos, representando um processador superescalar e superpipeline em detalhes. Salientamos que, em uma arquitetura utilizando mecanismos realistas de estimativa de confiança das previsões, nossa técnica RST consegue atingir speedups médios (médias harmônicas) de 1.29 sobre uma arquitetura sem reuso e 1.09 sobre uma técnica não-especulativa de reuso de traces (DTM). / In this thesis, we present a novel approach to combine both reuse and prediction of dynamic sequences of instructions called Reuse through Speculation on Traces (RST). Our technique allows the dynamic identification of instruction traces that are redundant or predictable, and the reuse (speculative or not) of these traces. RST addresses the issue, present on Dynamic Trace Memoization (DTM), of traces not being reused because some of their inputs are not ready for the reuse test. These traces were measured to be 69% of all reusable traces in previous studies. One of the main advantages of RST over just combining a value prediction technique with an unrelated reuse technique is that RST does not require extra tables to store the values to be predicted. Applying reuse and value prediction in unrelated mechanisms but at the same time may require a prohibitive amount of storage in tables. In RST, the values are already stored in the Trace Memoization Table, and there is no extra cost in reading them if compared with a non-speculative trace reuse technique. . The input context of each trace (the input values of all instructions in the trace) already stores the values for the reuse test, which may also be used for prediction. Our main contributions include: (i) a speculative trace reuse framework that can be adapted to different processor architectures; (ii) specification of the modifications in a superscalar, superpipelined processor in order to implement our mechanism; (iii) study of implementation issues related to this architecture; (iv) study of the performance limits of our technique; (v) a performance study of a realistic, constrained implementation of RST; and (vi) simulation tools that can be used in other studies which represent a superscalar, superpipelined processor in detail. In a constrained architecture with realistic confidence, our RST technique is able to achieve average speedups (harmonic means) of 1.29 over the baseline architecture without reuse and 1.09 over a non-speculative trace reuse technique (DTM). Arquiteturas super escalares Processamento paralelo Speculative Trace Reuse Superscalar architectures Parallel processing Value reuse Value prediction
7	Reuso especulativo de traços com instruções de acesso à memória / Speculative trace reuse with memory access instructions Laurino, Luiz Sequeira January 2007 (has links) Mesmo com o crescente esforço para a detecção e tratamento de instruções redundantes, as dependências verdadeiras ainda causam um grande atraso na execução dos programas. Mecanismos que utilizam técnicas de reuso e previsão de valores têm sido constantemente estudados como alternativa para estes problemas. Dentro desse contexto destaca-se a arquitetura RST (Reuse through Speculation on Traces), aliando essas duas técnicas e atingindo um aumento significativo no desempenho de microprocessadores superescalares. A arquitetura RST original, no entanto, não considera instruções de acesso à memória como candidatas ao reuso. Desse modo, esse trabalho introduz um novo mecanismo de reuso e previsão de valores chamado RSTm (Reuse through Speculation on Traces with Memory), que estende as funcionalidades do mecanismo original, com a adição de instruções de acesso à memória ao domínio de reuso da arquitetura. Dentre as soluções analisadas, optou-se pela utilização de uma tabela dedicada (Memo_Table_L) para o armazenamento das instruções de carga/escrita. Esta solução garante boa economia de hardware, não limita o número de instruções de acesso à memória por traço e, também, armazena tanto o endereço como seu respectivo valor. Os experimentos, realizados com benchmarks do SPEC2000 integer e floating-point, mostram um crescimento de 2,97% (média harmônica) no desempenho do RSTm sobre o mecanismo original e de17,42% sobre a arquitetura base. O ganho é resultado de uma combinação de diversos fatores: traços maiores (em média, 7,75 instruções por traço; o RST original apresenta 3,17 em média), embora com taxa de reuso de aproximadamente 10,88% (inferior ao RST, que apresenta taxa de 15,23%); entretanto, a latência das instruções presentes nos traços do RSTm é maior e compensa a taxa de reuso inferior. / Even with the growing efforts to detect and handle redundant instructions, the true dependencies are still one of the bottlenecks of the computations. Value reuse and value prediction techniques have been studied in order to become an alternative to these issues. Following this approach, RST (Reuse through Speculation on Traces) combines both reuse mechanisms and has achieved some good performance improvements for superscalar processors. However, the original RST mechanism does not consider load/store instructions as reuse candidates. Because of this, our work presents a new value reuse and value prediction technique named RSTm (Reuse through Speculation on Traces with Memory), that extends RST and adds memory-access instructions to the reuse domain of the architecture. Among all studied solutions, we chose the approach of using a dedicated table (Memo_Table_L) to take care of the load/store instructions. This solution guarantees low hardware overhead, does not limit the number of memory-access instructions that could be stored for each trace and stores both the address and its value. From our experiments, performed with SPEC2000 integer and floating-point benchmarks, RSTm can achieve average performance improvements (harmonic means) of 2,97% over the original RST and 17,42% over the baseline architecture. These performance improvements are due to several reasons: bigger traces (in average, 7,75 per trace; the original RST has 3,17 in average), with a reuse rate of around 10,88% (less than RST, that presents reuse rate of 15,23%) because the latency of the instructions in the RSTm traces is bigger and compensates the smaller reuse rate. Arquitetura super escalares Desempenho : Computadores Processor architectures Value reuse Value prediction
8	Design Space Exploration for Value Prediction in Security Applications Gunnarsson, Linnea January 2019 (has links) With the introduction of Spectre and Meltdown, two new attacks thattarget the speculative instructions due to Out-of-Order execution intoday's processors, a new way to handle speculative loads has beenproposed. Instead of performing the speculative load, the approach isto predict them. This is a new way to use value predictors. In thiswork, the Last Value Predictor, which predicts based on the previouslyseen value, Value TAgged GEometric history length Predictor (VTAGE),which predicts based on the global branch history, and a stridepredictor, which predicts with help of strides, has been compared tosee which one has the best fit for this new use. They have been runwith the SPEC CPU 2017 benchmark suite in three different tests,different sizes, different threshold confidence and for VTAGE,different associativity. The VTAGE predictor performed best in terms ofvalues predicted and values correctly predicted. The thresholdconfidence level plays an important role in how many incorrectpredictions were made. The associativity in the VTAGE did not do muchdifference to the results. Spectre Meltdown Value prediction Other Computer and Information Science Annan data- och informationsvetenskap
9	Sensor numerical prediction based on long-term and short-term memory neural network Yangyang, Wen January 2020 (has links) Many sensor nodes are scattered in the sensor network,which are used in all aspects of life due to their small size, low power consumption, and multiple functions. With the advent of the Internet of Things, more small sensor devices will appear in our lives. The research of deep learning neural networks is generally based on large and medium-sized devices such as servers and computers, and it is rarely heard about the research of neural networks based on small Internet of Things devices. In this study, the Internet of Things devices are divided into three types: large, medium, and small in terms of device size, running speed, and computing power. More vividly, I classify the laptop as a medium- sized device, the device with more computing power than the laptop, like server, as a large-size IoT(Internet of Things) device, and the IoT mobile device that is smaller than it as a small IoT device. The purpose of this paper is to explore the feasibility, usefulness, and effectiveness of long-short-term memory neural network model value prediction research based on small IoT devices. In the control experiment of small and medium-sized Internet of Things devices, the following results are obtained: the error curves of the training set and verification set of small and medium-sized devices have the same downward trend, and similar accuracy and errors. But in terms of time consumption, small equipment is about 12 times that of medium-sized equipment. Therefore, it can be concluded that the LSTM(long-and-short-term memory neural networks) model value prediction research based on small IoT devices is feasible, and the results are useful and effective. One of the main problems encountered when the LSTM model is extended to small devices is time-consuming. Deep learning sensor networks Internet of Things value prediction Software Engineering Programvaruteknik
10	Increasing the performance of superscalar processors through value prediction / La prédiction de valeurs comme moyen d'augmenter la performance des processeurs superscalaires Perais, Arthur 24 September 2015 (has links) Bien que les processeurs actuels possèdent plus de 10 cœurs, de nombreux programmes restent purement séquentiels. Cela peut être dû à l'algorithme que le programme met en œuvre, au programme étant vieux et ayant été écrit durant l'ère des uni-processeurs, ou simplement à des contraintes temporelles, car écrire du code parallèle est notoirement long et difficile. De plus, même pour les programmes parallèles, la performance de la partie séquentielle de ces programmes devient rapidement le facteur limitant l'augmentation de la performance apportée par l'augmentation du nombre de cœurs disponibles, ce qui est exprimé par la loi d'Amdahl. Conséquemment, augmenter la performance séquentielle reste une approche valide même à l'ère des multi-cœurs.Malheureusement, la façon conventionnelle d'améliorer la performance (augmenter la taille de la fenêtre d'instructions) contribue à l'augmentation de la complexité et de la consommation du processeur. Dans ces travaux, nous revisitons une technique visant à améliorer la performance de façon orthogonale : La prédiction de valeurs. Au lieu d'augmenter les capacités du moteur d'exécution, la prédiction de valeurs améliore l'utilisation des ressources existantes en augmentant le parallélisme d'instructions disponible.En particulier, nous nous attaquons aux trois problèmes majeurs empêchant la prédiction de valeurs d'être mise en œuvre dans les processeurs modernes. Premièrement, nous proposons de déplacer la validation des prédictions depuis le moteur d'exécution vers l'étage de retirement des instructions. Deuxièmement, nous proposons un nouveau modèle d'exécution qui exécute certaines instructions dans l'ordre soit avant soit après le moteur d'exécution dans le désordre. Cela réduit la pression exercée sur ledit moteur et permet de réduire ses capacités. De cette manière, le nombre de ports requis sur le fichier de registre et la complexité générale diminuent. Troisièmement, nous présentons un mécanisme de prédiction imitant le mécanisme de récupération des instructions : La prédiction par blocs. Cela permet de prédire plusieurs instructions par cycle tout en effectuant une unique lecture dans le prédicteur. Ces trois propositions forment une mise en œuvre possible de la prédiction de valeurs qui est réaliste mais néanmoins performante. / Although currently available general purpose microprocessors feature more than 10 cores, many programs remain mostly sequential. This can either be due to an inherent property of the algorithm used by the program, to the program being old and written during the uni-processor era, or simply to time to market constraints, as writing and validating parallel code is known to be hard. Moreover, even for parallel programs, the performance of the sequential part quickly becomes the limiting improvement factor as more cores are made available to the application, as expressed by Amdahl's Law. Consequently, increasing sequential performance remains a valid approach in the multi-core era. Unfortunately, conventional means to do so - increasing the out-of-order window size and issue width - are major contributors to the complexity and power consumption of the chip. In this thesis, we revisit a previously proposed technique that aimed to improve performance in an orthogonal fashion: Value Prediction (VP). Instead of increasing the execution engine aggressiveness, VP improves the utilization of existing resources by increasing the available Instruction Level Parallelism. In particular, we address the three main issues preventing VP from being implemented. First, we propose to remove validation and recovery from the execution engine, and do it in-order at Commit. Second, we propose a new execution model that executes some instructions in-order either before or after the out-of-order engine. This reduces pressure on said engine and allows to reduce its aggressiveness. As a result, port requirement on the Physical Register File and overall complexity decrease. Third, we propose a prediction scheme that mimics the instruction fetch scheme: Block Based Prediction. This allows predicting several instructions per cycle with a single read, hence a single port on the predictor array. This three propositions form a possible implementation of Value Prediction that is both realistic and efficient. Architecture des processeurs Processeurs à hautes performances Exécution spéculative Prédiction de valeurs Processeurs superscalaires Exécution dans le désordre Processor architecture High performance processors Speculative execution Value prediction Superscalar processors Out-Of-Order execution

Search results