Spelling suggestions: "subject:"eprocessor architecture"" "subject:"coprocessor architecture""
11 |
Design of an Adaptable Run-Time Reconfigurable Software-Defined Radio Processing ArchitectureTemplin, Joshua R. 01 December 2010 (has links)
Processing power is a key technical challenge holding back the development of a high-performance software defined radio (SDR). Traditionally, SDR has utilized digital signal processors (DSPs), but increasingly complex algorithms, higher data rates, and multi-tasking needs have exceed the processing capabilities of modern DSPs. Reconfigurable computers, such as field-programmable gate arrays (FPGAs), are popular alternatives because of their performance gains over software for streaming data applications like SDR. However, FPGAs have not yet realized the ideal SDR because architectures have not fully utilized their partial reconfiguration (PR) capabilities to bring needed flexibility. A reconfigurable processor architecture is proposed that utilizes PR in reconfigurable computers to achieve a more sophisticated SDR. The proposed processor contains run-time swappable blocks whose parameters and interconnects are programmable. The architecture is analyzed for performance and flexibility and compared with available alternate technologies. For a sample QPSK algorithm, hardware performance gains of at least 44x are seen over modern desktop processors and DSPs while most of their flexibility and extensibility is maintained.
|
12 |
Améliorer la performance séquentielle à l’ère des processeurs massivement multicœurs / Increase Sequential Performance in the Manycore EraPrémillieu, Nathanaël 03 December 2013 (has links)
L'omniprésence des ordinateurs et la demande de toujours plus de puissance poussent les architectes processeur à chercher des moyens d'augmenter les performances de ces processeurs. La tendance actuelle est de répliquer sur une même puce plusieurs cœurs d'exécution pour paralléliser l'exécution. Si elle se poursuit, les processeurs deviendront massivement multicoeurs avec plusieurs centaines voire un millier de cœurs disponibles. Cependant, la loi d'Amdahl nous rappelle que l'augmentation de la performance séquentielle sera toujours nécessaire pour améliorer les performances globales. Une voie essentielle pour accroître la performance séquentielle est de perfectionner le traitement des branchements, ceux-ci limitant le parallélisme d'instructions. La prédiction de branchements est la solution la plus étudiée, dont l'intérêt dépend essentiellement de la précision du prédicteur. Au cours des dernières années, cette précision a été continuellement améliorée et a atteint un seuil qu'il semble difficile de dépasser. Une autre solution est d'éliminer les branchements et de les remplacer par une construction reposant sur des instructions prédiquées. L'exécution des instructions prédiquées pose cependant plusieurs problèmes dans les processeurs à exécution dans le désordre, en particulier celui des définitions multiples. Les travaux présentés dans cette thèse explorent ces deux aspects du traitement des branchements. La première partie s'intéresse à la prédiction de branchements. Une solution pour améliorer celle-ci sans augmenter la précision est de réduire le coût d'une mauvaise prédiction. Cela est possible en exploitant la reconvergence de flot de contrôle et l'indépendance de contrôle pour récupérer une partie du travail fait par le processeur sur le mauvais chemin sur les instructions communes aux deux chemins pour éviter de le refaire sur le bon chemin. La deuxième partie s'intéresse aux instructions prédiquées. Nous proposons une solution au problème des définitions multiples qui passe par la prédiction sélective de la valeur des prédicats. Un mécanisme de rejeu sélectif est utilisé pour réduire le coût d'une mauvaise prédiction de prédicat. / Computers are everywhere and the need for always more computation power has pushed the processor architects to find new ways to increase performance. The today's tendency is to replicate execution core on the same die to parallelize the execution. If it goes on, processors will become manycores featuring hundred to a thousand cores. However, Amdahl's law reminds us that increasing the sequential performance will always be vital to increase global performance. A perfect way to increase sequential performance is to improve how branches are executed because they limit instruction level parallelism. The branch prediction is the most studied solution, its interest greatly depending on its accuracy. In the last years, this accuracy has been continuously improved up to reach a hardly exceeding limit. An other solution is to suppress the branches by replacing them with a construct based on predicated instructions. However, the execution of predicated instructions on out-of-order processors comes up with several problems like the multiple definition problem. This study investigates these two aspects of the branch treatment. The first part is about branch prediction. A way to improve it without increasing the accuracy is to reduce the coast of a branch misprediction. This is possible by exploiting control flow reconvergence and control independence. The work done on the wrong path on instructions common to the two paths is saved to be reused on the correct path. The second part is about predicated instructions. We propose a solution to the multiple definition problem by selectively predicting the predicate values. A selective replay mechanism is used to reduce the cost of a predicate misprediction.
|
13 |
Exekveringsmiljö för Plex-C på JVM / Run-time environment for Plex-C on JVMMöller, Johan January 2002 (has links)
<p>The Ericsson AXE-based systems are programmed using an internally developed language called Plex-C. Plex-C is normally compiled to execute on an Ericsson internal processor architecture. A transition to standard processors is currently in progress. This makes it interesting to examine if Plex-C can be compiled to execute on the JVM, which would make it processor independent. </p><p>The purpose of the thesis is to examine if parts of the run-time environment of Plex-C can be translated to Java and if this can be done so that sufficient performance is obtained. It includes how language constructions in Plex-C can be translated to Java. </p><p>The thesis describes how a limited part of the Plex-C run-time environment is implemented in Java. Optimizations are an important part of the implementation. </p><p>It is also described how the JVM system was tested with a benchmark test. </p><p>The test results indicate that the implemented system is a few times faster than the Ericsson internal processor architecture. But this performance is still not sufficient for the JVM system to be an interesting replacement for the currently used processor architecture. It might still be useful as a processor independent test platform.</p>
|
14 |
FPGA-BASED IMPLEMENTATION OF DUAL-FREQUENCY PATTERN SCHEME FOR 3-D SHAPE MEASUREMENTBondehagen, Brent 01 January 2013 (has links)
Structured Light Illumination (SLI) is the process where spatially varied patterns are projected onto a 3-D surface and based on the distortion by the surface topology, phase information can be calculated and a 3D model constructed. Phase Measuring Profilometry (PMP) is a particular type of SLI that requires three or more patterns temporarily multiplexed. High speed PMP attempts to scan moving objects whose motion is small so as to have little impact on the 3-D model. Given that practically all machine vision cameras and high speed cameras employ a Field Programmable Gate Array (FPGA) interface directly to the image sensors, the opportunity exists to do the processing on camera. This thesis focuses on the design, implementation, testing, and evaluation of a camera-projector system to implement a PMP dual-frequency scheme for 3-D shape measurement on a single FPGA chip. The processor architecture is implemented and tested using the Xilinx Spartan 3 FPGA chip on an Opal Kelly development board. The hardware is described using VHDL and Verilog Hardware Description Languages (HDLs).
|
15 |
Dynamic Power Management in a Heterogeneous Processor ArchitectureArega, Frehiwot Melak, Hähnel, Markus, Dargie, Waltenegus 15 May 2023 (has links)
Emerging mobile platforms integrate heterogeneous, multicore processors to efficiently deal with the heterogeneity of data (in magnitude, type, and quality). The main goal is to achieve a high degree of energy-proportionality which corresponds with the nature and fluctuation of mobile workloads. Most existing power and energy consumption analyses of these architectures rely on simulation or static benchmarks neither of which truly reflects the type of workload the processors handle in reality. By contrast, we generate two types of stochastic workloads and employ four types of dynamic voltage and frequency scaling (DVFS) policies to investigate the energy proportionality and the dynamic power consumption characteristics of a heterogeneous processor architecture when operating in different configurations. The analysis illustrates, both qualitatively and quantitatively, that knowledge of the statistics of the incoming workload is critical to determine the appropriate processor configuration.
|
16 |
Designing Future Low-Power and Secure Processors with Non-Volatile MemoryPan, Xiang 07 September 2017 (has links)
No description available.
|
17 |
Out-of-Order Retirement of Instructions in Superscalar, Multithreaded, and Multicore ProcessorsUbal Tena, Rafael 01 September 2010 (has links)
Los procesadores superescalares actuales utilizan un reorder buffer (ROB) para contabilizar las instrucciones en vuelo. El ROB se implementa como una cola FIFO first in first out en la que las instrucciones se insertan en orden de programa después de ser decodificadas, y de la que se extraen también en orden de programa en la etapa commit. El uso de esta estructura proporciona un soporte simple para la especulación, las excepciones precisas y la reclamación de registros. Sin embargo, el hecho de retirar instrucciones en orden puede degradar las prestaciones si una operación de alta latencia está bloqueando la cabecera del ROB. Varias propuestas se han publicado atacando este problema. La mayoría utiliza retirada de instrucciones fuera de orden de forma especulativa, requiriendo almacenar puntos de recuperación (checkpoints) para restaurar un estado válido del procesador ante un fallo de especulación. Normalmente, los checkpoints necesitan implementarse con estructuras hardware costosas, y además requieren un crecimiento de otras estructuras del procesador, lo cual a su vez puede impactar en el tiempo de ciclo de reloj. Este problema afecta a muchos tipos de procesadores actuales, independientemente del número de hilos hardware (threads) y del número de núcleos de cómputo (cores) que incluyan. Esta tesis abarca el estudio de la retirada no especulativa de instrucciones fuera de orden en procesadores superescalares, multithread y multicore. / Ubal Tena, R. (2010). Out-of-Order Retirement of Instructions in Superscalar, Multithreaded, and Multicore Processors [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8535
|
18 |
Superscalar Processor Models Using Statistical LearningJoseph, P J 04 1900 (has links)
Processor architectures are becoming increasingly complex and hence architects have to evaluate a large design space consisting of several parameters, each with a number of potential settings. In order to assist in guiding design decisions we develop simple and accurate models of the superscalar processor design space using a detailed and validated superscalar processor simulator.
Firstly, we obtain precise estimates of all significant micro-architectural parameters and their interactions by building linear regression models using simulation based experiments. We obtain good approximate models at low simulation costs using an iterative process in which Akaike’s Information Criteria is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until desired error bounds are achieved. We use this procedure for model construction and show that it provides a cost effective scheme to experiment with all relevant parameters.
We also obtain accurate predictors of the processors performance response across the entire design-space, by constructing radial basis function networks from sampled simulation experiments. We construct these models, by simulating at limited design points selected by latin hypercube sampling, and then deriving the radial neural networks from the results. We show that these predictors provide accurate approximations to the simulator’s performance response, and hence provide a cheap alternative to simulation while searching for optimal processor design points.
|
19 |
Exekveringsmiljö för Plex-C på JVM / Run-time environment for Plex-C on JVMMöller, Johan January 2002 (has links)
The Ericsson AXE-based systems are programmed using an internally developed language called Plex-C. Plex-C is normally compiled to execute on an Ericsson internal processor architecture. A transition to standard processors is currently in progress. This makes it interesting to examine if Plex-C can be compiled to execute on the JVM, which would make it processor independent. The purpose of the thesis is to examine if parts of the run-time environment of Plex-C can be translated to Java and if this can be done so that sufficient performance is obtained. It includes how language constructions in Plex-C can be translated to Java. The thesis describes how a limited part of the Plex-C run-time environment is implemented in Java. Optimizations are an important part of the implementation. It is also described how the JVM system was tested with a benchmark test. The test results indicate that the implemented system is a few times faster than the Ericsson internal processor architecture. But this performance is still not sufficient for the JVM system to be an interesting replacement for the currently used processor architecture. It might still be useful as a processor independent test platform.
|
20 |
Increasing the performance of superscalar processors through value prediction / La prédiction de valeurs comme moyen d'augmenter la performance des processeurs superscalairesPerais, Arthur 24 September 2015 (has links)
Bien que les processeurs actuels possèdent plus de 10 cœurs, de nombreux programmes restent purement séquentiels. Cela peut être dû à l'algorithme que le programme met en œuvre, au programme étant vieux et ayant été écrit durant l'ère des uni-processeurs, ou simplement à des contraintes temporelles, car écrire du code parallèle est notoirement long et difficile. De plus, même pour les programmes parallèles, la performance de la partie séquentielle de ces programmes devient rapidement le facteur limitant l'augmentation de la performance apportée par l'augmentation du nombre de cœurs disponibles, ce qui est exprimé par la loi d'Amdahl. Conséquemment, augmenter la performance séquentielle reste une approche valide même à l'ère des multi-cœurs.Malheureusement, la façon conventionnelle d'améliorer la performance (augmenter la taille de la fenêtre d'instructions) contribue à l'augmentation de la complexité et de la consommation du processeur. Dans ces travaux, nous revisitons une technique visant à améliorer la performance de façon orthogonale : La prédiction de valeurs. Au lieu d'augmenter les capacités du moteur d'exécution, la prédiction de valeurs améliore l'utilisation des ressources existantes en augmentant le parallélisme d'instructions disponible.En particulier, nous nous attaquons aux trois problèmes majeurs empêchant la prédiction de valeurs d'être mise en œuvre dans les processeurs modernes. Premièrement, nous proposons de déplacer la validation des prédictions depuis le moteur d'exécution vers l'étage de retirement des instructions. Deuxièmement, nous proposons un nouveau modèle d'exécution qui exécute certaines instructions dans l'ordre soit avant soit après le moteur d'exécution dans le désordre. Cela réduit la pression exercée sur ledit moteur et permet de réduire ses capacités. De cette manière, le nombre de ports requis sur le fichier de registre et la complexité générale diminuent. Troisièmement, nous présentons un mécanisme de prédiction imitant le mécanisme de récupération des instructions : La prédiction par blocs. Cela permet de prédire plusieurs instructions par cycle tout en effectuant une unique lecture dans le prédicteur. Ces trois propositions forment une mise en œuvre possible de la prédiction de valeurs qui est réaliste mais néanmoins performante. / Although currently available general purpose microprocessors feature more than 10 cores, many programs remain mostly sequential. This can either be due to an inherent property of the algorithm used by the program, to the program being old and written during the uni-processor era, or simply to time to market constraints, as writing and validating parallel code is known to be hard. Moreover, even for parallel programs, the performance of the sequential part quickly becomes the limiting improvement factor as more cores are made available to the application, as expressed by Amdahl's Law. Consequently, increasing sequential performance remains a valid approach in the multi-core era. Unfortunately, conventional means to do so - increasing the out-of-order window size and issue width - are major contributors to the complexity and power consumption of the chip. In this thesis, we revisit a previously proposed technique that aimed to improve performance in an orthogonal fashion: Value Prediction (VP). Instead of increasing the execution engine aggressiveness, VP improves the utilization of existing resources by increasing the available Instruction Level Parallelism. In particular, we address the three main issues preventing VP from being implemented. First, we propose to remove validation and recovery from the execution engine, and do it in-order at Commit. Second, we propose a new execution model that executes some instructions in-order either before or after the out-of-order engine. This reduces pressure on said engine and allows to reduce its aggressiveness. As a result, port requirement on the Physical Register File and overall complexity decrease. Third, we propose a prediction scheme that mimics the instruction fetch scheme: Block Based Prediction. This allows predicting several instructions per cycle with a single read, hence a single port on the predictor array. This three propositions form a possible implementation of Value Prediction that is both realistic and efficient.
|
Page generated in 0.0872 seconds