• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 16
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 24
  • 24
  • 24
  • 12
  • 9
  • 9
  • 9
  • 8
  • 8
  • 4
  • 4
  • 3
  • 3
  • 3
  • 3
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Developing a generic request-processor for systems with limited request processing resources

Venter, H. (Heinrich) 03 1900 (has links)
Thesis (MScIng)--University of Stellenbosch, 2008. / ENGLISH ABSTRACT: This thesis describes the design, modelling and implementation of a prototype request- processing software system, which can be used as the basis for a request processing frame- work for systems with limited request processing resources. Due to design constraints, the request-processor system described here consists of multiple processes. It is problematic to prove that a multiple process design satis es the conditions of a set of prede ned requirements. One way to verify that such a multiple process design works as intended, is to use modelchecking tools. The system was veri ed for correctness and translated into a working prototype soft- ware system. / AFRIKAANSE OPSOMMING: Hierdie tesis beskryf die ontwerp, modellering en implementering van 'n prototipe versoek- verwerking-sagtewarestelsel. Die stelsel kan gebruik word om 'n versoekverwerkings- raamwerk te ontwerp vir stelsels met beperkte versoekverwerkingshulpbronne. Die ver- soekverwerkingsstelsel bestaan uit veelvoudige prosesse. Die veelvoudige proses-ontwerp was die direkte gevolg van stelselbeperkings. Dit is problematies om te bewys dat 'n multi-proses-ontwerp korrek funksioneer. Mod- elchecking-sagteware kan gebruik word om te veri eer of 'n stelsel korrek funksioneer. Die korrektheid van die stelsel is geveri eer voordat die nale prototipe ge¨implementeer is.
22

Efficient dispatch policy for SMT processors

Shmachkov, Igor. January 2009 (has links)
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Computer Science, 2009. / Includes bibliographical references.
23

Compiler Optimizations for Multithreaded Multicore Network Processors

Zhuang, Xiaotong 07 July 2006 (has links)
Network processors are new types of multithreaded multicore processors geared towards achieving both fast processing speed and flexibility of programming. The architecture of network processors considers many special properties for packet processing, including multiple threads, multiple processor cores on the same chip, special functional units, simplified ISA and simplified pipeline, etc. The architectural peculiarities of network processors raise new challenges for compiler design and optimization. Due to very high clocking speeds, the CPU memory gap on such processors is huge, making registers extremely precious. Moreover, the register file is split into two banks, and for any ALU instruction, the two source operands must come from different banks. We present and compare three different approaches to do register allocation and bank assignment. We also address the problem of sharing registers across threads in order to maximize the utilization of hardware resources. The context switches on the IXP network processor only happen when long latency operations are encountered. As a result, context switches are highly frequent. Therefore, the designer of the IXP network processor decided to make context switches extremely lightweight, i.e. only the program counter(PC) is stored together with the context. Since registers are not saved and restored during context switches, it becomes difficult to share registers across threads. For a conventional processor, each thread can assume that it can use the entire register file, because registers are always part of the context. However, with lightweight context switch, each thread must take a separate piece of the register file, making register usage inefficient. Programs executing on network processors typically have runtime constraints. Scheduling of multiple programs sharing a CPU must be orchestrated by the OS and the hardware using certain sharing policies. Real time applications demand a real time aware OS kernel to meet their specified deadlines. However, due to stringent performance requirements on network processors, neither OS nor hardware mechanisms is typically feasible. In this work, we demonstrate that a compiler approach could achieve some of the OS scheduling and real time scheduling functionalities without introducing a hefty overhead.
24

Increasing the performance of superscalar processors through value prediction / La prédiction de valeurs comme moyen d'augmenter la performance des processeurs superscalaires

Perais, Arthur 24 September 2015 (has links)
Bien que les processeurs actuels possèdent plus de 10 cœurs, de nombreux programmes restent purement séquentiels. Cela peut être dû à l'algorithme que le programme met en œuvre, au programme étant vieux et ayant été écrit durant l'ère des uni-processeurs, ou simplement à des contraintes temporelles, car écrire du code parallèle est notoirement long et difficile. De plus, même pour les programmes parallèles, la performance de la partie séquentielle de ces programmes devient rapidement le facteur limitant l'augmentation de la performance apportée par l'augmentation du nombre de cœurs disponibles, ce qui est exprimé par la loi d'Amdahl. Conséquemment, augmenter la performance séquentielle reste une approche valide même à l'ère des multi-cœurs.Malheureusement, la façon conventionnelle d'améliorer la performance (augmenter la taille de la fenêtre d'instructions) contribue à l'augmentation de la complexité et de la consommation du processeur. Dans ces travaux, nous revisitons une technique visant à améliorer la performance de façon orthogonale : La prédiction de valeurs. Au lieu d'augmenter les capacités du moteur d'exécution, la prédiction de valeurs améliore l'utilisation des ressources existantes en augmentant le parallélisme d'instructions disponible.En particulier, nous nous attaquons aux trois problèmes majeurs empêchant la prédiction de valeurs d'être mise en œuvre dans les processeurs modernes. Premièrement, nous proposons de déplacer la validation des prédictions depuis le moteur d'exécution vers l'étage de retirement des instructions. Deuxièmement, nous proposons un nouveau modèle d'exécution qui exécute certaines instructions dans l'ordre soit avant soit après le moteur d'exécution dans le désordre. Cela réduit la pression exercée sur ledit moteur et permet de réduire ses capacités. De cette manière, le nombre de ports requis sur le fichier de registre et la complexité générale diminuent. Troisièmement, nous présentons un mécanisme de prédiction imitant le mécanisme de récupération des instructions : La prédiction par blocs. Cela permet de prédire plusieurs instructions par cycle tout en effectuant une unique lecture dans le prédicteur. Ces trois propositions forment une mise en œuvre possible de la prédiction de valeurs qui est réaliste mais néanmoins performante. / Although currently available general purpose microprocessors feature more than 10 cores, many programs remain mostly sequential. This can either be due to an inherent property of the algorithm used by the program, to the program being old and written during the uni-processor era, or simply to time to market constraints, as writing and validating parallel code is known to be hard. Moreover, even for parallel programs, the performance of the sequential part quickly becomes the limiting improvement factor as more cores are made available to the application, as expressed by Amdahl's Law. Consequently, increasing sequential performance remains a valid approach in the multi-core era. Unfortunately, conventional means to do so - increasing the out-of-order window size and issue width - are major contributors to the complexity and power consumption of the chip. In this thesis, we revisit a previously proposed technique that aimed to improve performance in an orthogonal fashion: Value Prediction (VP). Instead of increasing the execution engine aggressiveness, VP improves the utilization of existing resources by increasing the available Instruction Level Parallelism. In particular, we address the three main issues preventing VP from being implemented. First, we propose to remove validation and recovery from the execution engine, and do it in-order at Commit. Second, we propose a new execution model that executes some instructions in-order either before or after the out-of-order engine. This reduces pressure on said engine and allows to reduce its aggressiveness. As a result, port requirement on the Physical Register File and overall complexity decrease. Third, we propose a prediction scheme that mimics the instruction fetch scheme: Block Based Prediction. This allows predicting several instructions per cycle with a single read, hence a single port on the predictor array. This three propositions form a possible implementation of Value Prediction that is both realistic and efficient.

Page generated in 0.1393 seconds