231 |
Data centric and adaptive source changing transactional memory with exit functionality
Herath, Herath Mudiyanselage Isuru Prasenajith, January 2012
Multi-core computing is becoming ubiquitous due to the scaling limitations of single-core computing. It is inevitable that parallel programming will become the mainstream for such processors. In this paradigm shift, the concept of abstraction should not be compromised. A programming model serves as an abstraction of how programs are executed. Transactional Memory (TM) is a technique proposed to maintain lock-free synchronization. Due to the simplicity of the abstraction provided by it, TM can also be used as a way of distributing parallel work, maintaining coherence and consistency. Motivated by this, at a higher level, the thesis makes three contributions and all are centred around Hardware Transactional Memory (HTM). As the first contribution, a transaction-only architecture is coupled with a "data centric" approach, to address the scalability issues of the former whilst maintaining its simplicity. This is achieved by grouping together memory locations having similar access patterns and maintaining coherence and consistency according to the group each memory location belongs to. As the second contribution, a novel technique is proposed to reduce the number of false transaction aborts which occur in a signature-based HTM. The idea is to adaptively switch between cache lines and signatures to detect conflicts. That is, when a transaction fits in the L1 cache, cache line information is used to detect conflicts and signatures are used otherwise. As the third contribution, the thesis makes a case for having an exit functionality in an HTM. The objective of the proposed functionality, TM_EXIT, is to terminate a transaction without restarting or committing.
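The adaptive switch between cache-line information and signatures described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the class name `Transaction`, the constants `SIG_BITS` and `L1_LINES`, and the single modulo hash are all hypothetical choices made for illustration.

```python
# Hypothetical model: names, sizes, and the hash function are illustrative assumptions.
SIG_BITS = 64   # assumed signature width in bits
L1_LINES = 4    # assumed L1 capacity in cache lines (tiny, to make overflow easy to show)


def sig_hash(line_addr):
    """Map a cache-line address to one signature bit (a single hash, for simplicity)."""
    return line_addr % SIG_BITS


class Transaction:
    def __init__(self):
        self.lines = set()       # exact line addresses, complete only while they fit in L1
        self.signature = 0       # Bloom-filter-style bit vector of accessed lines
        self.overflowed = False  # True once the transaction no longer fits in L1

    def access(self, line_addr):
        """Record a transactional access to the given cache line."""
        self.signature |= 1 << sig_hash(line_addr)
        if not self.overflowed:
            self.lines.add(line_addr)
            if len(self.lines) > L1_LINES:
                # Exact information is no longer complete; fall back to the signature.
                self.overflowed = True
                self.lines.clear()

    def conflicts_with(self, line_addr):
        """Would a remote access to line_addr be flagged as a conflict?"""
        if not self.overflowed:
            return line_addr in self.lines  # exact check: no false conflicts
        return bool((self.signature >> sig_hash(line_addr)) & 1)  # may alias
```

In this model, a small transaction that touched only line 3 does not conflict with line 67, whereas an overflowed transaction that touched line 3 does, because both addresses map to the same signature bit. That aliasing is the kind of false abort the adaptive scheme aims to avoid for transactions that fit in the L1 cache.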
|
232 |
Managing an information security policy architecture : a technical documentation perspective
Maninjwa, Prosecutor Mvikeli, January 2012
Information and the related assets form critical business assets for most organizations. Organizations depend on their information assets to survive and to remain competitive. However, the organization’s information assets are faced with a number of internal and external threats, aimed at compromising the confidentiality, integrity and/or availability (CIA) of information assets. These threats can be of physical, technical, or operational nature. For an organization to successfully conduct its business operations, information assets should always be protected from these threats. The process of protecting information and its related assets, ensuring the CIA thereof, is referred to as information security. To be effective, information security should be viewed as critical to the overall success of the organization, and therefore be included as one of the organization’s Corporate Governance sub-functions, referred to as Information Security Governance. Information Security Governance is the strategic system for directing and controlling the organization’s information security initiatives. Directing is the process whereby management issues directives, giving a strategic direction for information security within an organization. Controlling is the process of ensuring that management directives are being adhered to within an organization. To be effective, Information Security Governance directing and controlling depend on the organization’s Information Security Policy Architecture. An Information Security Policy Architecture is a hierarchical representation of the various information security policies and related documentation that an organization has used. When directing, management directives should be issued in the form of an Information Security Policy Architecture, and controlling should ensure adherence to the Information Security Policy Architecture. 
However, this study noted that in both literature and organizational practices, Information Security Policy Architectures are not comprehensively addressed and adequately managed. Therefore, this study argues towards a more comprehensive Information Security Policy Architecture, and the proper management thereof.
|
233 |
Exploiting tightly-coupled cores
Bates, Daniel, January 2014
As we move steadily through the multicore era, and the number of processing cores on each chip continues to rise, parallel computation becomes increasingly important. However, parallelising an application is often difficult because of dependencies between different regions of code which require cores to communicate. Communication is usually slow compared to computation, and so restricts the opportunities for profitable parallelisation. In this work, I explore the opportunities provided when communication between cores has a very low latency and low energy cost. I observe that there are many different ways in which multiple cores can be used to execute a program, allowing more parallelism to be exploited in more situations, and also providing energy savings in some cases. Individual cores can be made very simple and efficient because they do not need to exploit parallelism internally. The communication patterns between cores can be updated frequently to reflect the parallelism available at the time, allowing better utilisation than specialised hardware which is used infrequently. In this dissertation I introduce Loki: a homogeneous, tiled architecture made up of many simple, tightly-coupled cores. I demonstrate the benefits in both performance and energy consumption which can be achieved with this arrangement and observe that it is also likely to have lower design and validation costs and be easier to optimise. I then determine exactly where the performance bottlenecks of the design are, and where the energy is consumed, and look into some more-advanced optimisations which can make parallelism even more profitable.
|
234 |
Language and computer design for effective software systems
Lillich, Alan W., January 1979
This thesis describes two distinct, but mutually supportive, research projects. The first is the design and implementation of a high level language intended to be suitable for writing operating systems among other large software products. It provides facilities for the creation and control of asynchronous processes along with powerful data and "sequential" control structures. The second project is the design and implementation of a machine architecture which is a congenial host for modern block structured languages. This machine has several advantages compared to most of today's computers; code generation is simple, the object code is very compact and the machine is reasonably fast.
Effective software systems are well designed, reliable, have "low" space-time products and are developed, maintained and used with a minimum amount of human effort. The work presented here is intended to be a viable first step towards the production of an environment for the production of effective software systems. / Science, Faculty of / Computer Science, Department of / Graduate
|
235 |
Hardware Architecture Impact on Manycore Programming Model
Stubbfält, Erik, January 2021
This work investigates how certain processor architectures can affect the implementation and performance of a parallel programming model. The Ericsson Many-Core Architecture (EMCA) is compared and contrasted to general-purpose multicore processors, highlighting differences in their memory systems and processor cores. A proof-of-concept implementation of the Concurrency Building Blocks (CBB) programming model is developed for x86-64 using MPI. Benchmark tests show how CBB on EMCA handles compute-intensive and memory-intensive scenarios, compared to a high-end x86-64 machine running the proof-of-concept implementation. EMCA shows its strengths in heavy computations while x86-64 performs at its best with high degrees of data reuse. Both systems are able to utilize locality in their memory systems to achieve great performance benefits.
|
236 |
The implementation of a generalized table driven back end processor
Broadbent, Christopher Frank, January 1987
Includes bibliographical references. / This thesis discusses the University of Cape Town implementation of a table driven back end processor. The back end processor takes as input an intermediate tree representation of a high level programming language. It produces as output an object text ready for assembly. The specifications of the input tree and the output object are supplied to the back end processor via two tables. The initial motivation for this project was the need to provide a back end processor capable of taking the DIANA tree output of the University of Cape Town front end processor and producing a corresponding P-code object. The University of Cape Town back end processor is implemented using Pascal and C in a Unix V environment.
|
237 |
Directory-based Cache Coherence in SMTp Machines without Memory Overhead using Sparse Directories
Kiriwas, Anton, 01 January 2004
As computing power has increased over the past few decades, science and engineering have found more and more uses for this newfound computing power. With the advent of multiprocessor machines, we are achieving MIPS and FLOPS ratings previously unthought-of. Distributed shared-memory machines (DSM) are quickly becoming a powerful tool for computing, and the ability to build them from commodity off-the-shelf parts would be a great benefit to computing in general. In the paper entitled "SMTp: An Architecture for Next-generation Scalable Multi-threading", Heinrich et al. present an architecture for a scalable DSM built from slightly modified machines capable of simultaneous multi-threading (SMT). In this architecture, SMT-based machines are connected together via a high-speed network as DSMs with a directory-based cache coherence protocol. What is unique in SMTp is that the cache coherence protocol runs on the second thread in the SMT processors instead of running on an expensive, specialized memory controller. The results of this work show that SMTp can sometimes be even faster than dedicated hardware. In this thesis I intend to present the work on SMTp and extend its capabilities by removing the necessity for memory-based directory backing by leveraging the work of Wolf-Dietrich Weber in sparse directories. The removal of the directory backing store will free a large percentage of main memory for work in the system while having only a minor impact on the cache miss rate of applications and overall system throughput.
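The general idea of a sparse directory with no backing store can be sketched as a small, fixed-capacity table of sharer sets. This is an illustrative model only: the class `SparseDirectory`, its method name, and the LRU replacement policy are assumptions, not details taken from the thesis or from Weber's work. The key point it shows is that evicting a directory entry forces the cached copies it tracked to be invalidated, since no memory-resident copy of the entry remains.

```python
from collections import OrderedDict


class SparseDirectory:
    """Hypothetical fixed-capacity directory; LRU replacement is an assumption."""

    def __init__(self, capacity):
        self.capacity = capacity
        # block address -> set of sharer node ids, kept in least-recently-used order
        self.entries = OrderedDict()

    def record_sharer(self, block, node):
        """Note that `node` caches `block`.

        Returns a list of (block, sharers) pairs whose cached copies must be
        invalidated because their directory entry was evicted to make room.
        """
        invalidated = []
        if block in self.entries:
            self.entries.move_to_end(block)  # mark as most recently used
        else:
            if len(self.entries) >= self.capacity:
                victim, sharers = self.entries.popitem(last=False)  # evict LRU entry
                invalidated.append((victim, sharers))
            self.entries[block] = set()
        self.entries[block].add(node)
        return invalidated
```

Those forced invalidations are what can raise the cache miss rate slightly, which is the trade-off the abstract weighs against the main memory freed by dropping the backing store.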
|
238 |
Routing and flow control in integrated voice-data networks
Nassehi, Mohammad Mehdi. January 1981
No description available.
|
239 |
Improving Branch Prediction Accuracy Via Effective Source Information And Prediction Algorithms
Gao, Hongliang, 01 January 2008
Modern superscalar processors rely on branch predictors to sustain a high instruction fetch throughput. Given the trend of deep pipelines and large instruction windows, a branch misprediction will incur a large performance penalty and result in a significant amount of energy wasted by the instructions along wrong paths. With their critical role in high performance processors, there has been extensive research on branch predictors to improve the prediction accuracy. Conceptually a dynamic branch prediction scheme includes three major components: a source, an information processor, and a predictor. Traditional works mainly focus on the algorithm for the predictor. In this dissertation, besides novel prediction algorithms, we investigate other components and develop untraditional ways to improve the prediction accuracy. First, we propose an adaptive information processing method to dynamically extract the most effective inputs to maximize the correlation to be exploited by the predictor. Second, we propose a new prediction algorithm, which improves the Prediction by Partial Matching (PPM) algorithm by selectively combining multiple partial matches. The PPM algorithm was previously considered optimal and has been used to derive the upper limit of branch prediction accuracy. Our proposed algorithm achieves higher prediction accuracy than PPM and can be implemented within a realistic hardware budget. Third, we discover a new locality existing between the address of producer loads and the outcomes of their consumer branches. We study this address-branch correlation in detail and propose a branch predictor to exploit this correlation for long-latency and hard-to-predict branches, which existing branch predictors fail to predict accurately.
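For readers unfamiliar with Prediction by Partial Matching, the baseline idea can be sketched as follows. This is plain longest-match PPM over a global outcome history, not the selective combination of multiple partial matches that the dissertation proposes; the class `PPMPredictor` and its table layout are hypothetical.

```python
class PPMPredictor:
    """Hypothetical longest-match PPM model over global branch history."""

    def __init__(self, max_order):
        self.max_order = max_order
        # One table per history length: history tuple -> [not_taken_count, taken_count]
        self.tables = [dict() for _ in range(max_order + 1)]
        self.history = ()

    def _context(self, order):
        """Return the most recent `order` outcomes as a tuple."""
        return self.history[len(self.history) - order:]

    def predict(self):
        """Predict using the longest history context seen before."""
        for order in range(min(self.max_order, len(self.history)), -1, -1):
            counts = self.tables[order].get(self._context(order))
            if counts is not None:
                return counts[1] >= counts[0]
        return True  # nothing observed yet: default to taken

    def update(self, taken):
        """Record the actual outcome under every context length, then shift history."""
        for order in range(min(self.max_order, len(self.history)) + 1):
            counts = self.tables[order].setdefault(self._context(order), [0, 0])
            counts[int(taken)] += 1
        extended = self.history + (taken,)
        self.history = extended[max(0, len(extended) - self.max_order):]
```

Trained on a strictly alternating taken/not-taken pattern with `max_order=2`, this model predicts the alternation correctly because the two-outcome context always precedes the same result.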
|
240 |
FPGA-based range-limited molecular dynamics acceleration
Wu, Chunshu, 07 September 2023
Molecular Dynamics (MD) is a computer simulation technique that executes iteratively over discrete, infinitesimal time intervals. It has been a widely utilized application in the fields of material sciences and computer-aided drug design for many years, serving as a crucial benchmark in high-performance computing (HPC). Numerous MD packages have been developed and effectively accelerated using GPUs. However, as the limits of Moore's Law are reached, the performance of an individual computing node has reached its bottleneck, while the performance of multiple nodes is primarily hindered by scalability issues, particularly when dealing with small datasets.
In this thesis, the acceleration with respect to small datasets is the main focus. With the recent COVID-19 pandemic, drug discovery has gained significant attention, and Molecular Dynamics (MD) has emerged as a crucial tool in this process. Particularly, in the critical domain of drug discovery, small simulations involving approximately 50K particles are frequently employed. However, it is important to note that small simulations do not necessarily translate to faster results, as long-term simulations comprising billions of MD iterations and more are essential in this context.
In addition to dataset size, the problem of interest is further constrained. Referred to as the most computationally demanding aspect of MD, the evaluation of range-limited (RL) forces not only accounts for 90% of the MD computation workload but also involves irregular mapping patterns of 3-D data onto 2-D processor networks. To emphasize, this thesis centers around the acceleration of RL MD specifically for small datasets.
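To make "range-limited" concrete: only particle pairs closer than a cutoff radius contribute to the force evaluation, so the 3-D space is commonly divided into cells at least one cutoff wide and each particle is checked only against its own and adjacent cells. The sketch below shows that neighbour search in software. It assumes a cubic, non-periodic box and a cell-list scheme; the function name and parameters are hypothetical, and nothing here reflects the FPGA design itself.

```python
import itertools
import math


def pairs_within_cutoff(positions, box, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than `cutoff`.

    Assumes a cubic box of side `box` with coordinates in [0, box) and no
    periodic boundaries (a simplification made for illustration).
    """
    ncell = max(1, int(box // cutoff))  # cells per dimension
    size = box / ncell                  # cell edge length, never below the cutoff
    cells = {}
    for i, p in enumerate(positions):
        key = tuple(min(int(c / size), ncell - 1) for c in p)
        cells.setdefault(key, []).append(i)

    pairs = set()
    for key, members in cells.items():
        # Visit the cell itself and its 26 neighbours.
        for offset in itertools.product((-1, 0, 1), repeat=3):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in cells.get(neighbour, ()):
                    if i < j and math.dist(positions[i], positions[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs
```

Because each cell is at least one cutoff wide, any pair within range must sit in the same or adjacent cells, so the result matches a brute-force all-pairs check while examining far fewer candidates on large inputs.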
In order to address the single-node bottleneck and multi-node scaling challenges, the thesis is organized into two progressive stages of investigation. The first stage delves extensively into enhancing single-node efficiency by examining various factors such as workload mapping from 3-D to 2-D, data routing, and data locality. The second stage focuses on studying multi-node scalability, with a particular emphasis on strong scaling, bandwidth demands, and the synchronization mechanisms between nodes.
Our results show that our design on a Xilinx U280 FPGA achieves 51.72x and 4.17x speedups over an Intel Xeon Gold 6226R CPU and a Quadro RTX 8000 GPU, respectively. Our research on strong scaling also demonstrates that 8 Xilinx U280 FPGAs connected to a switch achieve a 4.67x speedup compared to an Nvidia V100 GPU.
|