171.
The design of a sparse vector processor. Hopkins, T. M. January 1993.
This thesis describes the development of a new vector processor architecture capable of high efficiency when computing with very sparse, irregularly structured vector and matrix data. Two applications are identified as being of particular importance, sparse Gaussian elimination and Linear Programming, and the algorithmic steps involved in solving these problems are analysed. Existing techniques for sparse vector computation, which achieve only a small fraction of the arithmetic performance commonly expected on dense matrix problems, are critically examined. A variety of new techniques with potential for hardware support is discussed; the most promising of these are selected, and efficient hardware implementations developed.

The architecture of a complete vector processor incorporating the new vector and matrix mechanisms is described. The architecture also uses an innovative control structure for the vector processor, which enables high efficiency even when computing with vectors containing very small numbers of non-zeroes. The practical feasibility of the design is demonstrated by describing the prototype implementation, under construction from off-the-shelf components. The expected performance of the new architecture is analysed, and simulation results are presented which demonstrate that the machine could be expected to provide an order-of-magnitude speed-up on many large sparse Linear Programming problems compared to a scalar processor with the same clock rate. The simulation results indicate that the vector processor control structure is successful: the vector half-performance length is as low as 8 for standard vector instruction loop tests. In some cases, simulations indicate that the performance of the machine is limited by the speed of some scalar processor operations.
Finally, the scope for re-implementing the new architecture in technology faster than the prototype's 8 MHz is briefly discussed, and particular potential difficulties are identified.
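The abstract does not give implementation details, but the central difficulty it addresses, arithmetic over vectors stored as non-zeroes only, can be illustrated with a short Python sketch (not taken from the thesis; the sorted index/value representation and function names are assumptions for illustration):

```python
# Sketch: a sparse vector stored as parallel index/value lists, and a
# dot product that touches only non-zeroes. Hardware for this workload
# must match indices on the fly rather than stream dense operands,
# which is why conventional vector units perform poorly on it.

def sparse_dot(idx_a, val_a, idx_b, val_b):
    """Dot product of two sparse vectors given sorted index lists."""
    i = j = 0
    acc = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:      # indices match: multiply-accumulate
            acc += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:     # skip the unmatched non-zero
            i += 1
        else:
            j += 1
    return acc

# Example: x = [0, 0, 3, 0, 5], y = [1, 0, 2, 0, 4] in sparse form
print(sparse_dot([2, 4], [3.0, 5.0], [0, 2, 4], [1.0, 2.0, 4.0]))  # 3*2 + 5*4 = 26.0
```

The index-matching loop is the part that a sparse vector processor must support in hardware if it is to sustain high arithmetic rates.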
172.
Evolvable hardware platform for fault-tolerant reconfigurable sensor electronics. Stefatos, Evangelos F. January 2007.
The advent of System-on-Chip technology and the continuous shrinkage of silicon device feature sizes are dictating the need for miniaturised, integrated, robust and autonomous systems. This need is especially evident in systems operating within hostile environments, such as aerospace. Evolvable Hardware (EHW) is a technology which shows promise in meeting the needs of systems facing malfunctions due to harsh electronic environments. The key features of EHW are a reconfigurable fabric and an evolutionary strategy, and the design of a complete and efficient EHW framework must consider both concurrently. This comprehensive approach to the design of EHW-based systems has been considered by only a few researchers in the literature.

This thesis presents a novel holistic EHW framework that implements all the electronics associated with the JPL/Boeing gyroscope sensor. It includes an efficient fault-tolerant reconfigurable fabric and an integrated on-chip multi-objective evolutionary strategy. The conception and implementation of both parts also consider real-time adaptation and low power consumption, for enabling ultra-long-life aerospace missions. A number of key objectives have been achieved: a) the Verilog implementation of an autonomous reconfigurable fabric that is capable of reproducing the sensor's electronics with substantial accuracy (>99.7%); b) the implementation of numerous evolutionary strategies that are able to guide the hardware evolution even in the presence of 30% faults injected into the user and configuration memory of the system; c) a reduction by a factor of 8.6 to 9.8 in the number of generations needed to evolve a 31-tap FIR filter, compared with previous research in this field; and d) evolved circuits that consume 3.3 times less power than similar implementations in industrial reconfigurable devices.
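The on-chip strategy itself is not detailed in the abstract; as a hedged illustration of the general EHW scheme, a minimal (1+λ) evolutionary strategy over a bit-string circuit configuration might look like the following Python sketch (the genome encoding, mutation operator and toy fitness are assumptions, not the thesis's):

```python
import random

def evolve(fitness, genome_len=32, lam=4, generations=200, seed=1):
    """Minimal (1+lambda) ES over a bit-string circuit configuration:
    keep one parent, try 'lam' single-bit mutants per generation, and
    accept any mutant at least as fit (allowing neutral drift)."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(genome_len)]
    best = fitness(parent)
    for _ in range(generations):
        for _ in range(lam):
            child = parent[:]
            k = rng.randrange(genome_len)   # single-bit mutation
            child[k] ^= 1
            f = fitness(child)
            if f >= best:                   # accept equal fitness too
                parent, best = child, f
    return parent, best

# Toy fitness: number of ones (stands in for measured circuit accuracy)
cfg, score = evolve(sum)
print(score)
```

In a real EHW system the fitness call would be a hardware evaluation of the configured fabric against the sensor's reference behaviour, which is exactly the loop an on-chip strategy accelerates.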
173.
Small nets and short paths: optimising neural computation. Frean, Marcus Roland January 1990.
The thesis explores two aspects of <i>optimisation</i> in neural network research.

1. The question of how to find the optimal feed-forward neural network architecture for learning a given binary classification is addressed. The so-called constructive approach, whereby intermediate (hidden) units are built as required for the particular problem, is reviewed. Current constructive algorithms are compared, and three new methods are introduced. One of these, the <i>Upstart</i> algorithm, is shown to out-perform all other constructive algorithms of this type. This work led on to the ancillary problem of finding a satisfactory procedure for changing the weight values of an individual unit in a network. The new <i>thermal perceptron</i> rule is described and is shown to compare favourably with its competitors. Finally, the spectrum of possible learning rules is surveyed.

2. Neurobiologically inspired algorithms for mapping between spaces of different dimensions are applied to a classic optimisation problem, the Travelling Salesman Problem. Two new methods are described that can tackle the general symmetric form of the TSP, thus overcoming the restriction of other neural network algorithms to the geometric case.
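The thermal perceptron rule mentioned above damps ordinary perceptron updates by a factor exp(-|φ|/T), where φ is the unit's net input and the temperature T is annealed towards zero, so that the unit settles on weights that classify most points correctly even when the data are not separable. A hedged Python sketch follows (the annealing schedule and hyperparameters are illustrative assumptions, not the thesis's):

```python
import math
import random

def thermal_perceptron(data, epochs=100, t0=2.0, rate=1.0, seed=0):
    """Single-unit training with a thermal perceptron rule: perceptron
    updates scaled by exp(-|phi|/T), with T annealed linearly to zero."""
    rng = random.Random(seed)
    n = len(data[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    b = 0.0
    for epoch in range(epochs):
        temp = t0 * (1.0 - epoch / epochs)   # linear annealing schedule
        for x, target in data:
            phi = sum(wi * xi for wi, xi in zip(w, x)) + b
            out = 1 if phi >= 0 else 0
            if out != target and temp > 0:
                # damped update: near-boundary points dominate late on
                factor = rate * (target - out) * math.exp(-abs(phi) / temp)
                w = [wi + factor * xi for wi, xi in zip(w, x)]
                b += factor
    return w, b

# AND function: linearly separable, so the rule should settle with
# few or no misclassifications
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = thermal_perceptron(data)
errors = sum((1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0) != t
             for x, t in data)
print(errors)
```

The damping is what distinguishes the rule from the plain perceptron: confidently wrong points far from the boundary are largely ignored, which is what makes it well-behaved on non-separable problems.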
174.
Shared memory with hidden latency on a family of mesh-like networks. Harris, Tim J. January 1995.
We begin with a general introduction to the problem of PRAM simulation and a brief survey of the state of the art in such simulations. We then highlight the importance of processor-efficient simulations, in which the latency of access to shared memory is hidden. We consider the use of multithreading for latency hiding in PRAM simulations in a general context, addressing the relationship between the number of threads run on each processor and the diameter of the network. We provide evidence that in the general case bounded-degree networks will not have enough bandwidth to support such processor-efficient simulations, and we define a class of networks, known as <I>fat rings</I> and <I>fat meshes</I>, which provide the necessary bandwidth.

We then implement the ideas we have discussed by providing a processor-efficient EREW PRAM simulation on a family of networks consisting of fat meshes of arbitrary dimension. The simulation focuses on memory management and routing techniques for the networks. We provide evidence that concurrent-access models are inherently poorly suited to multithreaded architectures. Given these difficulties, we go on to describe a satisfactory CRCW PRAM simulation for the fat mesh, which necessarily has delay greater than the diameter of the network. We then reinforce our theoretical conclusions with experimental results generated from a trace-driven simulation of our architecture. We conclude with an assessment of some performance characteristics of fat mesh machines, as well as a review of the main points of the thesis.
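The relationship between thread count and memory latency can be made concrete with a toy model: if every shared-memory access takes L cycles to return, then L+1 threads scheduled round-robin are enough to keep a processor issuing useful work every cycle. The following Python sketch is purely illustrative (the cycle-accurate details are assumptions, not the thesis's simulation):

```python
def utilization(num_threads, latency, steps=10_000):
    """Round-robin multithreading model: each thread issues one
    shared-memory request per turn and cannot run again until that
    request returns 'latency' cycles later. With at least latency+1
    threads the processor never stalls."""
    ready_at = [0] * num_threads   # cycle at which each thread may next issue
    busy = 0
    cycle = 0
    t = 0
    while cycle < steps:
        if ready_at[t] <= cycle:   # thread ready: one unit of useful work
            ready_at[t] = cycle + latency + 1
            busy += 1
        cycle += 1                 # advance one cycle either way
        t = (t + 1) % num_threads
    return busy / steps

print(utilization(8, 7), utilization(4, 7))  # 1.0 0.5
```

With 8 threads and a 7-cycle latency the processor is always busy; halving the thread count halves utilization, which is the bandwidth pressure that motivates the fat-ring and fat-mesh constructions.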
175.
Standard CMOS floating gate memories for non-volatile parameterisation of pulse-stream VLSI radial basis function neural networks. Buchan, L. William January 1997.
Analogue VLSI artificial neural networks (ANNs) offer a means of dealing with the non-linearities, cross-sensitivities, noise and interfacing requirements of analogue sensors (the problem of <I>sensor fusion</I>) whilst maintaining the compactness and low power of direct analogue operation. Radial Basis Function (RBF) networks, as a means of performing this function, have several advantages over other ANNs. The pulse-stream ANN technique developed at Edinburgh provides the additional benefits of implicit analogue-digital conversion and signal robustness. However, progressing this work requires the integration of high-density analogue memory for parameterisation of the ANN, since conventional weight refresh methods are too area- and power-hungry. For this purpose, standard CMOS floating gates have been proposed, as these maintain the low process cost and energy availability of the neural circuitry. Investigation of this proposition proceeded in three stages:

1. Evaluation of the suitability of a standard process for the fabrication of floating gates and exposure of the issues involved: feasibility, <I>analogue</I> programmability, layout optimisation and modelling.

2. The interfacing of floating gates to Radial Basis Function (RBF) neural network circuits and the development of programming approaches to cope with the potentially destructive characteristics of high voltages and currents.

3. The development of circuits for programming floating gates using continuous-time feedback, to facilitate a rapid weight-downloading phase from a software model.

Three chips were designed, fabricated and tested to explore each of these sets of issues. Detailed discussion and measurements are presented. Conclusions have been drawn about layout optimisation, programmability and device ageing, and on the design and general suitability for purpose of standard CMOS floating gates.
While these can be designed, interfaced to RBF circuits, and programmed to perform useful functions, their disadvantages make them more useful as a prototyping technique than as memory modules for inclusion in a final product.
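The feedback programming of stage 3 can be caricatured as a closed loop of pulse-and-measure steps: inject charge a little at a time, re-read the gate, and stop when the stored value is close enough to the target weight. The Python sketch below is purely illustrative (the device model, step size and tolerance are invented for the example and are not the thesis's circuits):

```python
def program_gate(target, read, pulse, tol=0.01, max_pulses=1000):
    """Closed-loop programming: apply short programming pulses and
    re-read the floating-gate value until it is within 'tol' of the
    target. 'read' and 'pulse' stand in for the chip's measure and
    charge-injection operations."""
    for n in range(max_pulses):
        v = read()
        if abs(v - target) <= tol:
            return n                  # number of pulses used
        pulse(+1 if v < target else -1)
    raise RuntimeError("gate failed to converge")

# Toy device model: each pulse moves the stored value by a fixed step
class Gate:
    def __init__(self):
        self.v = 0.0
    def read(self):
        return self.v
    def pulse(self, sign):
        self.v += 0.005 * sign

g = Gate()
n = program_gate(2.0, g.read, g.pulse)
print(n)
```

Iterating measure-then-pulse is what makes open-loop charge injection, with its poorly controlled tunnelling currents, tolerable as an analogue memory write mechanism.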
176.
Domain-specific and reconfigurable instruction cells based architectures for low-power SoC. Khawam, Sami January 2006.
New communication standards and the requirements of modern mobile-device users push silicon towards processing more data in ever shorter times; this is particularly the case for new compression formats targeting high-quality, low-bandwidth multimedia. This presses forward the need for new programmable hardware solutions that intrinsically achieve generality, high performance and, most importantly, low power consumption. This work investigates the design of reconfigurable hardware architectures to address these issues. Two novel solutions are proposed, along with the implementation of several multimedia applications on them.

The first architecture sits as a middle ground between FPGAs and ASICs in terms of performance and cost. This is achieved by using coarse-grain functional units combined with programmable interconnects to build flexible, high-performance and low-power circuits. A framework for generating and programming the custom domain-specific reconfigurable arrays is also proposed. The tool-flow reduces some of the design effort that goes into creating and using the arrays by facilitating the reuse of previous design elements. Furthermore, this work proposes novel direction-aware routing elements to allow efficient tailoring of interconnect structures to the application.

The second proposed processing architecture adds the dimension of high-level programmability to the reconfigurable arrays. This is achieved by using functional units that can be directly matched to elements in a compiler's internal representation of software. By using a custom instruction controller, the array can execute control operations in a similar way to processors, while at the same time allowing highly efficient mapping of datapath circuits. Coupled with the low power and high throughput achieved, this creates a viable alternative to FPGAs, DSPs and ASICs, suitable for deployment in high-performance mobile applications.
177.
A VLSI hardware neural accelerator using reduced precision arithmetic. Butler, Zoe F. January 1990.
A synthetic neural network is a massively parallel array of computational units (neurons) that captures some of the functionality and computational strengths of the brain. Among its capabilities are the ability to consider many solutions simultaneously, the ability to work with corrupted or incomplete data without any form of error correction, and a natural fault tolerance. The last is acquired from the parallelism and from the representation of knowledge in a distributed fashion, giving rise to graceful degradation as faults appear.

A neuron can be thought of, in engineering terms, as a state machine that signals its 'on' <i>state</i> by the absence of a voltage. The level of excitation of the neuron is represented by its <i>activity</i>. The activity is related to the neural state by an <i>activation function</i>, usually the 'sigmoid' or 'S-shaped' function, which represents a smooth switching of the neural state from off to on as the activity increases through a threshold. Direct stimulation of the neuron from outside the network and contributions from other neurons in the network change the level of activity. The levels of firing from other neurons to a receiving neuron are weighted by interneural synaptic weights. The weights are the long-term memory storage elements of the network: by altering their values, information is encoded or 'learnt' by the network, adding to its store of knowledge.

Neural network research can be divided into three broad categories: mathematical description and analysis of the dynamical learning properties of networks, computer simulation of the mathematical methods, and VLSI hardware implementation of neural functions or classes of neural networks. The main thrust of this thesis falls into the final category. The research presented here implements a VLSI digital neural network as a neural accelerator to speed up simulation times.
The VLSI design incorporates a parallel array of synapses, which provide the connections between neurons. Each synapse effectively 'multiplies' the neural state of the sending neuron by the synaptic weight between the sending neuron and the receiving neuron. The 'multiplication' is achieved using <i>reduced precision arithmetic</i> with a 'staircase' activation function modelled on the sigmoid, which allows the neuron to be in any one of five states. Therefore, with little loss in precision, the reduced precision arithmetic avoids full multiplication, which is expensive in silicon area. The reduced-arithmetic synapse increases the number of synapses that can be implemented on a single die.

The VLSI neural network chips can easily be cascaded to give a larger array of synapses; four cascaded chips gave an array of 108 synapses. However, this size of array was too small to perform neural network learning simulations, so the synapse array has been configured in a <i>paging architecture</i>, trading off some of the high speed of the chips (up to 20 MHz) against increased network size. The synapse array has been wired with support circuitry onto a board to give a <i>neural accelerator</i> that is interfaced to a host Sun computer. The paging architecture of the board allows a network of several hundred neurons to be simulated. The neural accelerator has been used with the delta learning rule algorithm, and results show acceleration of up to two orders of magnitude over equivalent software simulations.
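As a hedged illustration of the reduced-precision idea (the five state values and their coding here are assumptions, not the thesis's actual levels), a staircase activation and a multiply-free synapse can be sketched in Python:

```python
def staircase(activity, threshold=0.0, width=1.0):
    """Five-level 'staircase' approximation to the sigmoid: the neural
    state takes one of the values 0, 1/4, 1/2, 3/4, 1 (hypothetical
    levels chosen for illustration)."""
    x = (activity - threshold) / width
    if x < -1.5:
        return 0.0
    if x < -0.5:
        return 0.25
    if x < 0.5:
        return 0.5
    if x < 1.5:
        return 0.75
    return 1.0

def synapse(weight, state):
    """Reduced-precision 'multiplication': because the state is a
    quarter multiple, weight * state needs only shifts and adds in
    hardware rather than a full multiplier."""
    q = int(state * 4)          # quantised state, 0..4
    return (weight * q) / 4     # in hardware: shift-and-add, then shift

print(synapse(8, staircase(2.0)))  # saturated state 1.0, so the weight itself: 8.0
```

Restricting the state to a handful of levels is what lets the multiplier be replaced by a few adders, and hence what lets many synapses fit on one die.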
178.
Computer structures for distributed systems. Casey, Liam Maurice January 1977.
No description available.
179.
Computer aided design techniques applied to logic design. Dervisoglu, Bulent January 1973.
No description available.
180.
Adding safe and effective load balancing to multicomputers. Martin, Paul January 1994.
In the quest for ever greater cost-effectiveness, researchers have begun to experiment with scalable, parallel architectures known as 'multicomputers'. The underlying assumption is that adding more processors to a computer is a cheap way to increase the problem size which it can tackle and/or decrease the execution time. However, results to date have fallen short of those hoped for, indicating that there are still a number of difficulties to be resolved. One problem in particular is felt by experienced multicomputer programmers looking for significant execution-time speed-ups: effort must be expended to tune a program for the underlying architecture if the work is to be evenly distributed between the processors. Fortunately, a solution to this problem can be found in dynamic load balancing, a mechanism for redistributing work between processors automatically and transparently, allowing the programmer to develop fast, portable programs without having to worry about performance tuning. This thesis examines the many issues associated with adding load balancing to multicomputers and makes the following contributions: firstly, a critical review of the literature on load balancing, showing the techniques proposed and suggesting which are the most promising for future systems; secondly, a detailed description of how a typical multicomputer operating system needs to be extended in order to checkpoint a task on one processor and restart it on another; thirdly, a study of the impact of these operating system extensions on performance; fourthly, an investigation into the use of formal methods for designing and verifying the protocols for the exchange of tasks between processors; and lastly, a report on the best methods for detecting processor load imbalances and for deciding which user tasks to move. Thus, the thesis addresses all aspects of adding safe and effective load balancing to multicomputers.
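The final contribution, detecting load imbalances and choosing which tasks to move, can be illustrated with a simple threshold scheme. The Python sketch below is illustrative only; the thesis's actual detection and selection policies are not given in the abstract:

```python
def balance(loads, threshold=2):
    """Threshold-based balancing sketch: repeatedly pair the most- and
    least-loaded processors and propose one task migration whenever
    their load gap exceeds 'threshold'. Returns (src, dst) moves."""
    loads = list(loads)
    moves = []
    while True:
        hi = max(range(len(loads)), key=loads.__getitem__)
        lo = min(range(len(loads)), key=loads.__getitem__)
        if loads[hi] - loads[lo] <= threshold:
            return moves
        loads[hi] -= 1              # migrate one task hi -> lo
        loads[lo] += 1
        moves.append((hi, lo))

moves = balance([9, 1, 4, 2])
print(moves)
```

In a real system each proposed move would trigger the checkpoint-and-restart machinery described above, so the threshold also serves to keep migration cost from swamping the benefit.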