271 |
Concepts and capabilities of database machines / Tavakoli, Nassrin. January 2010 (has links)
Typescript (photocopy). / Digitized by Kansas Correctional Industries
|
272 |
Design and measurement of a reconfigurable multi-microprocessor machine / Zukowski, Charles. January 1982 (has links)
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1982. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING / Includes bibliographical references. / by Charles Zukowski. / M.S.
|
273 |
The Design, Implementation, and Evaluation of Software and Architectural Support for ARM Virtualization / Dall, Christoffer. January 2018 (has links)
The ARM architecture is dominating in the mobile and embedded markets and is making an upwards push into the server and networking markets where virtualization is a key technology. Similar to x86, ARM has added hardware support for virtualization, but there are important differences between the ARM and x86 architectural designs. Given two widely deployed computer architectures with different approaches to hardware virtualization support, we can evaluate, in practice, benefits and drawbacks of different approaches to architectural support for virtualization.
This dissertation explores new approaches to combining software and architectural support for virtualization with a focus on the ARM architecture and shows that it is possible to provide virtualization services an order of magnitude more efficiently than traditional implementations.
First, we investigate why the ARM architecture does not meet the classical requirements for virtualizable architectures and present an early prototype of KVM for ARM, a hypervisor using lightweight paravirtualization to run VMs on ARM systems without hardware virtualization support. Lightweight paravirtualization is a fully automated approach which replaces sensitive instructions with privileged instructions and requires no understanding of the guest OS code.
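The rewriting idea behind lightweight paravirtualization can be illustrated with a toy sketch. The opcode values, trap encoding, and register names below are invented for illustration and are not real ARM encodings or KVM code: sensitive instructions that would silently misbehave in unprivileged mode are replaced ahead of time with trapping encodings, and a side table lets the hypervisor emulate the original operation when the trap fires.

```python
# Toy sketch of lightweight paravirtualization (illustrative only: the
# opcodes, trap encoding, and register names are invented, not real ARM
# encodings or KVM code). Sensitive instructions are rewritten to trapping
# encodings before the guest runs; a side table lets the hypervisor
# emulate the original instruction when the trap fires.

SENSITIVE_OPS = {0x10: "mrs_cpsr", 0x11: "msr_cpsr"}  # hypothetical opcodes
TRAP_OP = 0xFF                                        # hypothetical trap encoding

def patch_guest(code):
    """Scan the guest instruction stream, replace sensitive opcodes with
    traps, and record the original operation per offset. No knowledge of
    the guest OS source is needed -- hence 'fully automated'."""
    patched, trap_table = [], {}
    for offset, op in enumerate(code):
        if op in SENSITIVE_OPS:
            trap_table[offset] = SENSITIVE_OPS[op]
            patched.append(TRAP_OP)
        else:
            patched.append(op)
    return patched, trap_table

def handle_trap(offset, trap_table, vcpu):
    """Hypervisor trap handler: emulate the original sensitive
    instruction against the virtual CPU state."""
    op = trap_table[offset]
    if op == "mrs_cpsr":       # guest reads its (virtualized) status register
        vcpu["r0"] = vcpu["virtual_cpsr"]
    elif op == "msr_cpsr":     # guest writes its (virtualized) status register
        vcpu["virtual_cpsr"] = vcpu["r0"]
    return vcpu

guest_code = [0x01, 0x10, 0x02, 0x11]   # two sensitive instructions
patched, trap_table = patch_guest(guest_code)
```

The real mechanism operates on binary instruction encodings and virtual CPU state maintained by the hypervisor; the sketch only shows the scan-replace-emulate structure.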
Second, we introduce split-mode virtualization to support hosted hypervisor designs using ARM's architectural support for virtualization. Different from x86, the ARM virtualization extensions are based on a new hypervisor CPU mode, separate from existing CPU modes. This separate hypervisor CPU mode does not support running existing unmodified OSes, and therefore hosted hypervisor designs, in which the hypervisor runs as part of a host OS, do not work on ARM. Split-mode virtualization splits the execution of the hypervisor such that the host OS with core hypervisor functionality runs in the existing kernel CPU mode, but a small runtime runs in the hypervisor CPU mode and supports switching between the VM and the host OS. Split-mode virtualization was used in KVM/ARM, which was designed from the ground up as an open source project and merged in the mainline Linux kernel, resulting in interesting lessons about translating research ideas into practice.
Third, we present an in-depth performance study of 64-bit ARMv8 virtualization using server hardware and compare against x86. We measure the performance of both standalone and hosted hypervisors on both ARM and x86 and compare their results. We find that ARM hardware support for virtualization can enable faster transitions between the VM and the hypervisor for standalone hypervisors compared to x86, but results in high switching overheads for hosted hypervisors compared to both x86 and to standalone hypervisors on ARM. We identify a key reason for the high switching overhead of hosted hypervisors: the need to save and restore kernel mode state between the host OS kernel and the VM kernel. However, standalone hypervisors such as Xen cannot leverage their performance benefit in practice for real application workloads, because other factors related to hypervisor software design and I/O emulation play a larger role in overall hypervisor performance than low-level interactions between the hypervisor and the hardware.
Fourth, realizing that modern hypervisors rely on running a full OS kernel, the hypervisor OS kernel, to support their hypervisor functionality, we present a new hypervisor design which runs the hypervisor and its hypervisor OS kernel in ARM's separate hypervisor CPU mode and avoids the need to multiplex kernel mode CPU state between the VM and the hypervisor. Our design benefits from new architectural features, the virtualization host extensions (VHE), in ARMv8.1 to avoid modifying the hypervisor OS kernel to run in the hypervisor CPU mode. We show that the hypervisor must be co-designed with the hardware features to take advantage of running in a separate CPU mode and implement our changes to KVM/ARM. We show that running the hypervisor OS kernel in a separate CPU mode from the VM and taking advantage of ARM's ability to quickly switch between the VM and hypervisor results in an order of magnitude reduction in overhead for important virtualization microbenchmarks and reduces the overhead of real application workloads by more than 50%.
|
274 |
Hybrid Analog-Digital Co-Processing for Scientific Computation / Huang, Yipeng. January 2018 (has links)
In the past 10 years computer architecture research has moved toward more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds the keys to unlocking humanity's Grand Challenges. Acting on that belief, they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits, and devices---have blurred. Against this backdrop, a menagerie of computer architectures is on the horizon: ones that forgo basic assumptions about computer hardware and require new thinking about how such hardware supports problems and algorithms.
This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high accuracy solutions, has been lacking.
I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer.
The appeal of an analog accelerator is its efficiency and performance, but it comes with limitations in accuracy and problem size that we have to work around.
The first problem is how to express problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing focused mostly on differential equations; algebraic equations played only a minor role. The key to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable: the algebraic equations that underlie most workloads can be solved as differential equations, and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can therefore focus on solving linear and nonlinear algebra problems to support many workloads.
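As a concrete illustration of this interchangeability, a linear algebraic system Ax = b can be recast as the ODE dx/dt = b - Ax, whose steady state is the solution. The forward-Euler loop below is a minimal sketch standing in for the continuous-time evolution an analog integrator would perform; it converges for positive-definite A with a small enough step:

```python
import numpy as np

def solve_by_ode(A, b, dt=0.01, steps=5000):
    """Solve the algebraic system Ax = b by integrating the ODE
    dx/dt = b - Ax to steady state (where Ax = b). Forward Euler here
    mimics, in discrete steps, what an analog integrator would do in
    continuous time."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(steps):
        x = x + dt * (b - A @ x)   # one Euler step of the ODE
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # positive definite
b = np.array([1.0, 2.0])
x = solve_by_ode(A, b)
```

On actual accelerator hardware the integration is performed by analog circuits rather than a loop, so the "steps" are free; the sketch only shows why the steady state of the ODE is the algebraic solution.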
The second problem is how to get accurate solutions using hybrid analog-digital computing. The analog computation model gives less accurate solutions because it gives up representing numbers as digital binary values and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives the novel insight that the trick to doing so is to solve nonlinear problems, where low-precision guesses are useful for seeding conventional digital algorithms.
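One way to realize this analog-guess/digital-refinement loop is classic iterative refinement. In the sketch below, `analog_solve` is a hypothetical stand-in for the accelerator: it degrades an exact solve to float16 to mimic analog-grade precision, while the digital host repeatedly computes an exact residual and asks for a coarse correction:

```python
import numpy as np

def analog_solve(A, b):
    """Hypothetical stand-in for the analog accelerator: produce a
    low-precision solution by degrading an exact solve to float16,
    mimicking analog-grade accuracy."""
    return np.linalg.solve(A, b).astype(np.float16).astype(np.float64)

def hybrid_solve(A, b, iters=8):
    """Iterative refinement: the digital host computes an exact residual,
    the 'analog' solver returns a coarse correction, and the loop drives
    the high-precision error down geometrically."""
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x                # exact residual, computed digitally
        x = x + analog_solve(A, r)   # coarse correction from the accelerator
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = hybrid_solve(A, b)   # far more accurate than analog_solve alone
```

Each pass removes most of the remaining error, so a handful of low-precision "analog" solves yields a near-full-precision answer; this is the sense in which low-precision guesses are useful to digital algorithms.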
The third problem is how to solve large problems using hybrid analog-digital computing. The analog computation model cannot handle large problems because it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen, the analog accelerator chains hardware for mathematical operations end-to-end; during computation, analog data flows through the hardware with no overheads in control logic and memory accesses. The downside is that the hardware required grows with problem size. While scientific computing researchers have long split large problems into smaller subproblems to fit digital computer constraints, this thesis is a first attempt to treat these divide-and-conquer algorithms as an essential tool for using the analog model of computation.
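A minimal example of such a divide-and-conquer scheme is block-Jacobi iteration: a system larger than the accelerator's capacity (simulated here, with illustrative sizes and values) is split into fixed-size diagonal subproblems, each small enough for the hardware, and the host sweeps until the pieces agree:

```python
import numpy as np

BLOCK = 2   # pretend the accelerator can only hold a 2x2 subproblem

def block_jacobi(A, b, sweeps=200):
    """Divide and conquer for Ax = b: split the system into BLOCK-sized
    diagonal subproblems, solve each on the (simulated) fixed-size
    accelerator while holding the other blocks' values fixed, and sweep
    until the pieces agree. Converges for block diagonally dominant A."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(sweeps):
        x_new = x.copy()
        for i in range(0, n, BLOCK):
            s = slice(i, i + BLOCK)
            # right-hand side with the coupling to the other blocks moved over
            rhs = b[s] - A[s, :] @ x + A[s, s] @ x[s]
            x_new[s] = np.linalg.solve(A[s, s], rhs)   # fits on the accelerator
        x = x_new
    return x

A = np.array([[4.0, 1.0, 0.5, 0.0],
              [1.0, 4.0, 0.0, 0.5],
              [0.5, 0.0, 4.0, 1.0],
              [0.0, 0.5, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = block_jacobi(A, b)
```

Only the small diagonal solves would run on the accelerator; the residual bookkeeping between sweeps stays on the digital host, which is the division of labor the thesis argues for.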
As we enter the post-Moore’s law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon see broad adoption. In this thesis I show that another specialized, unconventional architecture, the analog accelerator, can solve problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads.
|
275 |
Application-specific instruction set processor for speech recognition. January 2005 (has links)
Cheung Man Ting. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 69-71). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- The Emergence of ASIP --- p.1 / Chapter 1.1.1 --- Related Work --- p.3 / Chapter 1.2 --- Motivation --- p.6 / Chapter 1.3 --- ASIP Design Methodologies --- p.7 / Chapter 1.4 --- Fundamentals of Speech Recognition --- p.8 / Chapter 1.5 --- Thesis outline --- p.10 / Chapter 2 --- Automatic Speech Recognition --- p.11 / Chapter 2.1 --- Overview of ASR system --- p.11 / Chapter 2.2 --- Theory of Front-end Feature Extraction --- p.12 / Chapter 2.3 --- Theory of HMM-based Speech Recognition --- p.14 / Chapter 2.3.1 --- Hidden Markov Model (HMM) --- p.14 / Chapter 2.3.2 --- The Typical Structure of the HMM --- p.14 / Chapter 2.3.3 --- Discrete HMMs and Continuous HMMs --- p.15 / Chapter 2.3.4 --- The Three Basic Problems for HMMs --- p.17 / Chapter 2.3.5 --- Probability Evaluation --- p.18 / Chapter 2.4 --- The Viterbi Search Engine --- p.19 / Chapter 2.5 --- Isolated Word Recognition (IWR) --- p.22 / Chapter 3 --- Design of ASIP Platform --- p.24 / Chapter 3.1 --- Instruction Fetch --- p.25 / Chapter 3.2 --- Instruction Decode --- p.26 / Chapter 3.3 --- Datapath --- p.29 / Chapter 3.4 --- Register File Systems --- p.30 / Chapter 3.4.1 --- Memory Hierarchy --- p.30 / Chapter 3.4.2 --- Register File Organization --- p.31 / Chapter 3.4.3 --- Special Registers --- p.34 / Chapter 3.4.4 --- Address Generation --- p.34 / Chapter 3.4.5 --- Load and Store --- p.36 / Chapter 4 --- Implementation of Speech Recognition on ASIP --- p.37 / Chapter 4.1 --- Hardware Architecture Exploration --- p.37 / Chapter 4.1.1 --- Floating Point and Fixed Point --- p.37 / Chapter 4.1.2 --- Multiplication and Accumulation --- p.38 / Chapter 4.1.3 --- Pipelining --- p.41 / Chapter 4.1.4 --- Memory Architecture --- p.43 / Chapter 4.1.5 --- Saturation Logic --- p.44 / Chapter 4.1.6 --- Specialized Addressing Modes --- p.44 / Chapter 4.1.7 --- Repetitive Operation --- p.47 / Chapter 4.2 --- Software Algorithm Implementation --- p.49 / Chapter 4.2.1 --- Implementation Using Base Instruction Set --- p.49 / Chapter 4.2.2 --- Implementation Using Refined Instruction Set --- p.54 / Chapter 5 --- Simulation Results --- p.56 / Chapter 6 --- Conclusions and Future Work --- p.60 / Appendices --- p.62 / Chapter A --- Base Instruction Set --- p.62 / Chapter B --- Special Registers --- p.65 / Chapter C --- Chip Microphotograph of ASIP --- p.67 / Chapter D --- The Testing Board of ASIP --- p.68 / Bibliography --- p.69
|
276 |
Constraint extension to dataflow network. January 2004 (has links)
Tsang Wing Yee. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 90-93). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- Preliminaries --- p.4 / Chapter 2.1 --- Constraint Satisfaction Problems --- p.4 / Chapter 2.2 --- Dataflow Networks --- p.5 / Chapter 2.3 --- The Lucid Programming Language --- p.9 / Chapter 2.3.1 --- Daton Domain --- p.10 / Chapter 2.3.2 --- Constants --- p.10 / Chapter 2.3.3 --- Variables --- p.10 / Chapter 2.3.4 --- Dataflow Operators --- p.11 / Chapter 2.3.5 --- Functions --- p.16 / Chapter 2.3.6 --- Expression and Statement --- p.17 / Chapter 2.3.7 --- Examples --- p.17 / Chapter 2.3.8 --- Implementation --- p.19 / Chapter 3 --- Extended Dataflow Network --- p.25 / Chapter 3.1 --- Assertion Arcs --- p.25 / Chapter 3.2 --- Selection Operators --- p.27 / Chapter 3.2.1 --- The Discrete Choice Operator --- p.27 / Chapter 3.2.2 --- The Discrete Committed Choice Operator --- p.29 / Chapter 3.2.3 --- The Range Choice Operators --- p.29 / Chapter 3.2.4 --- The Range Committed Choice Operators --- p.32 / Chapter 3.3 --- Examples --- p.33 / Chapter 3.4 --- E-Lucid --- p.39 / Chapter 3.4.1 --- Modified Four Cockroaches Problem --- p.42 / Chapter 3.4.2 --- Traffic Light Problem --- p.45 / Chapter 3.4.3 --- Old Maid Problem --- p.48 / Chapter 4 --- Implementation of E-Lucid --- p.54 / Chapter 4.1 --- Overview --- p.54 / Chapter 4.2 --- Definition of Terms --- p.56 / Chapter 4.3 --- Function ELUCIDinterpreter --- p.57 / Chapter 4.4 --- Function Edemand --- p.58 / Chapter 4.5 --- Function transformD --- p.59 / Chapter 4.5.1 --- Labelling Datastreams of Selection Operators --- p.59 / Chapter 4.5.2 --- Removing Committed Choice Operators --- p.62 / Chapter 4.5.3 --- "Removing asa, wvr, and upon" --- p.62 / Chapter 4.5.4 --- Labelling Output Datastreams of if-then-else-fi --- p.63 / Chapter 4.5.5 --- Transforming Statements to Daton Statements --- p.63 / Chapter 4.5.6 --- Transforming Daton Expressions Recursively --- p.65 / Chapter 4.5.7 --- An Example --- p.65 / Chapter 4.6 --- "Functions constructCSP, findC, and transformC" --- p.68 / Chapter 4.7 --- An Example --- p.75 / Chapter 4.8 --- Function backtrack --- p.77 / Chapter 5 --- Related Works --- p.83 / Chapter 6 --- Conclusion --- p.87
|
277 |
Improving on-chip data cache using instruction register information. January 1996 (has links)
by Lau Siu Chung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 71-74). / Abstract --- p.i / Acknowledgment --- p.ii / List of Figures --- p.v / Chapter Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Hiding memory latency --- p.1 / Chapter 1.2 --- Organization of dissertation --- p.4 / Chapter Chapter 2 --- Related Work --- p.5 / Chapter 2.1 --- Hardware controlled cache prefetching --- p.5 / Chapter 2.2 --- Software assisted cache prefetching --- p.9 / Chapter Chapter 3 --- Data Prefetching --- p.13 / Chapter 3.1 --- Data reference patterns --- p.14 / Chapter 3.2 --- Embedded hints for next data references --- p.19 / Chapter 3.3 --- Instruction Opcode and Addressing Mode Prefetching scheme --- p.21 / Chapter 3.3.1 --- Basic IAP scheme --- p.21 / Chapter 3.3.2 --- Enhanced IAP scheme --- p.24 / Chapter 3.3.3 --- Combined IAP scheme --- p.27 / Chapter 3.4 --- Summary --- p.29 / Chapter Chapter 4 --- Performance Evaluation --- p.31 / Chapter 4.1 --- Evaluation methodology --- p.31 / Chapter 4.1.1 --- Trace-driven simulation --- p.31 / Chapter 4.1.2 --- Caching models --- p.33 / Chapter 4.1.3 --- Benchmarks and metrics --- p.36 / Chapter 4.2 --- General Results --- p.41 / Chapter 4.2.1 --- Varying cache size --- p.44 / Chapter 4.2.2 --- Varying cache block size --- p.46 / Chapter 4.2.3 --- Varying associativity --- p.49 / Chapter 4.3 --- Other performance metrics --- p.52 / Chapter 4.3.1 --- Accuracy of prefetch --- p.52 / Chapter 4.3.2 --- Partial hit delay --- p.55 / Chapter 4.3.3 --- Bus usage problem --- p.59 / Chapter 4.4 --- Zero time prefetch --- p.63 / Chapter 4.5 --- Summary --- p.67 / Chapter Chapter 5 --- Conclusion --- p.68 / Chapter 5.1 --- Summary of our research --- p.68 / Chapter 5.2 --- Future work --- p.70 / Bibliography --- p.71
|
278 |
Design and analysis of a memory hierarchy for a very high performance multiprocessor configuration / Tick, Evan Michael. January 1982 (has links)
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1982. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Vita. / Bibliography: p. 204-221. / by Evan Michael Tick. / M.S.
|
279 |
O2-tree: a shared memory resident index in multicore architectures / Ohene-Kwofie, Daniel. 06 February 2013 (has links)
Shared memory multicore computer architectures are now commonplace in computing. These can be found in modern desktop and workstation computers and also in High Performance Computing (HPC) systems. Recent advances in memory architecture and in 64-bit addressing allow such systems to have memory sizes of the order of hundreds of gigabytes and beyond. This now allows for realistic development of main-memory-resident database systems, which still require a memory-resident index, such as the T-Tree or the B+-Tree, for fast access to the data items.
This thesis proposes a new indexing structure, called the O2-Tree, which is essentially an augmented Red-Black Tree in which the leaf nodes are index data blocks that store multiple key-value pairs. The value is either the entire record associated with the key or a pointer to the location of the record. The internal nodes contain copies of the keys that split blocks of the leaf nodes, in a manner similar to the B+-Tree. The O2-Tree structure has the advantage that it can easily be reconstructed by reading only the lowest key of each leaf node page. It is sufficiently small that it can be dumped and restored much faster.
Analysis and a comparative experimental study show that the performance of the O2-Tree is superior to other tree-based index structures with respect to various query operations for large datasets. We also present results which indicate that the O2-Tree outperforms popular key-value stores such as BerkeleyDB and the TreeDB of Kyoto Cabinet for various workloads. The thesis addresses various concurrent access techniques for the O2-Tree for shared memory multicore architectures and gives an analysis of the O2-Tree with respect to query operations, storage utilization, failover, and recovery.
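The leaf-block organization described above can be sketched in a few lines. The sketch below is a simplification, not the thesis code: a sorted array of separator keys stands in for the Red-Black Tree of internal nodes. It does, however, show the two properties the abstract highlights: B+-Tree-style splits that copy the lowest key of a new leaf block upward, and reconstruction of the routing structure from only the lowest key of each leaf block.

```python
import bisect

class LeafBlockIndex:
    """Simplified sketch of an O2-Tree-style index: leaf blocks hold
    sorted key-value pairs, and an ordered list of separator keys (a
    sorted array standing in for the Red-Black Tree of internal nodes)
    routes each lookup to the right block."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.seps = []       # lowest key of each leaf block after the first
        self.blocks = [[]]   # each block: sorted list of (key, value)

    def _block_for(self, key):
        return bisect.bisect_right(self.seps, key)

    def put(self, key, value):
        blk = self.blocks[self._block_for(key)]
        i = bisect.bisect_left(blk, (key,))
        if i < len(blk) and blk[i][0] == key:
            blk[i] = (key, value)            # update in place
        else:
            blk.insert(i, (key, value))
        if len(blk) > self.block_size:       # overflow: split, B+-Tree style
            mid = len(blk) // 2
            new_blk = blk[mid:]
            del blk[mid:]
            sep = new_blk[0][0]              # copy of the lowest key moves up
            j = bisect.bisect_left(self.seps, sep)
            self.seps.insert(j, sep)
            self.blocks.insert(j + 1, new_blk)

    def get(self, key):
        blk = self.blocks[self._block_for(key)]
        i = bisect.bisect_left(blk, (key,))
        if i < len(blk) and blk[i][0] == key:
            return blk[i][1]
        return None

    def rebuild_seps(self):
        """Recovery as described in the abstract: the routing structure is
        rebuilt from just the lowest key of each leaf block."""
        self.seps = [blk[0][0] for blk in self.blocks[1:]]
```

Because only the leaf blocks hold data, dumping them and calling `rebuild_seps` restores the whole index, which is the property the abstract credits for fast dump and restore.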
|
280 |
Exploiting Adaptive Techniques to Improve Processor Energy Efficiency / Chen, Hu. 01 May 2016 (has links)
Rapid device miniaturization continues to pose challenges in building energy-efficient microprocessors. As transistor sizes continue to shrink, more uncertainties emerge in their operation. At the same time, integrating more and more transistors on a single chip accentuates the need to lower the supply voltage. This dissertation investigates one of the primary device uncertainties, timing error, which becomes a microprocessor performance bottleneck in the near-threshold computing (NTC) era. It then proposes various techniques to maintain processor energy efficiency in the context of these emerging challenges. Evaluated with a cross-layer methodology, the proposed approaches achieve substantial improvements in processor energy efficiency compared to other state-of-the-art techniques.
|