  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

A multichannel correlation signal processor for flow measurement

Keech, R. P. January 1984
No description available.
12

Language extensions for array processor and multi-processor configurations

Orr, Rodney Alister January 1986
No description available.
13

Integrated system for image acquisition and processing (Sistema integrado de aquisição e processamento de imagens)

Medeiros, Henry Ponti 2010
This work presents the design, implementation, and testing of a new integrated system for image acquisition and processing (SIAPI). The system is composed of a charge-coupled device (CCD) image sensor, a low-cost, low-end DSP (digital signal processor) that controls the image sensor, a high-end DSP that processes the acquired images, a Universal Serial Bus (USB) interface that connects the system to a microcomputer, and a thermoelectric cooler based on the Peltier effect that reduces the thermal noise generated in the image sensor. All the functional parameters of the image sensor can be controlled through the system, and the acquired images can be processed in real time.
To evaluate the system's functions and measure its performance, a prototype was built on a DSK (DSP Starter Kit), and a C program was developed to communicate with the system. The response characteristics of the image sensor were evaluated under different configurations. The system proved fully functional and, when executing image processing algorithms, considerably outperformed a high-performance microcomputer.
14

Tolerating memory latency through lightweight multithreading

Gale, Andrew January 2002
Because processor clock frequencies continue to improve faster than the performance of semiconductor memories, memory latency has an increasing effect on processor efficiency. Unless steps are taken to mitigate it, higher processor frequencies bring little benefit. This work demonstrates how multithreading can reduce the effect of memory latency on processor performance and how only a few threads are required to achieve close to optimal performance. A lightweight multithreaded architecture is discussed and simulated to show how threads derived from an application's instruction-level parallelism can be used to tolerate memory latency.
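Why a few threads can be enough is easiest to see in a toy model. The sketch below is not the thesis's simulator. It assumes that every thread computes for a fixed `run_length` cycles, then stalls for a fixed `latency` cycles on a memory access, and that switching threads costs nothing (the "lightweight" case). The function names are hypothetical. Utilization then grows linearly with the thread count until about `(run_length + latency) / run_length` threads hide the latency completely.

```python
def utilization(num_threads: int, run_length: int, latency: int,
                cycles: int = 100_000) -> float:
    """Fraction of cycles in which a single pipeline does useful work.

    Toy model (assumption): each thread computes for `run_length` cycles,
    then waits `latency` cycles for memory; thread switches are free.
    """
    ready_at = [0] * num_threads  # cycle at which each thread can run again
    busy = 0
    now = 0
    while True:
        # Run the thread that becomes ready soonest (lowest index wins ties).
        tid = min(range(num_threads), key=lambda i: ready_at[i])
        start = max(now, ready_at[tid])  # pipeline idles if nobody is ready
        if start >= cycles:
            break
        end = min(start + run_length, cycles)
        busy += end - start
        ready_at[tid] = start + run_length + latency  # memory access in flight
        now = end
    return busy / cycles


def analytic_utilization(num_threads: int, run_length: int, latency: int) -> float:
    # Closed form for the same model: linear in the thread count until saturation.
    return min(1.0, num_threads * run_length / (run_length + latency))


if __name__ == "__main__":
    # With 20 compute cycles per 60-cycle miss, four threads saturate the pipeline.
    for n in (1, 2, 3, 4, 8):
        print(n, utilization(n, 20, 60), analytic_utilization(n, 20, 60))
```

Real workloads have variable run lengths and switch costs, so the knee of the curve is softer than in this deterministic model.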
15

Analysis of Hardware Sorting Units in Processor Design

Furlan, Carmelo C. 01 June 2019
Sorting is often computationally intensive and can slow down the application that uses it. To date, the fastest comparison-based software sorting implementations for an n-element problem run in O(n log n) time. Beyond developing better algorithms, current techniques for accelerating sorting include using multiple processors or offloading the sort to a GPU. Both approaches can increase the device's energy consumption and heat output compared with a single-core implementation without a GPU. To address these problems, specialized instructions and hardware units can be added to the processor to accelerate sorting directly. This thesis studies and records the performance implications of integrating a sorting accelerator into a modern RISC-V processor pipeline. It also explores the additional energy and area costs of such hardware units.
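The abstract does not say which algorithm the accelerator implements. A common way to sort in hardware is a fixed sorting network, so the sketch below uses Batcher's bitonic sorter purely as an illustration (an assumption, not the thesis's design). Each pass of the inner `for i` loop is one stage of independent compare-exchange operations that a hardware unit could perform in parallel; for n = 2^m inputs the network has m(m + 1)/2 stages, a fixed latency regardless of the input values.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns (sorted_list, number_of_stages). Every stage consists of
    independent compare-exchange operations on disjoint pairs.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    stages = 0
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # distance between the two compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending:
                        out_of_order = a[i] > a[partner]
                    else:
                        out_of_order = a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            stages += 1
            j //= 2
        k *= 2
    return a, stages


if __name__ == "__main__":
    print(bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6]))
```

A network trades extra comparators (n/2 per stage) for a data-independent, pipelinable structure, which is why it maps well onto silicon.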
16

A Soft-core processor architecture optimised for radar signal processing applications

Broich, René January 2013
Current radar signal processor architectures lack either performance or flexibility: they are difficult to modify and incur large design-time overheads. Processors and FPGAs are typically hard-wired together into a precisely timed, pipelined solution to achieve the desired functionality and performance. Such a fixed processing solution is impractical for evaluating new algorithms or making quick changes during field tests. A more flexible solution based on a high-performance soft-core processing architecture is proposed. To develop this architecture, the data and signal-flow characteristics of common radar signal processing algorithms are analysed. Each algorithm is broken down into signal processing and mathematical operations, and the computational requirements are evaluated using an abstract model of computation to determine the relative importance of each mathematical operation. Critical portions of the radar applications are identified for architecture selection and optimisation. Built around these dominant operations, a soft-core architecture model that better matches the core computational requirements of a radar signal processor is proposed. The processor model is iteratively refined based on previous synthesis and code profiling results. To automate this iterative process, a software development environment was designed. It enables rapid exploration of the architectural design space by automatically generating development tools (assembler, linker, code editor, cycle-accurate emulator/simulator, programmer, and debugger) as well as platform-independent VHDL code from an architecture description file. Together with board-specific HDL-based HAL files, the design files are synthesised using the vendor-specific FPGA tools and verified in practice on a custom high-performance development board.
Timing results, functional accuracy, resource usage, profiling, and performance data are analysed and fed back into the architecture description file for further refinement. This iterative design process yielded a unique transport-based pipelined architecture. The proposed architecture achieves high data throughput while providing the flexibility of a software-programmable device, so end users can write custom radar algorithms in software rather than going through a long and complex HDL-based design. The simplicity of the architecture enables high clock frequencies and deterministic response times, and makes it easy to understand. Furthermore, the architecture scales in performance and functionality to a variety of streaming and burst-processing applications. Compared with the Texas Instruments C66x DSP core, the proposed architecture required 10.8 to 20.9 times fewer clock cycles for the identical radar application over a range of typical operating parameters. Even with the limited clock speeds achievable in FPGA technology, the proposed architecture exceeds the performance of this commercial high-end DSP. Further research is required on ASIC, SIMD, and multi-core implementations, as well as on compiler technology, for the proposed architecture. A custom ASIC implementation is expected to improve processing performance further by a factor of 10 to 27. / Dissertation (MEng)--University of Pretoria, 2013. / Electrical, Electronic and Computer Engineering
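The abstract calls the result a "transport-based" architecture without describing it further. A well-known design in this family is the transport-triggered architecture (TTA), where the only instruction is a data move and writing to a function unit's trigger port starts its operation. The toy interpreter below illustrates that general idea only; the class, the port names (`add.o`, `mul.t`, and so on), and the program are hypothetical and are not taken from the thesis.

```python
from typing import Callable, Dict, List, Tuple, Union

Source = Union[int, str]  # an immediate value or the name of a readable port


class ToyTTA:
    """Toy transport-triggered machine whose only instruction is a move."""

    def __init__(self) -> None:
        # Readable/writable locations: registers plus function-unit ports.
        self.ports: Dict[str, int] = {
            "r0": 0, "r1": 0, "r2": 0, "r3": 0,
            "add.o": 0, "add.r": 0,  # adder operand and result ports
            "mul.o": 0, "mul.r": 0,  # multiplier operand and result ports
        }
        # Writing to a trigger port starts the operation with the stored operand.
        self.triggers: Dict[str, Callable[[int], None]] = {
            "add.t": lambda x: self._write("add.r", self.ports["add.o"] + x),
            "mul.t": lambda x: self._write("mul.r", self.ports["mul.o"] * x),
        }

    def _write(self, port: str, value: int) -> None:
        self.ports[port] = value

    def move(self, src: Source, dst: str) -> None:
        value = src if isinstance(src, int) else self.ports[src]
        if dst in self.triggers:
            self.triggers[dst](value)  # the data transport triggers the operation
        else:
            self.ports[dst] = value

    def run(self, program: List[Tuple[Source, str]]) -> None:
        # One move per step here; a real TTA issues several moves per cycle.
        for src, dst in program:
            self.move(src, dst)


# Multiply-accumulate r3 = r0 * r1 + r2, a typical DSP operation.
MAC_PROGRAM: List[Tuple[Source, str]] = [
    ("r0", "mul.o"),
    ("r1", "mul.t"),
    ("mul.r", "add.o"),
    ("r2", "add.t"),
    ("add.r", "r3"),
]

if __name__ == "__main__":
    cpu = ToyTTA()
    cpu.ports.update(r0=3, r1=4, r2=5)
    cpu.run(MAC_PROGRAM)
    print(cpu.ports["r3"])  # 17
```

Because the program explicitly schedules every transport, the hardware needs little control logic, which is consistent with the simplicity and deterministic timing the abstract emphasises.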
17

Design of Vertex and Per-Fragment Processor for 3D Graphics Rendering

Tsai, Ming-chi 04 September 2007
In recent years, with rapid advances in VLSI and multimedia technology, three-dimensional (3D) graphics applications have spread widely into many areas and are no longer limited to specialized technical fields served by high-end workstations. In the near future, the 3D graphics engine will become an indispensable part of most multimedia systems, such as television sets and personal electronic devices. A general 3D graphics engine can be divided into a geometry subsystem and a raster subsystem. The main contribution of this thesis is the design of an efficient per-fragment pipeline; the work also contributes to the development of the vertex processor and to the integration of the geometry and raster subsystems. The per-fragment processor contains various processing stages, such as fog blending, visibility testing, and alpha blending. This thesis analyzes the dependencies among these stages so that several of them can run in parallel, reducing overall pipeline latency, and reorders the stages to avoid unnecessary texture accesses. It also proposes two memory buffer access mechanisms for tile-based 3D rendering engines that reduce overall system memory bandwidth. The first adds control flags to each tile so that frequent buffer-clear operations are merged into normal rendering, avoiding separate memory clear accesses. The second builds a dirty table that identifies unmodified pixels in each tile, reducing the number of pixels that must be updated. Experimental results show that the proposed methods reduce memory accesses by more than 50%. The design has been implemented in 0.18 µm technology. The gate counts of the vertex processor (excluding special functions) and of the per-fragment processor are 201k and 118k, respectively.
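The two memory-saving mechanisms can be illustrated with a small accounting model. This is a sketch under stated assumptions, not the thesis's hardware: `render_tile`, `baseline_accesses`, and the single `pending_clear` flag per tile are hypothetical, the frame buffer is a flat list, and each pixel read or write counts as one external access. When a clear is pending, the tile starts on chip from the clear value (no clear pass, no load) and every pixel is written back once; otherwise only pixels marked in the dirty table are written back.

```python
from typing import List, Tuple


def render_tile(memory: List[int], base: int, tile_pixels: int,
                fragments: List[Tuple[int, int]], pending_clear: bool,
                clear_value: int = 0) -> int:
    """Render one tile through an on-chip buffer; return external accesses.

    pending_clear: hypothetical per-tile control flag meaning "cleared, but
    the clear has not been written to memory yet".
    fragments: (pixel index within the tile, colour) pairs to be written.
    """
    accesses = 0
    if pending_clear:
        # The clear is applied on chip: no separate clear pass, no tile load.
        buffer = [clear_value] * tile_pixels
    else:
        buffer = memory[base:base + tile_pixels]  # load current contents
        accesses += tile_pixels
    dirty = [False] * tile_pixels  # dirty table for this tile
    for index, colour in fragments:
        buffer[index] = colour
        dirty[index] = True
    for i, value in enumerate(buffer):
        # With a pending clear, memory still holds stale data, so write every
        # pixel; otherwise write back only pixels marked in the dirty table.
        if pending_clear or dirty[i]:
            memory[base + i] = value
            accesses += 1
    return accesses


def baseline_accesses(tile_pixels: int, cleared: bool) -> int:
    # Conventional flow: optional clear pass, full tile load, full write-back.
    return (tile_pixels if cleared else 0) + 2 * tile_pixels


if __name__ == "__main__":
    frame = [7] * 16  # one 4x4 tile
    print(render_tile(frame, 0, 16, [(0, 1), (5, 2)], pending_clear=False),
          baseline_accesses(16, cleared=False))
```

The savings in this model depend entirely on how many pixels each pass touches; it does not reproduce the thesis's measured figures.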
18

Reusing cached schedules in an out-of-order processor with in-order issue logic

Palomar Pérez, Óscar 09 May 2011
Modern processors use out-of-order processing logic to achieve high performance in instructions per cycle (IPC), but this logic seriously limits the achievable clock frequency. To get better performance out of smaller transistors, the trend is to increase the number of cores per die rather than make the cores themselves bigger. Moreover, for throughput-oriented and server workloads, simpler in-order processors, which allow more cores per die and higher design frequencies, are becoming the preferred choice. Unfortunately, for other workloads these cores deliver lower single-thread performance, and there are many workloads where good single-thread performance still matters. In this thesis we present the ReLaSch processor. Its aim is to enable high-IPC cores that can run at high clock frequencies by processing instructions with simple superscalar in-order issue logic and by caching instruction groups that are dynamically scheduled in hardware after commit, that is, off the critical path and only when really needed.
Objective
This thesis has several research goals:
• Show that the dynamic scheduler of a conventional out-of-order processor does a lot of redundant work because it ignores the repetitiveness of code.
• Propose a complete superscalar out-of-order architecture that reduces this redundant work by creating schedules once in dedicated hardware, storing them in a cache of schedules, and reusing them as much as possible.
• Move the scheduler off the critical path of execution, which the reduced scheduling work should make possible. As a result, the execution path of the proposed processor can be simpler than that of a conventional out-of-order processor.
Proposal and results
We present the ReLaSch processor, named after Reused Late Schedules, in which the creation of issue groups is removed from the critical path of execution and a simple, small in-order issue logic is used. This logic only wakes up and selects the instructions of a single issue group each cycle, instead of processing the instructions of a whole issue queue. New logic at the end of the conventional pipeline schedules the committed instructions; this scheduler can be complex because it is not on the critical path of execution. The schedules are cached, and whenever possible an rgroup is read and its instructions are executed. Reusing schedules lowers the pressure on the scheduling logic. In some cases, the ReLaSch processor outperforms a conventional out-of-order processor because the post-commit scheduler has a broader view of the code. For instance, ReLaSch can schedule two independent instructions that are distant in the code into the same group, whereas a conventional out-of-order processor issues them in the same cycle only if both are in flight. The ReLaSch processor predicts branch targets, memory aliases, and latencies at scheduling time, off the critical path, based on the most recent executions. Furthermore, most of the register renaming is performed by the scheduler and removed from the execution pipeline. Our experiments show that ReLaSch has the same average IPC as our reference out-of-order processor and is clearly better than the reference in-order processor (a 1.55 speed-up). It outperforms the in-order processor in all cases, and in 23 of 40 benchmarks it achieves a higher IPC than the reference out-of-order processor.
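The core idea of scheduling a committed instruction sequence once, caching the resulting issue groups, and reusing them can be sketched in software. Everything below is a hypothetical simplification, not ReLaSch's hardware: `Instr`, `build_schedule`, and `ScheduleCache` are illustrative names; the scheduler assumes full register renaming (so only read-after-write dependencies constrain placement), a one-group latency for every instruction, and no memory dependencies; and the cache schedules on a miss instead of after commit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]           # destination register, if any
    srcs: Tuple[str, ...] = ()    # source registers


def build_schedule(trace: List[Instr], width: int) -> List[List[str]]:
    """Greedily pack a committed trace into issue groups of at most `width`."""
    groups: List[List[str]] = []
    available: Dict[str, int] = {}  # register -> first group that may read it
    for ins in trace:
        g = max((available.get(r, 0) for r in ins.srcs), default=0)
        while g < len(groups) and len(groups[g]) >= width:
            g += 1                  # group is full: try the next one
        while len(groups) <= g:
            groups.append([])
        groups[g].append(ins.name)
        if ins.dest is not None:
            available[ins.dest] = g + 1
    return groups


class ScheduleCache:
    """Issue-group schedules keyed by the start address of their code region."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.entries: Dict[int, List[List[str]]] = {}
        self.builds = 0

    def lookup(self, start_pc: int, trace: List[Instr]) -> List[List[str]]:
        if start_pc not in self.entries:  # miss: schedule once, then reuse
            self.entries[start_pc] = build_schedule(trace, self.width)
            self.builds += 1
        return self.entries[start_pc]


if __name__ == "__main__":
    trace = [Instr("a", "r1"), Instr("b", "r2"),
             Instr("c", "r3", ("r1", "r2")), Instr("d", "r4")]
    print(build_schedule(trace, width=4))  # d joins a and b in the first group
```

With width 4, the independent instruction `d` is placed in the first group even though it appears after `c` in program order, a small instance of the "broader view" a post-commit scheduler has.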
19

Hades - an asynchronous superscalar processor

Elston, Corrie John January 1996
No description available.
20

SMART : an innovative multimedia computer architecture for processing ATM cells in real-time

Cashman, Neil January 1998
No description available.
