21

SUNSHINE: A Multi-Domain Sensor Network Simulator

Zhang, Jingyao 02 November 2010
Simulators are important tools for analyzing and evaluating design options for wireless sensor networks (sensornets) and have therefore been studied intensively over the past decades. However, existing simulators support only the evaluation of protocols and software aspects of sensornet design. They cannot accurately capture the significant impact of different hardware designs on sensornet performance. As a result, the performance and energy benefits of customized hardware designs are difficult to evaluate in sensornet research. To fill this gap, this thesis describes the design and implementation of SUNSHINE, a scalable hardware-software cross-domain simulator for sensornet applications. SUNSHINE is the first sensornet simulator that effectively supports joint evaluation and design of sensor hardware and software performance in a networked context. SUNSHINE captures the performance of network protocols, software, and hardware up to cycle-level accuracy through its seamless integration of three existing sensornet simulators: the network simulator TOSSIM, the instruction-set simulator SimulAVR, and the hardware simulator GEZEL. SUNSHINE solves challenging design problems, including data exchange and time synchronization across different simulation domains and simulation accuracy levels. SUNSHINE also provides a hardware specification scheme for simulating flexible and customized hardware designs. Several experiments illustrate SUNSHINE's cross-domain simulation capability, demonstrating that SUNSHINE is an efficient tool for software-hardware codesign in sensornet research. / Master of Science
22

Teaching In-Memory Database Systems the Detection of Hardware Errors

Lehner, Wolfgang, Habich, Dirk, Kolditz, Till 18 January 2023
The key objective of database systems is to manage data reliably, with high query throughput and low query latency as core requirements. To satisfy these requirements, database systems constantly adapt to novel hardware features. Although it has been intensively studied and is commonly accepted that hardware error rates, in terms of bit flips, increase dramatically as the underlying chip structures shrink, most database system research has neglected this fact, leaving error (bit flip) detection and correction to the underlying hardware. For main memory in particular, silent data corruption (SDC), in which transient bit flips lead to faulty data, is mainly detected and corrected at the DRAM and memory-controller layer. However, as future hardware becomes less reliable and hardware-based error detection and correction becomes more expensive, this free ride will come to an end in the near future. To continue providing reliable data management, an emerging research direction is to employ specific, tailored protection techniques at the database system level. Following that direction, we are currently developing and implementing an adapted system design for state-of-the-art in-memory column stores. In our lightning talk, we will summarize our current state and outline future work.
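The abstract does not say which protection technique the authors use. As an illustration only, the sketch below assumes arithmetic AN coding, one known way to detect bit flips in software: every stored integer is multiplied by a constant A, and a value that is no longer a multiple of A is flagged as corrupted. The constant 641 and all function names are hypothetical choices for this example.

```python
# Hypothetical odd constant. A single-bit flip changes a code word by plus or
# minus a power of two, which is never a multiple of an odd A > 1, so every
# single-bit flip in a non-negative code word is detected.
A = 641


def encode(value: int) -> int:
    """Store a non-negative integer as a multiple of A."""
    return value * A


def is_valid(code: int) -> bool:
    """A code word is valid only if it is still a multiple of A."""
    return code % A == 0


def decode(code: int) -> int:
    """Return the original value, or raise if corruption is detected."""
    if not is_valid(code):
        raise ValueError("corrupted code word detected")
    return code // A


def flip_bit(code: int, bit: int) -> int:
    """Simulate a transient hardware fault by flipping one bit."""
    return code ^ (1 << bit)


if __name__ == "__main__":
    stored = encode(1234)
    print(decode(stored))                  # intact data decodes normally
    print(is_valid(flip_bit(stored, 5)))   # a flipped bit is detected
```

Because encoding is a multiplication, sums of code words are themselves valid code words, so in this sketch additions can be carried out on encoded data without decoding first.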
23

Arquitetura multi-core reconfigurável para detecção de pedestres baseada em visão / Reconfigurable Multi-core Architecture for Vision-based Pedestrian Detection

Holanda, Jose Arnaldo Mascagni de 17 May 2017
Pedestrian detection systems are among the Advanced Driver Assistance Systems (ADAS) technologies that have been added to modern vehicles. These systems use sensors such as radars, lasers, and video cameras to capture information from the environment and avoid collisions with people in traffic. Video cameras have become an attractive option for such systems because of their relatively low cost and the wealth of information they capture from the environment. Many vision-based pedestrian detection techniques have appeared in recent years; they characteristically require great computational power so that images can be processed in real time, robustly and reliably, and with a low error rate. In addition, systems that implement these techniques must have low power consumption so that they can operate in an embedded environment such as an automobile. A trend in these systems is to process images from multiple cameras mounted on the vehicle, so that the system can detect potential collision hazards all around it. In this context, this work addresses the hardware/software codesign of an architecture for pedestrian detection, considering four cameras in a vehicle (one at the front, one at the rear, and one on each side). For this purpose, the flexibility of FPGA devices is used for design space exploration and for building an architecture that provides the necessary performance, keeps energy consumption at appropriate levels, and, through programmability, allows adaptation to new scenarios and to the evolution of pedestrian detection techniques. The development of the architecture was based on two widely used pedestrian detection algorithms, Histogram of Oriented Gradients (HOG) and Integral Channel Features (ICF). Both introduce techniques that serve as the basis for modern detection algorithms. The implemented architecture allowed different types of application parallelism to be explored through the use of multiple softcore processors, and critical functions to be accelerated through hardware implementations. Its feasibility in serving a system with four video cameras was also demonstrated.
24

Co-Projeto de hardware/software para correlação de imagens / Hardware/software co-design for image cross-correlation

Dias, Maurício Acconcia 26 July 2011
This work presents an FPGA-based hardware/software co-design for the normalized image cross-correlation algorithm. The main goal is to achieve a significant speedup over the execution time of the all-software implementation. The proposed co-design method is a modified profiling-based method with an added software development and optimization cycle. To measure the speedup, execution times were compared across 21 different configurations of the Nios II soft-processor implemented on an FPGA, including configurations with new dedicated instructions. The resulting co-design achieved a significant speedup. The influence of hardware on execution time was also evaluated, to determine how basic and dedicated hardware structures affect the algorithm's final execution time. Analysis of the results suggests that the method is efficient in terms of the speedup achieved, but the final execution time remains higher than expected given the need for real-time image processing in robotic navigation systems. However, the limitations for real-time processing are also linked to the performance constraints of the hardware adopted in this work, a low-cost, medium-capacity FPGA.
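For readers unfamiliar with the algorithm being accelerated, the following is a minimal pure-software sketch of normalized cross-correlation for template matching. It assumes the common zero-mean formulation; the abstract does not specify the exact variant used in the thesis, and the function names are hypothetical.

```python
import math


def normalized_cross_correlation(patch, template):
    """Zero-mean NCC score in [-1, 1] for two equally sized grayscale blocks."""
    a = [float(v) for row in patch for v in row]
    b = [float(v) for row in template for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("blocks must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat block has zero variance; report "no correlation" in that case.
    return numerator / denominator if denominator else 0.0


def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the best fit."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = normalized_cross_correlation(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best
```

The nested loops in `best_match` repeat the same multiply-accumulate pattern for every window position, which is the kind of hot spot a profiling-based co-design method would typically single out for hardware support.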
26

Entwurf, Methoden und Werkzeuge für komplexe Bildverarbeitungssysteme auf Rekonfigurierbaren System-on-Chip-Architekturen / Design, methodologies and tools for complex image processing systems on reconfigurable system-on-chip-architectures

Mühlbauer, Felix January 2011
Image processing applications place special demands on the computing system that executes them. On the one hand, high computational power is required. On the other hand, high flexibility is an advantage because development tends to be an experimental and interactive process. For new applications, developers tend to choose a computing architecture they know well rather than the one that best fits the application. Image processing algorithms are inherently parallel, yet conventional embedded image processing systems are mostly based on sequentially operating processors. In contrast to this "mismatch", highly efficient systems can be built from a targeted synergy of software and hardware components. However, the construction of such systems is complex, and many solutions, such as coarse-grained architectures or application-specific programming languages, are often too academic for industrial use. This work aims to help reduce the complexity of hardware-software systems and thus to simplify the development of high-performance on-chip systems for image processing and make it more economical. Emphasis was placed on keeping the effort for familiarization, development, and extension low. A design flow was conceived and implemented that allows the software developer to accelerate computations with hardware components and to prototype the entire underlying embedded system. The work considers complex image processing applications that require an operating system, such as distributed camera sensor networks. The software used is based on Linux and the image processing library OpenCV. The distribution of computations across software and hardware components, and the resulting scheduling and generation of the computing architecture, are performed automatically. The design space exploration is based on answer set programming, which brings advantages for modelling and extension. The system software is synthesized with OpenEmbedded/Bitbake, and the generated on-chip architectures are implemented on FPGAs.
28

Definition and evaluation of spatio-temporal scheduling strategies for 3D multi-core heterogeneous architectures / Définition et évaluation des stratégies d’ordonnancement spatio-temporel pour les architectures 3D multicore hétérogènes

Khuat, Quang Hai 16 March 2015
Stacking a multiprocessor (MPSoC) layer and an FPGA layer to form a 3D Reconfigurable System-on-Chip (3DRSoC) is a promising solution that offers a high level of flexibility in adapting the architecture to the targeted application. For an application defined as a graph of parallel tasks running on the 3DRSoC system, one of the main challenges comes from the high-level management of tasks. This management is done by the scheduling service of the operating system, which must be able to determine, on the fly, which task should run in software and/or hardware, when (temporal dimension), and where (spatial dimension, i.e. on which processor or which area of the FPGA) in order to achieve high system performance. In this thesis, we propose online spatio-temporal scheduling strategies for 3DRSoCs. The first strategy decides, during task scheduling, whether a software (SW) task and a hardware (HW) task need to be placed face-to-face so that the communication cost between tasks is minimized. The second strategy aims to minimize the overall execution time of the application. It exploits the processors in the MPSoC layer to anticipate, at run time, the SW execution of a task when its HW version cannot be allocated on the FPGA. Finally, a graphical simulation tool was developed to verify that the proposed strategies work properly and to produce results.
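The idea behind the second strategy, falling back to software when the hardware version cannot be allocated, can be illustrated with a toy greedy scheduler. This is not the thesis's algorithm: it ignores task dependencies, reconfiguration overhead, and face-to-face placement, and every name and timing value below is a hypothetical assumption.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ready: int    # time at which the task becomes runnable
    hw_time: int  # hypothetical execution time on an FPGA region
    sw_time: int  # hypothetical execution time on a processor


def schedule(tasks, n_regions, n_cpus):
    """Prefer hardware; fall back to software when no FPGA region is free.

    Returns a list of (task name, "HW" or "SW", resource index, start, end).
    Assumes at least one FPGA region and one processor.
    """
    region_free = [0] * n_regions  # time at which each FPGA region becomes idle
    cpu_free = [0] * n_cpus        # time at which each processor becomes idle
    plan = []
    for task in sorted(tasks, key=lambda t: t.ready):
        r = min(range(n_regions), key=region_free.__getitem__)
        c = min(range(n_cpus), key=cpu_free.__getitem__)
        if region_free[r] <= task.ready:
            # A region is idle right now: run the hardware version.
            end = task.ready + task.hw_time
            region_free[r] = end
            plan.append((task.name, "HW", r, task.ready, end))
            continue
        # No region can be allocated yet: compare waiting for hardware with
        # starting the software version on the earliest available processor.
        hw_end = region_free[r] + task.hw_time
        sw_start = max(task.ready, cpu_free[c])
        sw_end = sw_start + task.sw_time
        if sw_end <= hw_end:
            cpu_free[c] = sw_end
            plan.append((task.name, "SW", c, sw_start, sw_end))
        else:
            start = region_free[r]
            region_free[r] = hw_end
            plan.append((task.name, "HW", r, start, hw_end))
    return plan
```

With one FPGA region and one processor, a task that arrives while the region is busy runs in software only if that finishes no later than waiting for the region would; otherwise it queues for hardware.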
29

An Application-Specific Instruction Set for Accelerating Set-Oriented Database Primitives

Arnold, Oliver, Haas, Sebastian, Fettweis, Gerhard, Schlegel, Benjamin, Kissinger, Thomas, Lehner, Wolfgang 13 June 2022
The key task of database systems is to efficiently manage large amounts of data. High query throughput and low query latency are essential for the success of a database system. Recently, research has focused on exploiting hardware features such as superscalar execution units, SIMD, or multiple cores to speed up processing. Apart from these software optimizations for given hardware, tailor-made processing circuits running on FPGAs have even been built to run mostly stateless query plans with very high throughput. A similar idea, already considered three decades ago, is to build tailor-made hardware such as a database processor. Despite their superior performance, such application-specific processors were not considered beneficial because general-purpose processors eventually always caught up, so the high development costs did not pay off. In this paper, we show that developing a database processor is much more feasible today thanks to the availability of customizable processors. We illustrate by example how to create an instruction set extension for set-oriented database primitives. The resulting application-specific processor not only provides high performance but also enables very energy-efficient processing. In various configurations, our processor requires more than 960x less energy than a high-end x86 processor while providing the same performance.
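The abstract does not list the primitives it targets. As a hedged illustration, the sketch below assumes that intersecting two sorted sets of record identifiers is one such set-oriented primitive and shows the plain scalar merge loop that an instruction set extension would aim to speed up. The function name is hypothetical.

```python
def intersect_sorted(a, b):
    """Return the elements common to two ascending, duplicate-free lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Match: emit the value and advance both inputs.
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to appear in b from position j on
        else:
            j += 1  # b[j] is too small to appear in a from position i on
    return out


if __name__ == "__main__":
    print(intersect_sorted([1, 3, 5, 7, 9], [3, 4, 5, 6, 9, 10]))
```

Each iteration performs one comparison and a data-dependent branch, which is hard for a general-purpose pipeline to predict; this is one plausible reason a dedicated instruction could pay off for such loops.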
30

A New System Architecture for Heterogeneous Compute Units

Asmussen, Nils 09 August 2019
The ongoing trend toward more heterogeneous systems forces us to rethink the design of systems. In this work, I study a new system design that considers heterogeneous compute units (general-purpose cores with different instruction sets, DSPs, FPGAs, fixed-function accelerators, etc.) from the beginning instead of as an afterthought. The goal is to treat all compute units (CUs) as first-class citizens, enabling (1) isolation and secure communication between all types of CUs, (2) direct interaction of all CUs, removing the conventional CPU from the critical path, and (3) access to operating system (OS) services such as file systems and network stacks for all CUs. To study this system design, I use a hardware/software co-design based on two key ideas: (1) introduce a new hardware component next to each CU, used by the OS as the CUs' common interface, and (2) let the OS kernel control applications remotely from a different CU. The hardware component is called the data transfer unit (DTU) and offers the minimal set of features needed to reach the stated goals: secure message passing and memory access. The OS is called M³; it runs its kernel on a dedicated CU and runs the OS services and applications on the remaining CUs. The kernel is responsible for establishing DTU-based communication channels between services and applications. After a channel has been set up, services and applications communicate directly without involving the kernel. This approach makes it possible to support arbitrary CUs as first-class citizens, ranging from fixed-function accelerators to complex general-purpose cores.
