11 |
High-Level Synthesis of Software Function Calls
TOMIYAMA, Hiroyuki, KANBARA, Hiroyuki, ISHIMORI, Yoshiyuki, ISHIURA, Nagisa, NISHIMURA, Masanari 01 December 2008
No description available.
|
12 |
A Multiprocessor Platform Based on FPGA Technology Targeted for a Driver Vigilance Monitoring Device
Moussa, Wafik January 2009
Medical devices that process images or audio, or that execute complex AI algorithms, run more efficiently and meet real-time requirements when the parallelism in those algorithms is exploited. This research proposes a methodology that exploits the flexibility and short design cycle of FPGAs (Field Programmable Gate Arrays) to achieve this. Hardware/software co-design and dynamic partitioning allow the design parameters of the multiprocessor platform, and the software code targeting each core, to be optimized to meet real-time constraints. This is demonstrated in practice by building a real-life driver vigilance monitoring system based on the extraction and evaluation of visual cues. The application drives the whole design process to prove its effectiveness. An algorithm was built to detect the eye state of the driver (open or closed), and it is applied to consecutive captured frames to evaluate the driver's vigilance state. The vigilance state is measured by the duration of eye closure. This video processing application is then targeted to run on a multi-core FPGA-based processing platform using the proposed methodology.
Very good results were obtained both with the Grimace Face Database and when operating the system on a live face. During operation, an alarm is raised only when eye closure is detected two consecutive times, so a single false positive does not trigger an alarm, which decreases the probability of failure. The timing analysis showed the importance of parallelism in meeting the performance constraints. FPGA technology proved to be a very powerful prototyping tool for the design of complex multiprocessor systems. Flexible FPGA technology coupled with hardware/software co-design provided the means to explore the design space and reach decisions that satisfy the design constraints with minimum investment of time and cost.
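The two-consecutive-detections rule described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `alarm_indices`, the parameter `required_consecutive`, and the sample detection sequence are all assumptions introduced here.

```python
from typing import Iterable, List


def alarm_indices(eye_closed: Iterable[bool], required_consecutive: int = 2) -> List[int]:
    """Return the positions at which an alarm would be raised.

    Hypothetical helper: an alarm fires only once `required_consecutive`
    closed-eye detections have been seen in a row; any open-eye detection
    resets the count.
    """
    alarms = []
    run = 0  # length of the current run of closed-eye detections
    for i, closed in enumerate(eye_closed):
        run = run + 1 if closed else 0
        if run >= required_consecutive:
            alarms.append(i)
    return alarms


# A single spurious "closed" detection (index 1) raises no alarm;
# two in a row (indices 3 and 4) raise one at index 4.
detections = [False, True, False, True, True, False]
print(alarm_indices(detections))  # prints [4]
```

Requiring a run of detections trades a short extra delay for fewer alarms caused by isolated misclassifications.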
|
14 |
System Prototyping of H.264/AVC Video Decoder on SoC Development Platform
Kuan, Yi-Sheng 06 September 2005
For the next generation of multimedia applications, such as digital video broadcasting, multimedia message service and video conferencing, enormous amounts of video content will be transmitted and exchanged over wireless channels. Because communication bandwidth is limited, achieving more efficient, reliable, and robust video compression is a very important issue. H.264/AVC (Advanced Video Coding) is one of the latest video coding standards and is expected to be adopted in many future application systems because of its excellent compression efficiency. This thesis addresses the implementation of the H.264 decoding algorithm on an SoC (System-on-Chip) development platform. Several key modules of the H.264 decoder, including the color space converter, inter-interpolation, and transformation rescale modules, are realized with dedicated hardware architectures. A novel low-cost, fast, scalable deblocking filter based on a single-port memory architecture is also proposed, which supports fast real-time deblocking filtering. The entire H.264 decoder system is prototyped on the Altera SOPC platform, and the decoding result is displayed directly on a monitor. All the hardware modules are attached to the system Avalon bus and interact with the Altera Nios II processor. Through the hardware/software co-design approach, the decoding speed is increased by a factor of 1.9.
|
15 |
Implementation Of An 8-bit Microcontroller With System C
Kesen, Lokman 01 November 2004
In this thesis, an 8-bit microcontroller, the 8051 core, is implemented using the SystemC programming language. SystemC is a new-generation co-design language capable of both programming the software and describing the hardware parts of a complete system. The benefit of this design environment appears when developing a System-on-Chip (SoC), that is, a system consisting of both custom hardware parts and embedded software parts. SystemC is not a completely new language; it is based on C++, with additional class libraries and extensions to handle hardware-related concepts such as signals, multi-valued logic, clocks and delay elements. The 8051 is an 8-bit microcontroller that has been widely used in industry for many years. The 8051 core is still used as the main controller in today's highly complex chips, such as communication and bus controllers. During the development cycles of a System-on-Chip, using a unified co-design environment instead of separate design environments for the hardware and software parts provides a better design and simulation methodology, and it also decreases the number of iterations at hardware/software integration. In this work, an 8-bit 8051 microcontroller core and external memory modules are developed using SystemC; they can be reused in future designs to build more complex Systems-on-Chip. During the development of the 8051 core, simulation results are analyzed at each step to verify the design from the very beginning of the work, which makes the design process more structured, more controlled, and consequently faster.
|
16 |
Um ambiente para geração automática de biblioteca de componentes de comunicação em sistemas embarcados distribuídos
DÓRIA, Valnor Calheiros January 2003
Hardware/software co-design is a methodology for developing digital systems composed of software components and hardware components, and it enables a drastic productivity gain in the development of such systems. This productivity gain can be used to explore several alternative solutions in order to improve the quality and reduce the cost of the final design. With the recent growth in the use of distributed embedded systems, designers have increasingly adopted hardware/software co-design environments that support this category of designs.
Co-design of distributed embedded systems is an even more challenging task, because each phase of the methodology must consider the physical constraints imposed by the distributed nature of these systems. One of its challenges lies in generating the communication between processes allocated to different embedded systems. This task is tedious, error-prone, and time-consuming when it is not performed automatically: for each new situation to be analyzed, the lack of a design-support tool forces the system designer to redo all application-dependent parameters and to customize the communication subsystems so that they reflect the new architecture under analysis.
The main objective of this work was to develop an environment that automatically generates a library of communication components for distributed embedded systems. The system must support designs of different scales and with arbitrary topology. To this end, a communication model was defined, a network architecture was proposed for which the system must generate the communication components, and a library of communication components was developed with implementation specifications in hardware and in software, including support for communication over the Internet. As a result of the work, an automatic communication-component generation system, GCCom, was implemented; it supports the development of distributed embedded system designs.
|
17 |
Example Modules for Hardware-software Co-design
Bappudi, Bhargav 20 October 2016
No description available.
|
18 |
Research and Design of Neural Processing Architectures Optimized for Embedded Applications
Wu, Binyi 28 May 2024
Deploying neural networks on edge devices and bringing them into our daily lives is attracting more and more attention. However, their high computational cost makes many embedded applications challenging. The primary objective of my doctoral studies is to contribute towards resolving this dilemma: optimizing neural networks and designing corresponding efficient neural processing units for edge devices. This work took algorithmic research, specifically the optimization of deep neural networks, as a starting point and then applied its findings to steer the architecture design of Neural Processing Units (NPUs). The optimization of neural network models started with single-precision neural network quantization and progressed to mixed precision. The NPU architecture development followed the algorithmic research findings to achieve hardware/software co-design. Furthermore, a new approach to hardware and software co-development was introduced, aimed at expediting the prototyping and performance assessment of NPUs. This approach targets early-stage development. It helps developers to focus on the design and optimization of NPUs and significantly shortens the development cycle. In the final project, a machine learning-based approach was applied to explore and optimize the computational and memory resources of the NPU. The entire work covers several different areas, from algorithmic research to hardware design, but all of them aim to improve the inference efficiency of neural networks. Specifically, algorithm optimization aims to reduce the memory footprint and computational cost of neural networks. The NPU design, on the other hand, focuses on improving the utilization of hardware resources. The proposed software and hardware co-development approach shortens the design cycle and speeds up design iteration. The order presented above corresponds to the structure of this dissertation. Each chapter corresponds to a topic and covers relevant research, methodology, and experimental results.
1 Introduction
2 Convolutional Neural Networks
2.1 Convolutional layer
2.1.1 Padding
2.1.2 Convolution
2.1.3 Batch Normalization
2.1.4 Nonlinearity
2.2 Pooling Layer
2.3 Fully Connected Layer
2.4 Characterization
2.4.1 Composition of Operations and Parameters
2.4.2 Arithmetic Intensity
2.5 Optimization
3 Quantization with Double-Stage Squeeze-and-Threshold
3.1 Overview
3.1.1 Binarization
3.1.2 Multi-bit Quantization
3.2 Quantization of Convolutional Neural Networks
3.2.1 Quantization Scheme
3.2.2 Operator fusion of Conv2D
3.3 Activation Quantization with Squeeze-and-Threshold
3.3.1 Double-Stage Squeeze-and-Threshold
3.3.2 Inference Optimization
3.4 Experiment
3.4.1 Ablation Study of Squeeze-and-Threshold
3.4.2 Comparison with State-of-the-art Methods
3.5 Summary
4 Low-Precision Neural Architecture Search
4.1 Overview
4.2 Differentiable Architecture Search
4.2.1 Gumbel Softmax
4.2.2 Disadvantage and Solution
4.3 Low-Precision Differentiable Architecture Search
4.3.1 Convolution Sharing
4.3.2 Forward-and-Backward Scaling
4.3.3 Power Estimation
4.3.4 Architecture of Supernet
4.4 Experiment
4.4.1 Effectiveness of solutions to the dominance problem
4.4.2 Softmax and Gumbel Softmax
4.4.3 Optimizer and Inverted Learning Rate Scheduler
4.4.4 NAS Method Evaluation
4.4.5 Searched Model Analysis
4.4.6 NAS Cost Analysis
4.4.7 NAS Training Analysis
4.5 Summary
5 Configurable Sparse Neural Processing Unit
5.1 Overview
5.2 NPU Architecture
5.2.1 Buffer
5.2.2 Reshapeable Mixed-Precision MAC Array
5.2.3 Sparsity
5.2.4 Post Process Unit
5.3 Mapping
5.3.1 Mixed-Precision MAC
5.3.2 MAC Array
5.3.3 Support of Other Operation
5.3.4 Configurability
5.4 Experiment
5.4.1 Performance Analysis of Runtime Configuration
5.4.2 Roofline Performance Analysis
5.4.3 Mixed-Precision
5.4.4 Comparison with Cortex-M7
5.5 Summary
6 Agile Development and Rapid Design Space Exploration
6.1 Overview
6.1.1 Agile Development
6.1.2 Design Space Exploration
6.2 Agile Development Infrastructure
6.2.1 Chisel Backend
6.2.2 NPU Software Stack
6.3 Modeling and Exploration
6.3.1 Area Modeling
6.3.2 Performance Modeling
6.3.3 Layered Exploration Framework
6.4 Experiment
6.4.1 Efficiency of Agile Development Infrastructure
6.4.2 Effectiveness of Agile Development Infrastructure
6.4.3 Area Modeling
6.4.4 Performance Modeling
6.4.5 Rapid Exploration and Pareto Front
6.5 Summary
7 Summary and Outlook
7.1 Summary
7.2 Outlook
A Appendix of Double-Stage ST Quantization
A.1 Training setting of ResNet-18 in Table 3.3
A.2 Training setting of ReActNet in Table 3.4
A.3 Training setting of ResNet-18 in Table 3.4
A.4 Pseudocode Implementation of Double-Stage ST
B Appendix of Low-Precision Neural Architecture Search
B.1 Low-Precision NAS on CIFAR-10
B.2 Low-Precision NAS on Tiny-ImageNet
B.3 Low-Precision NAS on ImageNet
Bibliography
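The abstract's point that quantization reduces the memory footprint can be illustrated with a small sketch. It uses plain uniform symmetric quantization as a stand-in, not the dissertation's Double-Stage Squeeze-and-Threshold method. The function names, layer names, parameter counts, and bit widths are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def quantize_symmetric(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric quantization to signed `bits`-bit integers.

    Returns the integer codes and the scale such that code * scale
    approximates the original weight. Assumed scheme, for illustration only.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def weight_memory_bytes(layer_params: Dict[str, int], layer_bits: Dict[str, int]) -> float:
    """Total weight storage in bytes when each layer uses its own bit width."""
    total_bits = sum(n * layer_bits[name] for name, n in layer_params.items())
    return total_bits / 8


# Hypothetical three-layer model: parameter counts per layer.
params = {"conv1": 1000, "conv2": 4000, "fc": 3000}
fp32 = {name: 32 for name in params}           # 32-bit floating-point baseline
mixed = {"conv1": 8, "conv2": 4, "fc": 2}      # assumed mixed-precision assignment

print(weight_memory_bytes(params, fp32))   # 32000.0
print(weight_memory_bytes(params, mixed))  # 3750.0
```

In this made-up configuration the mixed-precision assignment stores the weights in 3750 bytes instead of 32000; the accuracy impact of such an assignment is what the quantization and architecture-search chapters would have to evaluate.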
|
19 |
Embedded electronic systems driven by run-time reconfigurable hardware
Fons Lluís, Francisco 29 May 2012
This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology, available through SRAM-based FPGA/SoC devices, with the aim of improving people's quality of life. The work investigates the system architecture and the reconfiguration engine that gives the FPGA the capability of dynamic partial reconfiguration. With this capability, hardware/software co-design is used to synthesize a given application partitioned into processing tasks that are multiplexed in time and space, thereby optimizing its physical implementation (silicon area, processing time, complexity, flexibility, functional density, cost and power consumption) in comparison with alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of this technology is evaluated by prototyping several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a level of maturity high enough for industrial exploitation.
|
20 |
Arquitetura de co-projeto hardware/software para implementação de um codificador de vídeo escalável padrão H.264/SVC
Husemann, Ronaldo January 2011
To support heterogeneous networks and distinct devices simultaneously, modern multimedia systems can adopt scalable coding, in which the video stream is composed of multiple layers, each one gradually enhancing the video display quality according to the capabilities of the specific receiver. Currently the H.264/SVC specification can be considered the state of the art in this area because of its improved coding efficiency, but it imposes extremely high computational demands. In this context, this work presents a hardware/software co-design architecture that explores the characteristics of the internal algorithms of the H.264/SVC encoder, aiming at the right balance between the two technologies (hardware and software) in order to produce a practical scalable encoder implementation able to process up to 16 layers in 1920x1080 pixel format. Starting from an H.264/SVC reference code model, refined to reduce global encoding time, the approaches for module partitioning and for integration between hardware and software entities were defined. The proposed methodology took into account characteristics such as data dependency and the potential for parallelism of the algorithms, as well as practical restrictions such as the influence of communication interfaces and memory accesses. In particular, the transform, quantization, deblocking filter and inter-layer prediction modules were implemented in hardware, while the system management, entropy coding, rate control and user interface functions were kept in software. The complete solution, obtained by integrating the hardware modules, synthesized on a development board, with the refined H.264/SVC reference software, validates the proposal through the significant performance gains recorded, indicating that it is an adequate solution for applications that require real-time scalable video coding.
|