Global ETD Search

1	Parallel video decoding Álvarez Mesa, Mauricio 08 September 2011 (has links) Digital video is a popular technology used in many different applications. The quality of video, expressed in the spatial and temporal resolution, has been increasing continuously in the last years. In order to reduce the bitrate required for its storage and transmission, a new generation of video encoders and decoders (codecs) have been developed. The latest video codec standard, known as H.264/AVC, includes sophisticated compression tools that require more computing resources than any previous video codec. The combination of high quality video and the advanced compression tools found in H.264/AVC has resulted in a significant increase in the computational requirements of video decoding applications. The main objective of this thesis is to provide the performance required for real-time operation of high quality video decoding using programmable architectures. Our solution has been the simultaneous exploitation of multiple levels of parallelism. On the one hand, video decoders have been modified in order to extract as much parallelism as possible. And, on the other hand, general purpose architectures has been enhanced for exploiting the type of parallelism that is present in video codec applications. / El vídeo digital es una tecnología popular utilizada en una gran variedad de aplicaciones. La calidad de vídeo, expresada en la resolución espacial y temporal, ha ido aumentando constantemente en los últimos años. Con el fin de reducir la tasa de bits requerida para su almacenamiento y transmisión, se ha desarrollado una nueva generación de codificadores y decodificadores (códecs) de vídeo. El códec estándar de vídeo más reciente, conocido como H.264/AVC, incluye herramientas sofisticadas de compresión que requieren más recursos de computación que los códecs de vídeo anteriores. El efecto combinado del vídeo de alta calidad y las herramientas de compresión avanzada incluidas en el H.264/AVC han llevado a un aumento significativo de los requerimientos computacionales de la decodificación de vídeo. El objetivo principal de esta tesis es proporcionar el rendimiento necesario para la decodificación en tiempo real de vídeo de alta calidad. Nuestra solución ha sido la explotación simultánea de múltiples niveles de paralelismo. Por un lado, se realizaron modificaciones en el decodificador de vídeo con el fin de extraer múltiples niveles de paralelismo. Y, por otro lado, se modificaron las arquitecturas de propósito general para mejorar la explotación del tipo paralelismo que está presente en las aplicaciones de vídeo. Primero hicimos un análisis de la escalabilidad de dos extensiones de Instrucción Simple con Múltiples Datos (SIMD por sus siglas en inglés): una de una dimensión (1D) y otra matricial de dos dimensiones (2D). Se demostró que al escalar la extensión 2D se obtiene un mayor rendimiento con una menor complejidad que al escalar la extensión 1D. Luego se realizó una caracterización de la decodificación de H.264/AVC en aplicaciones de alta definición (HD) donde se identificaron los núcleos principales. Debido a la falta de un punto de referencia (benchmark) adecuado para la decodificación de vídeo HD, desarrollamos uno propio, llamado HD-VideoBench el cual incluye aplicaciones completas de codificación y decodificación de vídeo junto con una serie de secuencias de vídeo en HD. Después optimizamos los núcleos más importantes del decodificador H.264/AVC usando instrucciones SIMD. Sin embargo, los resultados no alcanzaron el máximo rendimiento posible debido al efecto negativo de la desalineación de los datos en memoria. Como solución, evaluamos el hardware y el software necesarios para realizar accesos no alineados. Este soporte produjo mejoras significativas de rendimiento en la aplicación. Aparte se realizó una investigación sobre cómo extraer paralelismo de nivel de tarea. Se encontró que ninguno de los mecanismos existentes podía escalar para sistemas masivamente paralelos. Como alternativa, desarrollamos un nuevo algoritmo que fue capaz de encontrar miles de tareas independientes al explotar paralelismo de nivel de macrobloque. Luego implementamos una versión paralela del decodificador de H.264 en una máquina de memoria compartida distribuida (DSM por sus siglas en inglés). Sin embargo esta implementación no alcanzó el máximo rendimiento posible debido al impacto negativo de las operaciones de sincronización y al efecto del núcleo de decodificación de entropía. Con el fin de eliminar estos cuellos de botella se evaluó la paralelización al nivel de imagen de la fase de decodificación de entropía combinada con la paralelización al nivel de macrobloque de los demás núcleos. La sobrecarga de las operaciones de sincronización se eliminó casi por completo mediante el uso de operaciones aceleradas por hardware. Con todas las mejoras presentadas se permitió la decodificación, en tiempo real, de vídeo de alta definición y alta tasa de imágenes por segundo. Como resultado global se creó una solución escalable capaz de usar el número creciente procesadores en las arquitecturas multinúcleo. H.264 High definition Video decoding Parallel SIMD MPEG-2 Vector processors Heterogeneous computing Multicore Computer architecture Parallel programming 004
2	Slice-Level Trading of Quality and Performance in Decoding H.264 Video: Slice-basiertes Abwägen zwischen Qualität und Leistung beim Dekodieren von H.264-Video Roitzsch, Michael 23 June 2006 (has links) When a demanding video decoding task requires more CPU resources then available, playback degrades ungracefully today: The decoder skips frames selected arbitrarily or by simple heuristics, which is noticed by the viewer as jerky motion in the good case or as images completely breaking up in the bad case. The latter can happen due to missing reference frames. This thesis provides a way to schedule individual decoding tasks based on a cost for performance trade. Therefore, I will present a way to preprocess a video, generating estimates for the cost in terms of execution time and the performance in terms of perceived visual quality. The granularity of the scheduling decision is a single slice, which leads to a much more ﬁne-grained approach than dealing with entire frames. Together with an actual scheduler implementation that uses the generated estimates, this work allows for higher perceived quality video playback in case of CPU overload. / Wenn eine anspruchsvolle Video-Dekodierung mehr Prozessor-Ressourcen benötigt, als verfügbar sind, dann verschlechtert sich die Abspielqualität mit aktuellen Methoden drastisch: Willkürlich oder mit einfachen Heuristiken ausgewählten Bilder werden nicht dekodiert. Diese Auslassung nimmt der Betrachter im günstigsten Fall nur als ruckelnde Bewegung wahr, im ungünstigen Fall jedoch als komplettes Zusammenbrechen nachfolgender Bilder durch Folgefehler im Dekodierprozess. Meine Arbeit ermöglicht es, einzelne Teilaufgaben des Dekodierprozesses anhand einer Kosten-Nutzen-Analyse einzuplanen. Dafür ermittle ich die Kosten im Sinne von Rechenzeitbedarf und den Nutzen im Sinne von visueller Qualität für einzelne Slices eines H.264 Videos. Zusammen mit einer Implementierung eines Schedulers, der diese Werte nutzt, erlaubt meine Arbeit höhere vom Betrachter wahrgenommene Videoqualität bei knapper Prozessorzeit. info:eu-repo/classification/ddc/004 ddc:004
3	Polymorphic ASIC : For Video Decoding Adarsha Rao, S J January 2013 (has links) (PDF) Video applications are becoming ubiquitous in recent times due to an explosion in the number of devices with video capture and display capabilities. Traditionally, video applications are implemented on a variety of devices with each device targeting a speciﬁc application. However, the advances in technology have created a need to support multiple applications from a single device like a smart phone or tablet. Such convergence of applications necessitates support for interoperability among various applications, scalable performance meet the requirements of different applications and a high degree of reconﬁgurability to accommodate rapid evolution in applications features. In addition, low power consumption requirement is also very stringent for many video applications. The conventional custom hardware implementations of video applications deliver high performance at low power consumption while the recent MPSoC implementations enable high degree of interoperability and are useful to support application evolution. In this thesis, we combine the best features of custom hardware and MPSoC approaches to design a Polymorphic ASIC. A Polymorphic ASIC is an integrated circuit designed to meet the requirements of several applications belonging to a particular domain. A polymorphic ASIC consists of a fabric of computation, storage and communication resources, using which applications are composed dynamically. Although different video applications differ widely in the internal de-tails of operation, at the heart of almost every video application is a video codec (encoder and decoder). The requirements of scalability, high performance and low power consumption are very stringent for video decoding. Therefore this thesis focuses mainly on the architectural design of a Polymorphic ASIC for video decoding. We present an uniﬁed software and hardware architecture (USHA) for Polymorphic ASIC. USHA is a tiled architecture which uses loosely coupled processor and hardware tiles that are software programmable and hardware reconﬁgurable respectively. The distinctive feature of Polymorphic ASIC is the static partitioning of the application and dynamic mapping of ap-plication processes onto the computational tiles. Depending on the application scenarios, a process may be mapped onto one of the hardware or processor tiles. Polymorphic ASIC incor-porates a network–on–chip (NoC) to achieve ﬂexible communication across different tiles. Formulation of a programming framework for Polymorphic ASIC requires an implementation model that captures the structure of video decoder applications as well as the properties of the Polymorphic ASIC architecture. We derive an implementation model based on a combination of parametric polyhedral process networks, stream based functions and windowed dataﬂow models of computation. The implementation model leads to a process network oriented compilation ﬂow that achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi–processor, semi hardware and full hardware conﬁgurations of a video decoder. The thesis also presents an application QoS aware scheduler that selects a decoder conﬁguration that best meets the application performance requirements, thereby enabling dynamic performance scaling. The memory hierarchy of Polymorphic ASIC makes use of an application speciﬁc cache. Through a combined analysis of miss rate and external memory bandwidth, we show that the degradation in decoder performance due to memory stall cycles depends on the properties of the video being decoded as well as the behavior of the external memory interface. Based on this observation, we present the design of a reconﬁgurable 2–D cache architecture which can adjust its parameters in accordance with the characteristics of the video stream being decoded. We validate the Polymorphic ASIC using a proof–of–concept implementation on an FPGA. The performance of H.264 decoder on Polymorphic ASIC is evaluated for uniprocessor, multi processor, hardware accelerated and full hardware conﬁgurations. The scaling in performance delivered by these conﬁgurations shows that the Polymorphic ASIC enables the application to achieve super linear speedups [1]. The experimental results show that different implementations of a H.264 video decoder on the Polymorphic ASIC can deliver performance comparable to a wide spectrum of devices ranging from embedded processor like ARM 9 to MPSoCs like IBM Cell. We also present the energy consumption of various conﬁgurations of video decoders on Polymorphic ASIC and an application to conﬁguration mapping aimed at minimizing the overall energy consumption of a Polymorphic ASIC. Video Decoding Polymorphic ASIC Architecture Multiprocessor System-On-Chip (MPSoC) H.264 Decoder Video Encoding H.264 Decoders H.264 Decoding Computer Science
4	Polymorphic ASIC : For Video Decoding Adarsha Rao, S J January 2013 (has links) (PDF) Video applications are becoming ubiquitous in recent times due to an explosion in the number of devices with video capture and display capabilities. Traditionally, video applications are implemented on a variety of devices with each device targeting a speciﬁc application. However, the advances in technology have created a need to support multiple applications from a single device like a smart phone or tablet. Such convergence of applications necessitates support for interoperability among various applications, scalable performance meet the requirements of different applications and a high degree of reconﬁgurability to accommodate rapid evolution in applications features. In addition, low power consumption requirement is also very stringent for many video applications. The conventional custom hardware implementations of video applications deliver high performance at low power consumption while the recent MPSoC implementations enable high degree of interoperability and are useful to support application evolution. In this thesis, we combine the best features of custom hardware and MPSoC approaches to design a Polymorphic ASIC. A Polymorphic ASIC is an integrated circuit designed to meet the requirements of several applications belonging to a particular domain. A polymorphic ASIC consists of a fabric of computation, storage and communication resources, using which applications are composed dynamically. Although different video applications differ widely in the internal de-tails of operation, at the heart of almost every video application is a video codec (encoder and decoder). The requirements of scalability, high performance and low power consumption are very stringent for video decoding. Therefore this thesis focuses mainly on the architectural design of a Polymorphic ASIC for video decoding. We present an uniﬁed software and hardware architecture (USHA) for Polymorphic ASIC. USHA is a tiled architecture which uses loosely coupled processor and hardware tiles that are software programmable and hardware reconﬁgurable respectively. The distinctive feature of Polymorphic ASIC is the static partitioning of the application and dynamic mapping of ap-plication processes onto the computational tiles. Depending on the application scenarios, a process may be mapped onto one of the hardware or processor tiles. Polymorphic ASIC incor-porates a network–on–chip (NoC) to achieve ﬂexible communication across different tiles. Formulation of a programming framework for Polymorphic ASIC requires an implementation model that captures the structure of video decoder applications as well as the properties of the Polymorphic ASIC architecture. We derive an implementation model based on a combination of parametric polyhedral process networks, stream based functions and windowed dataﬂow models of computation. The implementation model leads to a process network oriented compilation ﬂow that achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi–processor, semi hardware and full hardware conﬁgurations of a video decoder. The thesis also presents an application QoS aware scheduler that selects a decoder conﬁguration that best meets the application performance requirements, thereby enabling dynamic performance scaling. The memory hierarchy of Polymorphic ASIC makes use of an application speciﬁc cache. Through a combined analysis of miss rate and external memory bandwidth, we show that the degradation in decoder performance due to memory stall cycles depends on the properties of the video being decoded as well as the behavior of the external memory interface. Based on this observation, we present the design of a reconﬁgurable 2–D cache architecture which can adjust its parameters in accordance with the characteristics of the video stream being decoded. We validate the Polymorphic ASIC using a proof–of–concept implementation on an FPGA. The performance of H.264 decoder on Polymorphic ASIC is evaluated for uniprocessor, multi processor, hardware accelerated and full hardware conﬁgurations. The scaling in performance delivered by these conﬁgurations shows that the Polymorphic ASIC enables the application to achieve super linear speedups [1]. The experimental results show that different implementations of a H.264 video decoder on the Polymorphic ASIC can deliver performance comparable to a wide spectrum of devices ranging from embedded processor like ARM 9 to MPSoCs like IBM Cell. We also present the energy consumption of various conﬁgurations of video decoders on Polymorphic ASIC and an application to conﬁguration mapping aimed at minimizing the overall energy consumption of a Polymorphic ASIC. Video Decoding Polymorphic ASIC Architecture Multiprocessor System-On-Chip (MPSoC) H.264 Decoder Video Encoding H.264 Decoders H.264 Decoding Computer Science
5	Implementation and Evaluation of MPEG-4 Simple Profile Decoder on a Massively Parallel Processor Array Savas, Suleyman January 2011 (has links) The high demand of the video decoding has pushed the developers to implement the decoders on parallel architectures. This thesis provides the deliberations about the implementation of an MPEG-4 decoder on a massively parallel processor array (MPPA), Ambric 2045, by converting the CAL actor language implementation of the decoder. This decoder is the Xilinx model of the MPEG-4 Simple Profile decoder and consists of four main blocks; parser, acdc, idct2d and motion. The parser block is developed in another thesis work [20] and the rest of the decoder, which consists of the other three blocks, is implemented in this thesis work. Afterwards, in order to complete the decoder, the parser block is combined with the other three blocks. Several methods are developed for conversion purposes. Additionally, a number of other methods are developed in order to overcome the constraints of the ambric architecture such as no division support. At the beginning, for debugging purposes, the decoder is implemented on a simulator which is designed for Ambric architecture. Finally the implementation is uploaded to the Ambric 2045 chip and tested with different input streams. The performance of the implementation is analyzed and satisfying results are achieved when compared to the standards which are in use in the market. These performance results can be considered as satisfying for any real-time application as well. Furthermore, the results are compared with the results of the CAL implementation, running on a single 2GHz i7 intel processor, in terms of speed and efficiency. The Ambric implementation runs 4,7 times faster than the CAL implementation when a small input stream (300 frames with resolution of 176x144) is used. However, when a large input stream (384 frames with resolution of 720x480) is used, the Ambric implementation shows a performance which is approximately 32 times better than the CAL implementation, in terms of decoding speed and throughput. The performance may increase further together with the size of the input stream up to some point. MPEG MPEG-4 Ambric Ambric Architecture MPPA Processor Array Parallel Architecture CAL Caltrop Video Decoding Video Decoder CAL Actor Language Converting CAL ajava astruct simple profile decoder embedded system video encoding RVC Reconfigurable Video Coding Computer Engineering Datorteknik Computer Sciences Datavetenskap (datalogi) Computer and Information Sciences Data- och informationsvetenskap

1

Page generated in 0.0723 seconds