91

End-to-end 3D video communication over heterogeneous networks

Mohib, Hamdullah January 2014 (has links)
Three-dimensional technology, more commonly referred to as 3D technology, has revolutionised many fields, including entertainment, medicine, and communications. In addition to 3D films, games, and sports channels, 3D perception has made tele-medicine a reality. Consumer electronics manufacturers predicted that, by 2015, 30% of all HD panels in the home would be 3D-enabled. Stereoscopic cameras, a comparatively mature technology compared with other 3D systems, are now being used by ordinary citizens to produce 3D content and share it at the click of a button, just as they do with 2D content, via sites like YouTube. But technical challenges remain, including those posed by autostereoscopic multiview displays. Because of its increased amount of data, 3D content raises complex questions for transmission and storage, including how to represent it and which compression format suits it best. Any decision must be taken in light of the available bandwidth or storage capacity, quality, and user expectations. Free viewpoint navigation also remains partly unsolved. The most pressing issue standing in the way of widespread uptake of consumer 3D systems is the ability to deliver 3D content to heterogeneous consumer displays over heterogeneous networks. Optimising 3D video communication solutions must consider the entire pipeline, from optimisation at the video source through transmission to the end display. Multi-view offers the most compelling solution for 3D video, providing motion parallax and freedom from headgear for 3D perception. Optimising multi-view video for delivery and display could increase the demand for true 3D in the consumer market. This thesis focuses on end-to-end quality optimisation in 3D video communication/transmission, offering solutions for optimisation at the compression, transmission, and decoder levels.
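As a rough illustration of the delivery problem this abstract describes — matching multi-view 3D content to heterogeneous displays under a bandwidth constraint — the sketch below allocates views and per-view bitrate for a given link. It is a toy model only; the view counts, minimum rate and allocation rule are assumptions for illustration, not the thesis's optimisation method.

```python
# Toy rate allocation for multi-view 3D delivery over a constrained link.
# Illustrative only: the display capabilities and the 400 kbps per-view
# minimum are hypothetical values, not taken from the thesis.

def allocate_views(bandwidth_kbps, display_views, min_rate_kbps=400):
    """Pick how many views to send and the bitrate per view.

    Sends as many of the display's views as the link allows, never dropping
    below a minimum per-view rate; leftover capacity is shared evenly.
    """
    affordable = max(1, min(display_views, bandwidth_kbps // min_rate_kbps))
    per_view = bandwidth_kbps / affordable
    return affordable, per_view

# A stereoscopic display (2 views) and an 8-view autostereoscopic display
# sharing the same 3 Mbit/s channel.
for views in (2, 8):
    n, rate = allocate_views(3000, views)
    print(f"display views={views}: send {n} views at {rate:.0f} kbps each")
```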
92

On the Enhancement of Audio and Video in Mobile Equipment

Rossholm, Andreas January 2006 (has links)
Use of mobile equipment has increased exponentially over the last decade. As use becomes more widespread, so too does the demand for new functionalities. The limited memory and computational power of many mobile devices has proven to be a challenge, resulting in many innovative solutions and a number of new standards. Despite this, additional enhancement is often required to improve quality. The focus of this thesis work has been to perform enhancement within two different areas: audio/speech encoding and video encoding/decoding. The audio enhancement part of this thesis addresses the well-known problem in the GSM system of an interfering signal generated by the switching nature of TDMA cellular telephony. Two different solutions are given to suppress such interference internally in the mobile handset. The first method uses subtractive noise cancellation employing correlators; the second uses a structure of IIR notch filters. Both solutions use control algorithms based on the state of the communication between the mobile handset and the base station. The video part of this thesis presents two post-filters and one pre-filter. The two post-filters are designed to improve the visual quality of highly compressed video streams from standard, block-based video codecs by combating both blocking and ringing artifacts; the second post-filter also performs sharpening. The pre-filter is designed to increase the coding efficiency of a standard block-based video codec. By introducing a pre-processing algorithm before the encoder, the amount of camera disturbance and the complexity of the sequence can be decreased, thereby increasing coding efficiency.
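A minimal sketch of the second kind of interference suppression mentioned above — an IIR notch filter applied to the handset audio path. The 217 Hz buzz frequency and the 8 kHz speech sampling rate are assumptions chosen for the example, not parameters taken from the thesis, and the filter here is a single generic notch rather than the thesis's control-driven filter structure.

```python
# Suppressing a TDMA-like switching buzz with a single IIR notch filter.
# Assumed values: fs = 8 kHz speech-band sampling, f0 = 217 Hz interference.
import numpy as np
from scipy.signal import iirnotch, lfilter

fs = 8000.0                     # sampling rate (assumed)
f0 = 217.0                      # approximate TDMA frame-rate buzz (assumed)

t = np.arange(0, 1.0, 1 / fs)
speech = np.random.randn(t.size) * 0.1          # stand-in for a speech signal
buzz = 0.5 * np.sin(2 * np.pi * f0 * t)         # switching interference
noisy = speech + buzz

b, a = iirnotch(w0=f0, Q=30.0, fs=fs)           # narrow notch at f0
cleaned = lfilter(b, a, noisy)

print("residual power before:", round(float(np.var(noisy - speech)), 3))
print("residual power after: ", round(float(np.var(cleaned - speech)), 3))
```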
93

The effects of elevation and rotation on descriptors and similarity measures for a single class of image objects

06 June 2008 (has links)
“A picture is worth a thousand words.” Yet every person interprets an image or photo differently in terms of its content, because of the semantics contained in the image. Content-based image retrieval has become a vast area of research aimed at successfully describing and retrieving images according to their content. In military applications, intelligence images such as those obtained by the defence intelligence group are taken (mostly on film), developed and then manually annotated. These photos are then stored in a filing system according to attributes such as location and content, and retrieving them at a later stage can take days or even weeks. Thus, the need for a digital annotation system has arisen. The military images contain various vehicles and buildings that need to be detected, described and stored in a database. In this research we examine the effects that the rotation and elevation angle of an object in an image have on retrieval performance. We chose model cars so that we could control the environment in which the photos were taken, such as the background, lighting, and the distance between the objects and the camera; a wide variety of shapes and colours of these models is also available to work with. We look at the MPEG-7 description schemes recommended by the MPEG group for video and image retrieval and implement three of them. The defence intelligence group may be required to transmit images from the field directly via satellite to headquarters, so we have also included the JPEG2000 standard, which gives a compression performance increase of 20% over the original JPEG standard and is capable of transmitting images wirelessly and securely. In addition to the MPEG-7 descriptors, we have also implemented the fuzzy histogram and colour correlogram descriptors. We carried out a series of experiments to determine the effects that rotation and elevation have on our model vehicle images, with observations made both when each vehicle is considered separately and when the vehicles are combined into a single database. Finally, we examine the descriptors and determine which adjustments could be made to improve their retrieval performance. / Dr. W.A. Clarke
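To make the descriptor-and-similarity idea concrete, the sketch below pairs a simple colour-histogram descriptor with a histogram-intersection similarity, the kind of descriptor/distance pairing the thesis evaluates. The bin count, the intersection measure and the synthetic images are assumptions for illustration; they are not the MPEG-7, fuzzy-histogram or correlogram definitions used in the work.

```python
# Colour-histogram descriptor plus histogram-intersection similarity.
# Assumed: 8 bins per channel, random 32x32 images as stand-in photos.
import numpy as np

def colour_histogram(image_rgb, bins=8):
    """Quantise each RGB channel into `bins` levels and return a
    normalised joint histogram as a flat descriptor vector."""
    q = (image_rgb.reshape(-1, 3) * bins / 256).astype(int).clip(0, bins - 1)
    hist = np.zeros((bins, bins, bins))
    for r, g, b in q:
        hist[r, g, b] += 1
    return (hist / hist.sum()).ravel()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical colour distributions."""
    return np.minimum(h1, h2).sum()

# Two versions of the same "scene", one slightly brighter, standing in for
# the same model vehicle photographed under two viewing conditions.
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, (32, 32, 3))
img_b = np.clip(img_a + 10, 0, 255)
print(histogram_intersection(colour_histogram(img_a), colour_histogram(img_b)))
```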
94

Energy Efficient and Programmable Architecture for Wireless Vision Sensor Node

Imran, Muhammad January 2013 (has links)
Wireless Vision Sensor Networks (WVSNs) are an emerging field that has attracted a number of potential applications because of the small per-node cost, ease of deployment, scalability and low-power stand-alone solutions. A WVSN consists of a number of wireless Vision Sensor Nodes (VSNs). A VSN has limited resources, such as its embedded processing platform, power supply, wireless radio and memory. With these limited resources, a VSN is expected to perform complex vision tasks for a long duration without battery replacement or recharging. Currently, reducing processing and communication energy consumption is a major challenge for battery-operated VSNs. Another challenge is to propose generic solutions for a VSN so that they are suitable for a number of applications. To meet these challenges, this thesis focuses on an energy-efficient and programmable VSN architecture for machine vision systems that classify objects based on binary data. To facilitate generic solutions, a taxonomy has been developed together with a complexity model that can be used to classify and compare systems without the need for actual implementation. The proposed VSN architecture is based on task partitioning between a VSN and a server, as well as local task partitioning on the node between software and hardware platforms. The effect of this task partitioning on processing and communication energy consumption, design complexity and lifetime has been investigated. The investigation shows that the strategy in which front-end tasks up to segmentation, accompanied by bi-level coding, are implemented on a field-programmable platform (FPGA) with small sleep power offers a generalized, low-complexity and energy-efficient VSN architecture. Implementing the data-intensive front-end tasks on a hardware-reconfigurable platform reduces processing energy; however, there is still scope for reducing the communication energy related to the output data. This thesis therefore also explores data reduction techniques, including image coding, region-of-interest coding and change coding, which reduce the output data significantly. As a proof of concept, the VSN architecture, together with task partitioning, bi-level video coding, duty cycling and a low-complexity background subtraction technique, has been implemented on real hardware and its functionality verified for four applications: a particle detection system, remote meter reading, bird detection and people counting. The results based on measured energy values show that, depending on the application, energy consumption can be reduced by a factor of approximately 1.5 up to 376 compared with currently published VSNs. The lifetime based on measured energy values shows that, for a sample period of 5 minutes, the VSN can achieve a 3.2-year lifetime with a battery of 37.44 kJ. In addition, the proposed VSN offers a generic architecture with smaller design complexity on a hardware-reconfigurable platform and is easily adapted to a number of applications compared with published systems.
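The lifetime figure quoted above follows from a simple duty-cycling energy budget. The sketch below shows that arithmetic with the battery capacity and sample period from the abstract; the per-cycle active energy and sleep power are assumed round numbers for illustration, not the values measured in the thesis.

```python
# Back-of-the-envelope lifetime model for a duty-cycled vision sensor node.

def lifetime_years(battery_j, period_s, active_j_per_cycle, sleep_w):
    """Lifetime assuming each sample period costs one active burst plus
    sleep power for the remainder of the period."""
    energy_per_period = active_j_per_cycle + sleep_w * period_s
    periods = battery_j / energy_per_period
    return periods * period_s / (365.25 * 24 * 3600)

battery_j = 37.44e3          # battery energy from the abstract (37.44 kJ)
period_s = 5 * 60            # 5-minute sample period from the abstract
print(lifetime_years(battery_j, period_s,
                     active_j_per_cycle=0.08,   # assumed
                     sleep_w=1e-4))             # assumed -> roughly 3.2 years
```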
95

RevGlyph - codificação e reversão estereoscópica anaglífica / RevGlyph - stereoscopic coding and reversing of anaglyphs

Zingarelli, Matheus Ricardo Uihara 27 September 2013 (has links)
Attention towards 3D content production is currently high, mostly because of public acceptance of and interest in this kind of technology. This is reflected in greater investment from the film, television and gaming industries, aiming at bringing 3D to their content and devices and offering users different ways of interacting. New capturing techniques, coding schemes and playback modes for 3D video, particularly stereoscopic video, have therefore been emerging or being enhanced, with a focus on improving and integrating this new technology with the available infrastructure. However, regarding advances in coding, there is a conflict: each stereoscopic visualization method uses a different coding technique, which leads to incompatibility between those methods. One proposal is to develop a generic technique, that is, one that is appropriate regardless of the visualization method. Such a technique, with suitable parameters, outputs a stereoscopic video with no significant loss of quality or depth perception, the remarkable feature of this kind of content. The proposed technique, named RevGlyph, transforms a stereo pair of videos into a single, specially coded anaglyph stream. This stream is not only compliant with the anaglyph visualization method but also reversible to an approximation of the original stereo pair, allowing independence from the visualization method.
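For readers unfamiliar with anaglyphs, the sketch below shows the classic red/cyan packing of a stereo pair into one frame and a naive "reversal" back to an approximate pair. RevGlyph's coding is considerably more elaborate (it is designed to be reversible with little loss); this only illustrates the basic idea of a reversible anaglyph representation, and the random test images are assumptions for the example.

```python
# Classic red/cyan anaglyph construction and a naive approximate reversal.
import numpy as np

def make_anaglyph(left_rgb, right_rgb):
    """Red channel from the left view, green/blue from the right view."""
    anaglyph = right_rgb.copy()
    anaglyph[..., 0] = left_rgb[..., 0]
    return anaglyph

def naive_reverse(anaglyph):
    """Approximate the original pair: the left view comes back only as its
    red channel replicated to grey; the right view loses its true red."""
    left_approx = np.repeat(anaglyph[..., :1], 3, axis=2)
    right_approx = anaglyph.copy()
    right_approx[..., 0] = anaglyph[..., 1]          # crude red estimate
    return left_approx, right_approx

rng = np.random.default_rng(1)
left = rng.integers(0, 256, (4, 4, 3), dtype=np.uint8)
right = rng.integers(0, 256, (4, 4, 3), dtype=np.uint8)
ana = make_anaglyph(left, right)
l_hat, r_hat = naive_reverse(ana)
print(ana.shape, l_hat.shape, r_hat.shape)
```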
96

Reversão anaglífica em vídeos estereoscópicos / Anaglyphic reversion in stereoscopic videos

Rodrigues, Felipe Maciel 24 May 2016 (has links)
Attention towards 3D content production is currently high, mostly because of public acceptance of and interest in this kind of technology. New capturing techniques, coding schemes and playback modes for 3D video, particularly stereoscopic video, have therefore been emerging or being enhanced, with a focus on improving and integrating this new technology with the available infrastructure. However, regarding advances in coding, there is no technique compatible with more than one stereoscopic visualization method: each visualization method uses a different coding technique, which prevents the user from choosing how to view the content. An approach to tackle this problem is to develop a generic technique, that is, one that is appropriate regardless of the visualization method and that, with suitable parameters, produces a stereoscopic video with no significant loss of quality or depth perception, the remarkable feature of this kind of content. The method proposed in this work, named HaaRGlyph, transforms a stereoscopic video into a single, specially coded anaglyph stream. This stream is not only compliant with the anaglyph visualization method but also reversible to an approximation of the original stereo pair, allowing visualization independence. Moreover, HaaRGlyph achieves higher compression rates than the related work.
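The name HaaRGlyph suggests a Haar-wavelet step in the coding chain. As a hedged illustration only, the sketch below shows a single-level 1-D Haar transform and its exact inverse — the kind of perfectly reversible decomposition such a scheme can build on — not the HaaRGlyph method itself.

```python
# Single-level 1-D Haar transform (averages/differences) and its inverse.
import numpy as np

def haar_forward(x):
    """Split a signal of even length into averages and differences."""
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / 2.0
    diff = (x[0::2] - x[1::2]) / 2.0
    return avg, diff

def haar_inverse(avg, diff):
    """Perfectly reconstruct the original samples."""
    x = np.empty(2 * avg.size)
    x[0::2] = avg + diff
    x[1::2] = avg - diff
    return x

row = np.array([12, 14, 200, 202, 90, 91, 30, 33], dtype=float)
a, d = haar_forward(row)
print(np.allclose(haar_inverse(a, d), row))   # True: the transform is lossless
```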
97

Codificação escalável de vídeo para recepção fixa no sistema brasileiro de televisão digital. / Scalable video coding for fixed reception in the Brazilian digital TV system.

Nunes, Rogério Pernas 29 May 2009 (has links)
In December 2007, starting from the city of São Paulo, free-to-air digital terrestrial television transmissions were launched in Brazil. A significant advance of the Brazilian Digital TV System (SBTVD) was the adoption of the H.264/AVC standard and the 1080i video format for high-definition video coding. The wide adoption of high-definition technology is a process observed in many markets around the world, and new video formats beyond 1080i have already been discussed and proposed. With an eye on the next generation of television, research centres such as that of the Japanese broadcaster NHK are investigating the human factors that should drive the specification of what may be the last step in 2D television technology. Named UHDTV, this system is expected to support a resolution of 7680 horizontal by 4320 vertical pixels, among other features still under study. In this context, the work presented here discusses the scalability tools of multimedia coding as a way of gradually evolving broadcast video formats. Specifically, this work systematizes the scalability tools of the H.264/AVC standard with a view to their application in the SBTVD. The possibilities for evolving the system through scalability are discussed, and experimental surveys of the current spectrum occupation in the city of São Paulo are presented, showing that there is enough spare bit rate available for future expansion. Initial results for SVC coding are also presented; they indicate objectively that scalability is more advantageous than simulcast and that this technique can be used in the SBTVD to provide new video formats while remaining compatible with current receivers that support only the 1080i format. The work presents theoretical and experimental contributions towards the adoption of scalability in the SBTVD, and points out future work that, if carried out, could confirm the transmission of higher video formats in the SBTVD in the coming years.
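The scalability-versus-simulcast argument comes down to simple bitrate arithmetic, sketched below. The stream bitrates and the 10% scalability overhead are assumed round numbers for illustration; they are not the measurements reported in the thesis.

```python
# Rough comparison of simulcast vs. scalable (SVC-style) delivery.
base_1080i_mbps = 10.0      # existing HD service (assumed)
enhanced_mbps = 18.0        # hypothetical higher-format service (assumed)
svc_overhead = 0.10         # assumed scalability overhead over single-layer

simulcast = base_1080i_mbps + enhanced_mbps      # two independent streams
scalable = enhanced_mbps * (1 + svc_overhead)    # base layer reused by legacy sets

print(f"simulcast total: {simulcast:.1f} Mbit/s")
print(f"scalable total:  {scalable:.1f} Mbit/s")
print(f"spectrum saved:  {simulcast - scalable:.1f} Mbit/s")
```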
98

Non-expansive symmetrically extended wavelet transform for arbitrarily shaped video object plane.

January 1998 (has links)
by Lai Chun Kit. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 68-70). / Abstract also in Chinese. / Contents: Chapter 1, Traditional Image and Video Coding; Chapter 2, Discrete Wavelet Transform (DWT) and Subband Coding; Chapter 3, Non-expansive Symmetric Extension; Chapter 4, Content-based Video Coding in the Proposed MPEG-4 Standard; Chapter 5, Shape Adaptive Wavelet Transform Coding Scheme (SAWT); Chapter 6, Simulation; Chapter 7, Conclusion; Appendix A, Image Segmentation.
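A hedged sketch of the boundary handling this thesis is about: a short, arbitrarily positioned segment is extended symmetrically before filtering, so the subband transform does not expand the number of coefficients. The three-tap averaging kernel is a toy stand-in for a wavelet filter bank, not the thesis's actual filters or sub-sampling scheme.

```python
# Non-expansive filtering of an arbitrary-length segment via symmetric extension.
import numpy as np

def nonexpansive_filter(segment, kernel=(0.25, 0.5, 0.25)):
    """Symmetrically extend, filter, then keep one output per input sample."""
    k = np.asarray(kernel)
    pad = k.size // 2
    extended = np.pad(np.asarray(segment, float), pad, mode="reflect")
    filtered = np.convolve(extended, k, mode="valid")
    return filtered                     # same length as the input segment

segment = [10, 12, 15, 40, 42]          # pixels of one row inside an object mask
out = nonexpansive_filter(segment)
print(len(segment), len(out))           # 5 5 -> no expansion at the boundaries
```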
99

Media Scaling for Power Optimization on Wireless Video Sensors

Lu, Rui 23 August 2007 (has links)
"Video-based sensor networks can be used to improve environment surveillance, health care and emergency response. Many sensor network scenarios require multiple high quality video streams that share limited wireless bandwidth. At the same time, the lifetime of wireless video sensors are constrained by the capacity of their batteries. Media scaling may extend battery life by reducing the video data rate while still maintaining visual quality, but comes at the expense of additional compression time. This thesis studies the effects of media scaling on video sensor energy consumption by: measuring the energy consumption on the different components of the video sensor; building a energy consumption model with several adjustable parameters to analyze the performance of a video sensor; exploring the trade-offs between the video quality and the energy consumption for a video sensor; and, finally, building a working video sensor to validate the accuracy of the model. The results show that the model is an accurate representation of the power usage of an actual video sensor. In addition, media scaling is often an effective way to reduce energy consumption in a video sensor."
100

Machine learning mode decision for complexity reduction and scaling in video applications

Grellert, Mateus January 2018 (has links)
The recent innovations in Machine Learning techniques have led to a wide use of intelligent models to solve complex problems that are especially hard to compute with traditional data structures and algorithms. In particular, current research on Image and Video Processing shows that it is possible to design Machine Learning models that perform object recognition and even action recognition with high confidence. In addition, the latest progress on training algorithms for Deep Neural Networks has been an important milestone in Machine Learning, leading to prominent discoveries in Computer Vision and other applications. Recent studies have also shown that it is possible to design intelligent models capable of drastically reducing the optimization space of mode decision in video encoders with minor losses in coding efficiency. All these facts indicate that Machine Learning for complexity reduction in visual applications is a very promising field of study. The goal of this thesis is to investigate learning-based techniques to reduce the complexity of HEVC encoding decisions, focusing on fast video encoding and transcoding applications. A complexity profiling of HEVC is first presented to identify the tasks that must be prioritized to accomplish this objective. Several variables and metrics are then extracted during the encoding and decoding processes to assess their correlation with the encoding decisions associated with these tasks. Next, Machine Learning techniques are employed to construct classifiers that use this information to accurately predict the outcome of these decisions, eliminating the time-consuming operations required to compute them. The fast encoding and transcoding solutions were developed separately, as the source of information is different in each case, but the same methodology was followed in both. In addition, mechanisms for complexity scalability were developed to provide the best rate-distortion performance for a given target complexity reduction. Experimental results show that the designed fast encoding solutions achieve time savings of 37% up to 78% on average, with Bjontegaard Delta Bitrate (BD-BR) increments between 0.04% and 4.8%. In the transcoding results, a complexity reduction ranging from 43% to 67% was observed, with average BD-BR increments from 0.34% up to 1.7%. Comparisons with the state of the art confirm the efficacy of the designed methods, as they outperform the results achieved by related solutions.
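A hedged sketch of the learning-based early-termination idea: train a small classifier to predict whether a coding unit should be split from cheap features, and skip the exhaustive rate-distortion search when the prediction says no. The feature set and the synthetic training data below are hypothetical stand-ins, not the features, classifiers or datasets used in the thesis.

```python
# Decision-tree classifier for a fast "split / don't split" CU decision.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 2000
variance = rng.gamma(2.0, 50.0, n)          # texture variance of the CU (assumed feature)
mv_magnitude = rng.exponential(2.0, n)      # co-located motion strength (assumed feature)
neighbour_depth = rng.integers(0, 4, n)     # depth chosen by neighbouring CUs (assumed feature)

X = np.column_stack([variance, mv_magnitude, neighbour_depth])
# Synthetic ground truth: busy, moving blocks near deep neighbours tend to split.
y = (0.01 * variance + mv_magnitude + neighbour_depth
     + rng.normal(0, 1, n)) > 5.0

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print("training accuracy:", round(clf.score(X, y), 3))
# In an encoder, a confident "no split" prediction would skip the recursive RD evaluation.
```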
