Visual Attention Guided Adaptive Quantization for x265 using Deep Learning / Visuellt fokus baserad adaptiv kvantisering för x265 med djup inlärning

Video-on-demand streaming is rising drastically in popularity, bringing new challenges to the video coding field. There is a need for new video coding techniques that improve performance and reduce bitrates. One of the most promising areas of research is perceptual video coding, where attributes of the human visual system are considered in order to minimize visual redundancy. Visual attention allows humans to focus on only a small region at a time, guided by different cues, and deep neural networks have made it possible to create high-accuracy models of this behavior. The purpose of this study is therefore to investigate how adaptive quantization (AQ) based on a deep visual attention model can be used to improve subjective video quality at low bitrates. A deep visual attention model was integrated into the x265 encoder to control how bits are distributed at the frame level by adaptively setting the quantization parameter. The effect on subjective video quality was evaluated through A/B testing, in which the solution was compared to one of the standard AQ methods in x265. The results show that the ROI-based AQ was perceived to be of better quality in one out of ten cases. The results can partly be explained by certain methodological choices, but they also highlight a need for more research on how to make use of visual attention modeling in more complex real-world streaming scenarios, in order to make streaming content more accessible and reduce bitrates.

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-333989

video encoding

deep learning

visual attention

adaptive quantization

Computer Science

Media and Communication Technology

Computer and Information Sciences

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-333989
Date	January 2023
Creators	Gärde, Mikaela
Publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS), Stockholm : KTH Royal Institute of Technology
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	Swedish
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EECS-EX ; 2023:400
