
Histogram of Oriented Gradients in a Vision Transformer

This study aims to modify the Vision Transformer (ViT) to achieve higher accuracy. ViT is a model used in computer vision for tasks such as image classification. Applied to the MNIST data set, ViT achieves an accuracy of approximately 98%. ViT is then modified by incorporating a method called Histogram of Oriented Gradients (HOG) in two different ways. The first HOG approach gives an accuracy of 98.74% (setup 1), and the second gives an accuracy of 96.87% (patch size 4x4 pixels). The results indicate that better accuracy is obtained when HOG is applied to the entire image. However, no systematic optimization was performed, which makes it difficult to draw firm conclusions.
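
To make the distinction between applying HOG to the entire image and applying it per patch concrete, the following is a minimal, illustrative sketch and not the thesis implementation. It assumes a simplified HOG (central-difference gradients, unsigned orientations in 9 bins, L2 normalization, no block normalization or bin interpolation). The function names `orientation_histogram` and `hog_per_patch` are hypothetical, and how the descriptors are fed into ViT is not specified in this abstract.

```python
import math

def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image borders.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against angle == 180.0 from float rounding.
            hist[int(angle // (180.0 / bins)) % bins] += mag
    # L2-normalize; an all-zero histogram (flat region) stays all zero.
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def hog_per_patch(img, patch=4, bins=9):
    """One histogram per non-overlapping patch x patch block, in row-major order."""
    h, w = len(img), len(img[0])
    feats = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            block = [row[left:left + patch] for row in img[top:top + patch]]
            feats.append(orientation_histogram(block, bins))
    return feats

# Toy 28x28 image (MNIST-sized) with a vertical edge: left half 0, right half 1.
image = [[0.0] * 14 + [1.0] * 14 for _ in range(28)]
whole = orientation_histogram(image)     # one descriptor for the entire image
tokens = hog_per_patch(image, patch=4)   # one descriptor per 4x4 patch
```

On a 28x28 image, the per-patch variant with 4x4 patches produces 49 short descriptors, each computed from only 16 pixels, whereas the whole-image variant summarizes all gradients in a single descriptor. In practice a library implementation such as `skimage.feature.hog` would normally be used instead of this hand-written version.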

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-476352
Date January 2022
Creators Malmsten, Jakob; Cengiz, Heja; Lood, David
Publisher Uppsala universitet, Avdelningen för visuell information och interaktion
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
Relation MATVET-F ; 22020
