This study aims to modify the Vision Transformer (ViT) to achieve higher accuracy. ViT is a model used in computer vision for tasks such as image classification. Applied to the MNIST data set, ViT achieves an accuracy of approximately 98%. ViT is modified by incorporating a method called Histogram of Oriented Gradients (HOG) in two different ways. The results show that the first HOG approach gives an accuracy of 98.74% (setup 1) and the second approach gives 96.87% (patch size 4x4 pixels). The study indicates that applying HOG to the entire image yields better accuracy. However, no systematic optimization was performed, which makes it difficult to draw firm conclusions.
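As a rough illustration of the two HOG variants summarized above, the following Python sketch computes HOG descriptors with scikit-image both over a whole 28x28 MNIST image and separately per 4x4 patch. All parameter choices (orientations, cell and block sizes) and the use of scikit-image are assumptions made for illustration only and are not taken from the thesis.

```python
# Minimal sketch (assumption: scikit-image HOG with illustrative parameters,
# not the thesis's actual configuration or code).
import numpy as np
from skimage.feature import hog

def hog_whole_image(image_28x28):
    """HOG descriptor computed over the entire MNIST image (first variant)."""
    return hog(
        image_28x28,
        orientations=9,           # assumed number of gradient orientation bins
        pixels_per_cell=(7, 7),   # assumed cell size for a 28x28 image
        cells_per_block=(2, 2),
        feature_vector=True,
    )

def hog_per_patch(image_28x28, patch=4):
    """HOG descriptor computed separately for each patch (second variant, 4x4 patches)."""
    feats = []
    for r in range(0, 28, patch):
        for c in range(0, 28, patch):
            p = image_28x28[r:r + patch, c:c + patch]
            feats.append(
                hog(
                    p,
                    orientations=9,
                    pixels_per_cell=(patch, patch),  # one cell covering the patch
                    cells_per_block=(1, 1),
                    feature_vector=True,
                )
            )
    return np.stack(feats)  # shape: (num_patches, features_per_patch)

# Example usage with a random array standing in for an MNIST digit:
img = np.random.rand(28, 28).astype(np.float32)
whole_descriptor = hog_whole_image(img)   # single global descriptor
patch_descriptors = hog_per_patch(img)    # one descriptor per 4x4 patch
```

In a ViT pipeline, the per-patch descriptors could replace or augment the usual flattened pixel patches before the linear embedding, while the whole-image descriptor would be combined with the input image; how exactly the thesis integrates these features into ViT is not specified in the abstract.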
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-476352
Date | January 2022 |
Creators | Malmsten, Jakob, Cengiz, Heja, Lood, David |
Publisher | Uppsala universitet, Avdelningen för visuell information och interaktion |
Source Sets | DiVA Archive at Uppsala University
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | MATVET-F ; 22020 |