This study aims to modify the Vision Transformer (ViT) to achieve higher accuracy. ViT is a model used in computer vision to, among other things, classify images. Applying ViT to the MNIST data set yields an accuracy of approximately 98%. ViT is modified by incorporating a method called Histogram of Oriented Gradients (HOG) in two different ways. The results show that the first approach with HOG gives an accuracy of 98.74% (setup 1) and the second approach gives an accuracy of 96.87% (patch size 4x4 pixels). The study shows that better accuracy is obtained when HOG is applied to the entire image. However, no systematic optimization has been performed, which makes it difficult to draw conclusions with certainty.
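The record does not include implementation details, but the two HOG variants described above can be sketched roughly as follows. This is a minimal illustration using scikit-image's hog function; the parameter choices (orientations, cell and block sizes) and the function names are assumptions for illustration, not the settings actually used in the thesis.

# Sketch of the two assumed HOG variants: whole-image HOG vs. per-patch HOG.
import numpy as np
from skimage.feature import hog

def hog_full_image(image_28x28):
    # Variant 1 (assumed): one HOG descriptor over the whole 28x28 MNIST image,
    # giving a single feature vector that the ViT can consume instead of raw pixels.
    return hog(
        image_28x28,
        orientations=9,          # assumed value, not from the thesis
        pixels_per_cell=(4, 4),  # assumed value, not from the thesis
        cells_per_block=(1, 1),
        feature_vector=True,
    )

def hog_per_patch(image_28x28, patch_size=4):
    # Variant 2 (assumed): split the image into 4x4-pixel patches (as in the
    # reported setup) and compute a small HOG descriptor per patch, yielding
    # one feature vector per ViT token.
    features = []
    for row in range(0, 28, patch_size):
        for col in range(0, 28, patch_size):
            patch = image_28x28[row:row + patch_size, col:col + patch_size]
            features.append(
                hog(
                    patch,
                    orientations=9,                            # assumed value
                    pixels_per_cell=(patch_size, patch_size),  # one cell per patch
                    cells_per_block=(1, 1),
                    feature_vector=True,
                )
            )
    return np.stack(features)

# Example with a random array standing in for an MNIST digit:
img = np.random.rand(28, 28)
print(hog_full_image(img).shape)  # (441,) with the assumed parameters
print(hog_per_patch(img).shape)   # (49, 9): 49 patches, 9 orientation bins each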
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-476352 |
Date | January 2022 |
Creators | Malmsten, Jakob, Cengiz, Heja, Lood, David |
Publisher | Uppsala universitet, Avdelningen för visuell information och interaktion |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | MATVET-F ; 22020 |