Sketch Quality Prediction Using Transformers

The quality of an input sketch can affect the performance of computational algorithms.
However, sketch quality is rarely considered in sketch tasks, and automated sketch quality prediction has not been previously studied. This thesis presents quality prediction on the "Sketchy" dataset. The method presented here predicts a human-understandable quality label rather than a zero-to-one, computer-generated quality metric with no human input. Previous tasks such as sketch classification have used a transformer architecture to leverage the vector format of sketches. The architecture used for sketch classification is called Sketchformer. In this work, the Sketchformer was adopted and trained to predict quality labels of hand-drawn sketches.
The Sketchformer architecture achieves 66% accuracy when predicting the 5 labels. The same transformer achieves up to 97% accuracy in separate experiments that combine the labels into good versus bad (2-label) classes. The Sketchformer significantly outperforms the Support Vector Machine (SVM) baseline. The results of the experiments show that the transformer embedding space facilitates separation of 'good' sketch quality from 'bad' sketch quality with high accuracy.

/ Master of Science /

If pictures are worth 1000 words, then sketches are worth a few hundred words. Sketches are easy to create using a pen and tablet. Objects in sketches can be drawn in many ways, depending on the talent of the creator and the pose of the object. The quality of sketches varies drastically. When sketches are used in computer vision tasks, the quality of a sketch can affect the performance of the computational algorithm. However, the quality of a sketch is not often considered when working with other sketch tasks. One common sketch task is called Sketch-Based Image Retrieval (SBIR). The input to this task is a sketch of an object or subject, and the model returns a matching image of the same object or subject. If the quality of the input sketch is bad, the output of this model will be poor. This thesis predicts the quality of sketches. The dataset used is called the "Sketchy" dataset; it was originally used to study SBIR. However, the creators of the dataset also provided quality labels for the sketches. This allows for quality prediction on this dataset, which has not previously been done. There are 5 different labels assigned to sketches. One of the experiments completed for this thesis predicts 1 of the 5 labels for each sketch. The other experiments create good and bad labels by combining the 5 labels. The Sketchformer architecture created by Ribeiro et al. is used to run the experiments.
The Sketchformer achieves 66% accuracy on the 5-label experiment and up to 97% accuracy on the good and bad (2-label) experiments. This transformer outperforms a Support Vector Machine baseline on these quality labels. The results of the experiments show that applying the transformer to this dataset is a valuable contribution, since it surpasses the baseline on multiple tasks. Additionally, the accuracy values from these experiments are similar to values found in the corresponding image quality prediction task.
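
The 2-label experiments described above are formed by combining the 5 quality labels into good and bad groups. The following is a minimal sketch of that idea, not the thesis's implementation: the integer label ids, the choice of which ids count as "good", and the toy predictions are all hypothetical, since this record does not state the actual label names or groupings.

```python
# Hypothetical illustration of merging 5 quality labels into good/bad.
# Label ids 0-4 and the "good" grouping below are assumptions, not the
# groupings used in the thesis.


def merge_labels(labels, good_labels):
    """Map each 5-way label to 'good' or 'bad' given the set of good ids."""
    return ["good" if label in good_labels else "bad" for label in labels]


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data (made up for this example).
true_5 = [0, 1, 2, 3, 4, 0]
pred_5 = [0, 2, 2, 3, 3, 1]

# Assumed grouping: ids 0 and 1 are treated as good, the rest as bad.
good_ids = {0, 1}

acc_5 = accuracy(pred_5, true_5)
acc_2 = accuracy(merge_labels(pred_5, good_ids), merge_labels(true_5, good_ids))

print(f"5-label accuracy: {acc_5:.2f}")
print(f"2-label accuracy: {acc_2:.2f}")
```

In this toy data, a prediction that confuses two labels inside the same group still counts as correct after merging, which is one reason a 2-label accuracy can be higher than the 5-label accuracy on the same predictions.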

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/113536
Date: 26 January 2023
Creators: Maxseiner, Sarah Boyes
Contributors: Electrical and Computer Engineering, Abbott, A. Lynn, Wang, Yue J., Jones, Creed F. III
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
