Global ETD Search


Learning Embeddings for Fashion Images

Today the process of sorting second-hand clothes and textiles is mostly manual. This master's thesis investigates methods both for automating this process and for improving manual sorting. The methods explored include automatic prediction of the price and intended usage of second-hand clothes, as well as several types of image retrieval to aid manual sorting. Two models were examined: CLIP, a multi-modal model, and MAE, a self-supervised model. Quantitatively, the results favored CLIP, which outperformed MAE in both image retrieval and prediction. However, MAE may still be useful for some image retrieval applications, as it returns items that look similar even if they do not necessarily share the same attributes. In contrast, CLIP is better at retrieving garments that match on as many attributes as possible. CLIP was also the best model for price prediction: when fine-tuned on the thesis dataset, it achieved an F1-score of 38.08 over the dataset's three price categories. For predicting intended usage (either reusing the garment or exporting it to another country), the best model achieved an F1-score of 59.04.
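The retrieval and prediction results above rest on two standard operations. The following is a minimal sketch, assuming each garment image has already been encoded into a fixed-length embedding vector (for example by CLIP's or MAE's image encoder). Retrieval ranks stored items by cosine similarity to a query embedding, and a three-class F1 score is computed as the unweighted (macro) mean of per-class F1. The abstract does not say which similarity measure or F1 averaging the thesis used, so cosine similarity, macro averaging, and the hypothetical helper names `retrieve` and `macro_f1` are assumptions for illustration only.

```python
import math
from typing import List, Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], gallery: Mapping[str, Sequence[float]], k: int = 3) -> List[str]:
    """Hypothetical helper: ids of the k gallery embeddings most similar to the query."""
    ranked = sorted(
        gallery,
        key=lambda item_id: cosine_similarity(query, gallery[item_id]),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Hypothetical helper: unweighted mean of per-class F1 scores, as a percentage."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return 100.0 * sum(scores) / num_classes


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, invented for illustration (real CLIP/MAE embeddings are much larger).
    gallery = {"shirt_a": [0.9, 0.1, 0.0], "shirt_b": [0.8, 0.2, 0.1], "jeans": [0.0, 0.1, 0.9]}
    print(retrieve([1.0, 0.0, 0.0], gallery, k=2))
    # Toy labels for three hypothetical price categories 0, 1, 2.
    print(round(macro_f1([0, 1, 2, 2], [0, 1, 2, 1], num_classes=3), 2))
```

With these toy values the query is closest to the two shirt embeddings, so `retrieve` returns `['shirt_a', 'shirt_b']`, and the toy labels give a macro F1 of 77.78. Macro averaging weights each price category equally regardless of how many items it contains. A weighted or micro average would give different numbers on imbalanced data, which is why the averaging scheme matters when comparing F1 scores.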

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-194433

Masked Autoencoders (MAE)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-194433
Date	January 2023
Creators	Hermansson, Simon
Publisher	Linköpings universitet, Datorseende
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
