Return to search

Learning Embeddings for Fashion Images

Today the process of sorting second-hand clothes and textiles is mostly manual. In this master’s thesis, methods for automating this process as well as improving the manual sorting process have been investigated. The methods explored include the automatic prediction of price and intended usage for second-hand clothes, as well as different types of image retrieval to aid manual sorting. Two models were examined: CLIP, a multi-modal model, and MAE, a self-supervised model. Quantitatively, the results favored CLIP, which outperformed MAE in both image retrieval and prediction. However, MAE may still be useful for some applications in terms of image retrieval as it returns items that look similar, even if they do not necessarily have the same attributes. In contrast, CLIP is better at accurately retrieving garments with as many matching attributes as possible. For price prediction, the best model was CLIP. When fine-tuned on the dataset used, CLIP achieved an F1-Score of 38.08 using three different price categories in the dataset. For predicting the intended usage (either reusing the garment or exporting it to another country) the best model managed to achieve an F1-Score of 59.04.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-194433
Date January 2023
CreatorsHermansson, Simon
PublisherLinköpings universitet, Datorseende
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0041 seconds