Anomaly Detection with Machine Learning using CLIP in a Video Surveillance Context

This thesis explores the application of Contrastive Language-Image Pre-Training (CLIP), a vision-language model, in an automated video surveillance system for anomaly detection. CLIP's ability to perform zero-shot learning, coupled with its robustness against minor image alterations due to its lack of reliance on pixel-level image analysis, makes it a suitable candidate for this application. The study investigates the performance of CLIP in tandem with various anomaly detection algorithms within a visual surveillance system. A custom dataset was created for video anomaly detection, encompassing two distinct views and two levels of anomaly difficulty. One view offers a more zoomed-in perspective, while the other provides a wider perspective. This setup was chosen to evaluate how well CLIP handles objects that occupy a larger or smaller portion of the scene. Several anomaly detection methods with varying levels of supervision were tested and compared against each other, including unsupervised, one-class classification (OCC), and weakly supervised algorithms. To create better separation between the CLIP embeddings, a metric learning model was trained and then used to transform the CLIP embeddings into a new embedding space. The study found that CLIP performs effectively when anomalies take up a larger part of the image, such as in the zoomed-in view, where some of the OCC and weakly supervised methods demonstrated superior performance. When anomalies take up a significantly smaller part of the image, as in the wider view, CLIP has difficulty distinguishing anomalies from normal scenes, even when using the transformed CLIP embeddings. For the wider view the results showed on better performance for the OCC and weakly supervised methods.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-194771
Date: January 2023
Creators: Gärdin, Christoffer
Publisher: Linköpings universitet, Datorseende
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: LiTH-ISY-Ex ; 23/5564--SE
