1 |
Resource-constrained re-identification in camera networks. Tahir, Syed Fahad. January 2016 (has links)
In multi-camera surveillance, the association of people detected in different camera views over time, known as person re-identification, is a fundamental task. Re-identification is a challenging problem because of changes in the appearance of people under varying camera conditions. Existing approaches focus on improving re-identification accuracy, while no specific effort has yet been put into efficiently utilising the resources that are normally limited in a camera network, such as storage, computation and communication capabilities. In this thesis, we aim to perform and improve the task of re-identification under constrained resources. More specifically, we reduce the data needed to represent the appearance of an object through a proposed feature-selection method and a difference-vector representation method. The proposed feature-selection method jointly considers the computational cost of feature extraction, the cost of storing the feature descriptor, and the feature's re-identification performance in order to select the most cost-effective and well-performing features. This selection allows us to improve inter-camera re-identification while reducing storage and computation requirements within each camera. The selected features are ranked in order of effectiveness, which enables a further reduction by dropping the least effective features when application constraints require it. We also reduce the communication overhead in the camera network by transferring only a difference vector, obtained from the extracted features of an object and the reference features within a camera, as the object representation for association. To reduce the number of possible matches per association, we group the objects appearing within a defined time interval in uncalibrated camera pairs. Such grouping improves re-identification, since only those objects that appear within the same time interval in a camera pair need to be associated.
For temporal alignment of cameras, we exploit differences between the frame numbers of the detected objects in a camera pair. Finally, in contrast to the pairwise camera associations used in the literature, we propose a many-to-one camera association method for re-identification, where multiple cameras can be candidates for having generated the previous detections of an object. We obtain camera-invariant matching scores from the scores produced by the pairwise re-identification approaches; these scores measure the chance of a correct match between the objects detected in a group of cameras. Experimental results on publicly available and in-lab multi-camera image and video datasets show that the proposed methods successfully reduce storage, computation and communication requirements while improving the re-identification rate compared to existing re-identification approaches.
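The cost-aware feature selection described above can be sketched in a minimal, hypothetical form: each feature is scored by its re-identification performance relative to its combined extraction and storage cost, ranked, and then the least effective features are dropped to fit a resource budget. All feature names, costs and accuracy numbers below are illustrative assumptions, not values from the thesis.

```python
def rank_features(features):
    """Rank features by re-id performance per unit cost (higher is better)."""
    def effectiveness(f):
        cost = f["compute_cost"] + f["storage_cost"]
        return f["reid_accuracy"] / cost
    return sorted(features, key=effectiveness, reverse=True)

def select_within_budget(ranked, budget):
    """Greedily keep the most cost-effective features within a storage budget."""
    chosen, used = [], 0.0
    for f in ranked:
        if used + f["storage_cost"] <= budget:
            chosen.append(f["name"])
            used += f["storage_cost"]
    return chosen

# Illustrative feature descriptors with assumed costs and accuracies:
features = [
    {"name": "color_hist", "reid_accuracy": 0.42, "compute_cost": 1.0, "storage_cost": 2.0},
    {"name": "hog",        "reid_accuracy": 0.35, "compute_cost": 4.0, "storage_cost": 6.0},
    {"name": "lbp",        "reid_accuracy": 0.30, "compute_cost": 2.0, "storage_cost": 1.0},
]

ranked = rank_features(features)
selected = select_within_budget(ranked, budget=4.0)
```

The ranking makes the later "drop the least effective features" step a simple truncation of the list.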
|
2 |
PERSON RE-IDENTIFICATION & VIDEO-BASED HEART RATE ESTIMATION. Dahjung Chung (7030574). 13 August 2019 (has links)
<div>
<div>
<div>
<p>Estimation of physiological vital signs such as the heart rate (HR) has attracted a lot of attention due to the increased interest in health monitoring. The most common HR estimation methods, such as photoplethysmography (PPG), require physical contact with the subject and limit the subject's movement. Video-based HR estimation, known as videoplethysmography (VHR), uses image/video processing techniques to remotely estimate the human HR. Even though various VHR methods have been proposed over the past five years, challenging problems remain, such as diverse skin tones and motion artifacts. In this thesis we present a VHR method using temporal difference filtering and small-variation amplification, based on the assumption that HR manifests as small color variations of the skin, i.e. micro-blushing. This method is evaluated and compared with two previous VHR methods. Additionally, we propose the use of spatial pruning as an alternative to skin detection, and homomorphic filtering for motion artifact compensation.
</p><p><br></p>
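The two operations named above, temporal difference filtering followed by amplification of the small variations, can be sketched on a per-frame mean skin-color signal. The gain value and the toy signal below are illustrative assumptions, not parameters from the thesis.

```python
def temporal_difference(signal):
    """First-order temporal difference of a 1-D color signal."""
    return [b - a for a, b in zip(signal, signal[1:])]

def amplify(diffs, gain=50.0):
    """Magnify the small frame-to-frame color variations."""
    return [d * gain for d in diffs]

# Toy per-frame mean green-channel values of a skin region:
frames = [0.500, 0.502, 0.499, 0.503]
pulse = amplify(temporal_difference(frames))
```

In a real VHR pipeline the resulting signal would then be band-pass filtered and its dominant frequency read off as the heart rate.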
<p>Intelligent video surveillance systems are a crucial tool for public safety. One of their goals is to extract meaningful information efficiently from large volumes of surveillance video. Person re-identification (ReID) is a fundamental task associated with intelligent video surveillance: for example, ReID can be used to identify a person of interest to help law enforcement when that person re-appears in different cameras at different times. ReID can be formally defined as establishing the correspondence between images of a person taken from different cameras. Even though ReID has been intensively studied over the past years, it is still an active research area due to various challenges such as illumination variations, occlusions, viewpoint changes and the lack of data. In this thesis we propose a weighted two-stream training objective function which combines the Siamese cost of the spatial and temporal streams with the objective of predicting a person's identity. Additionally, we present a camera-aware image-to-image translation method using a similarity-preserving StarGAN (SP-StarGAN) as data augmentation for ReID. We evaluate our proposed methods on publicly available datasets and demonstrate their efficacy.</p></div></div></div>
|
3 |
Minimising human annotation for scalable person re-identification. Wang, Hanxiao. January 2017 (has links)
Among the diverse tasks performed by an intelligent distributed multi-camera surveillance system, person re-identification (re-id) is one of the most essential. Re-id refers to associating an individual or a group of people across non-overlapping cameras at different times and locations, and forms the foundation of a variety of applications ranging from security and forensic search to quotidian retail and health care. Though it has attracted rapidly increasing academic interest over the past decade, launching a practical re-id system in real-world environments remains a non-trivial and unsolved problem, due to the ambiguous and noisy nature of surveillance data and the potentially dramatic visual appearance changes caused by uncontrolled variations in human poses and divergent viewing conditions across distributed camera views. To mitigate such visual ambiguity and appearance variations, most existing re-id approaches rely on constructing fully supervised machine learning models with extensively labelled training datasets, which is unscalable for practical applications in the real world. In particular, human annotators must exhaustively search over a vast quantity of offline-collected data and manually label cross-view matched images of a large population between every possible camera pair. Worse, even after this prohibitively expensive human effort has been expended, the trained re-id model is often not easily generalisable or transferable, due to the elastic and dynamic operating conditions of a surveillance system. With such motivations, this thesis proposes several scalable re-id approaches with significantly reduced human supervision, readily applicable to practical applications. More specifically, this thesis develops and investigates four new approaches for reducing human labelling effort in real-world re-id, as follows. Chapter 3: The first approach is affinity mining from unlabelled data.
Different from most existing supervised approaches, this work aims to model the discriminative information for re-id without exploiting human annotations, but from the vast amount of unlabelled person image data, and is thus applicable to both semi-supervised and unsupervised re-id. This is non-trivial, since human-annotated identity matching correspondence is often the key to discriminative re-id modelling. In this chapter, an alternative strategy is explored by specifically mining two types of affinity relationships among unlabelled data: (1) inter-view data affinity and (2) intra-view data affinity. In particular, with such affinity information encoded as constraints, a Regularised Kernel Subspace Learning model is developed to explicitly reduce inter-view appearance variations and meanwhile enhance intra-view appearance disparity for more discriminative re-id matching. Consequently, annotation costs are immensely alleviated, and a scalable re-id model can readily leverage the plentiful unlabelled data that is inexpensive to collect. Chapter 4: The second approach is saliency discovery from unlabelled data. This chapter continues to investigate what can be learned from unlabelled images without human-annotated identity labels. Rather than the affinity mining proposed in Chapter 3, a different solution is proposed: to discover localised visual saliency of person appearances. Intuitively, salient and atypical appearances are able to uniquely and representatively describe and identify an individual, whilst often also being robust to view changes and detection variances. Motivated by this, an unsupervised Generative Topic Saliency model is proposed to jointly perform foreground extraction, saliency detection, and discriminative re-id matching. This approach completely avoids the exhaustive annotation effort for model training, and thus scales better to real-world applications.
Moreover, its automatically discovered re-id saliency representations are shown to be semantically interpretable, suitable for generating useful visual analysis for deployable user-oriented software tools. Chapter 5: The third approach is incremental learning from actively labelled data. Since learning from unlabelled data alone yields less discriminative matching results, and in some cases only limited human labelling resources are available for re-id modelling, this chapter investigates how to maximise a model's discriminative capability with minimal labelling effort. The challenges are to (1) automatically select the most representative data from a vast number of noisy/ambiguous unlabelled samples in order to maximise model discrimination capacity; and (2) incrementally update the model parameters to accelerate machine responses and reduce human waiting time. To that end, this thesis proposes a regression-based re-id model, characterised by very fast and efficient incremental model updates. Furthermore, an effective active data sampling algorithm with three novel joint exploration-exploitation criteria is designed to make automatic data selection feasible with notably reduced human labelling costs. Such an approach ensures annotation is spent only on the few data samples most critical to the model's generalisation capability, instead of being exhausted by blindly labelling many noisy and redundant training samples. Chapter 6: The last technical area of this thesis is human-in-the-loop learning from relevance feedback. Whilst the former chapters mainly investigate techniques to reduce human supervision for model training, this chapter motivates a novel research area that further minimises the human effort spent in the re-id deployment stage.
In real-world applications, where the camera network and potential gallery size increase dramatically, even state-of-the-art re-id models deliver much inferior performance, and human involvement at the deployment stage is inevitable. To minimise such human effort and maximise re-id performance, this thesis explores an alternative approach to re-id by formulating a hybrid human-computer learning paradigm with humans in the model matching loop. Specifically, a Human Verification Incremental Learning model is formulated which does not require any pre-labelled training data and is therefore scalable to new camera pairs. Moreover, the proposed model learns cumulatively from human feedback to provide an instant improvement to the re-id ranking of each probe on-the-fly, and is thus scalable to large gallery sizes. It is demonstrated that the proposed re-id model achieves significantly superior re-id results whilst consuming much less human supervision effort. To facilitate a holistic understanding of this thesis, the main studies are summarised and framed into a graphical abstract.
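The active data sampling idea in Chapter 5 can be sketched, under assumed scoring criteria, as picking the unlabelled sample that is most ambiguous to the current model (exploitation) while lying far from already-labelled data (exploration). The two criteria, the 1-D toy features and the weighting below are illustrative assumptions, not the thesis's actual criteria.

```python
def uncertainty(score):
    """Ambiguity of a model matching score in [0, 1]: highest at 0.5."""
    return 1.0 - abs(score - 0.5) * 2.0

def diversity(candidate, labelled):
    """Distance (1-D toy feature) to the nearest already-labelled sample."""
    if not labelled:
        return 1.0
    return min(abs(candidate - x) for x in labelled)

def pick_next(pool, scores, labelled, w=0.5):
    """Return the pool index with the best exploration-exploitation trade-off."""
    def value(i):
        return w * uncertainty(scores[i]) + (1 - w) * diversity(pool[i], labelled)
    return max(range(len(pool)), key=value)

pool = [0.1, 0.45, 0.9]          # toy 1-D features of unlabelled samples
scores = [0.95, 0.52, 0.10]      # current model's matching scores
labelled = [0.12]                # features of already-labelled samples
next_index = pick_next(pool, scores, labelled)
```

Each time the human labels the chosen sample, the model is updated incrementally and the selection repeats on the shrunken pool.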
|
4 |
Reidentifikace objektů ve video streamu pomocí metod data analytics / Re-identification of Objects in Video Stream using Data Analytics. Smrž, Dominik. January 2021 (has links)
The wide usage of surveillance cameras provides data that can be used in various areas, such as security and urban planning. An important stepping stone for extracting useful information is matching observed objects across different points in time or different cameras. In this work, we focus specifically on this part of video processing, usually referred to as re-identification. We split our work into two stages. In the first part, we focus on the spatial and temporal information regarding the detected objects. In the second part, we combine this metadata with the visual information. For the extraction of useful descriptors from the images, we use methods based on color distributions as well as state-of-the-art deep neural networks. We also annotate a dataset to provide a comprehensive evaluation of our approaches. Additionally, we provide the custom tool we used to annotate the dataset.
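A color-distribution descriptor of the kind mentioned above can be sketched as a normalised histogram over quantised pixel values, compared with histogram intersection. The bin count and toy pixel data are illustrative assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalised histogram of pixel intensities in [0, 256)."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = float(len(pixels))
    return [h / total for h in hist]

def intersection(h1, h2):
    """Histogram intersection similarity in [0, 1] (1 = identical distributions)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two crops of the same object should yield similar distributions:
a = color_histogram([10, 20, 200, 220])
b = color_histogram([15, 25, 210, 230])
similarity = intersection(a, b)
```

Real descriptors would histogram each color channel of a detected bounding box; deep-network embeddings would then complement this hand-crafted cue.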
|
5 |
Person re-identification in images with deep learning / Ré-identification de personnes dans des images par apprentissage automatique. Chen, Yiqiang. 12 October 2018 (has links)
Video surveillance systems are of great value for public safety. As one of the most important surveillance applications, person re-identification is defined as the problem of identifying people across images that have been captured by different surveillance cameras without overlapping fields of view. With the increasing need for automated video analysis, this task is receiving growing attention. However, this problem is challenging due to large variations in lighting, pose, viewpoint and background. To tackle these difficulties, in this thesis we propose several deep-learning-based approaches to improve person re-identification performance in different ways. In the first proposed approach, we use pedestrian attributes to enhance person re-identification. The attributes are defined as semantic mid-level descriptions of persons, such as gender, accessories, clothing etc. They can help extract characteristics that are invariant to pose and viewpoint variations, thanks to the descriptor being on a higher semantic level. In order to make use of the attributes, we propose a CNN-based person re-identification framework composed of an identity classification branch and an attribute recognition branch. At a later stage, these two cues are combined to perform person re-identification. Secondly, among the challenges, one of the most difficult is variation under different viewpoints. The same person shows very different appearances from different points of view. To deal with this issue, we consider images under various orientations to be from different domains and propose an orientation-specific CNN.
This framework performs body orientation regression in a gating branch, and in another branch learns separate orientation-specific layers as local experts. The combined orientation-specific CNN feature representations are used for the person re-identification task. Thirdly, learning a similarity metric for person images is a crucial aspect of person re-identification. As the third contribution, we propose a novel listwise loss function taking into account the order in the ranking of gallery images with respect to different probe images. Further, an evaluation-gain-based weighting is introduced in the loss function to directly optimise the evaluation measures of person re-identification. Finally, in a large gallery set, many people may wear similar clothing; in this case, using only the appearance of a single person leads to strong ambiguities. In realistic settings, people often walk in groups rather than alone. As the last contribution, we propose to learn a deep feature representation with displacement invariance for group context, and introduce a method to combine the group context and single-person appearance. For all four contributions of this thesis, we carry out extensive experiments on popular benchmarks and datasets to demonstrate the effectiveness of the proposed systems.
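The fusion of an identity branch with an attribute branch, as in the first contribution above, can be sketched minimally as a weighted sum of the two branch distances. The weight and the toy embeddings are illustrative assumptions, not the thesis's actual fusion rule.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fused_distance(id_a, id_b, attr_a, attr_b, w_attr=0.3):
    """Combine identity-embedding and attribute-embedding distances."""
    return (1 - w_attr) * euclidean(id_a, id_b) + w_attr * euclidean(attr_a, attr_b)

# Toy 2-D identity and attribute embeddings for two detections:
d = fused_distance([0.0, 1.0], [0.0, 0.0], [1.0, 0.0], [1.0, 1.0])
```

Ranking gallery images by this fused distance lets the higher-level attribute cue correct identity-embedding mistakes caused by pose or viewpoint changes.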
|
6 |
Visual Privacy Protection based on Context Recognition / Protección de la Privacidad Visual basada en el Reconocimiento del Contexto. Padilla López, José Ramón. 16 October 2015 (has links)
Today, the video camera has become a ubiquitous device. Thanks to miniaturisation, cameras can be found embedded in a multitude of everyday devices, from mobile phones and tablets to laptops. Although these devices are used harmlessly by millions of people every day, capturing video, taking photographs that are later shared, and so on, the use of video cameras for surveillance raises concern among the population, especially when the cameras form part of intelligent monitoring systems. This poses a threat to privacy, because the recordings made by these systems contain a large amount of information that can be extracted automatically using computer vision techniques. Nevertheless, applying this technology in various areas can have a very positive impact on people. At the same time, the world population is ageing rapidly. This demographic change means that a growing number of people who are dependent, or who require support in their daily lives, will live alone, so a solution is needed that extends their autonomy. Ambient assisted living (AAL) offers such a solution by adding intelligence to the environment where people live, so that it assists them in their daily activities. These environments require the installation of sensors to capture data. Using video cameras, with the richness of the data they provide, in private environments would make it possible to create AAL services oriented towards caring for people, for example the detection of accidents at home, early detection of cognitive problems, and many others. However, given how easily images are interpreted by people, this raises ethical problems that affect privacy.
This work proposes a solution for using video cameras in private environments with the aim of supporting people, thereby enabling the development of ambient-assisted-living services in a smart home. Specifically, we propose protecting privacy in those AAL monitoring services that require access to the video by a caregiver, whether professional or informal. This happens, for example, when a monitoring system detects an accident and that event requires visual confirmation of what occurred. Likewise, AAL tele-rehabilitation services may require supervision by a human. In this type of scenario, it is essential to protect privacy at the moment the video is being accessed or observed. As part of this work, a study of the state of the art was carried out, reviewing the visual privacy protection methods present in the literature. This review is the first to perform an exhaustive analysis of the topic, focusing mainly on protection methods. As a result, a visual privacy protection scheme based on context recognition has been developed, which adapts the privacy level during observation whenever the user's preferences match the context. Context detection is necessary in order to detect in the scene the circumstances under which the user demands a given privacy level. Using this scheme, each of the frames composing a live video stream is modified before transmission, taking into account the user's privacy requirements. The proposed scheme makes use of various image modification techniques to protect privacy, as well as computer vision to recognise the context.
This doctoral thesis therefore makes several contributions in different areas in order to develop the proposed visual privacy protection scheme. The results obtained are expected to bring us one step closer to the use of video cameras in private environments, increasing their acceptance and making possible the deployment of computer-vision-based AAL services that increase the autonomy of people in situations of dependency.
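A minimal, hypothetical sketch of such context-aware protection: when the detected context matches the user's privacy preferences, a frame is pixelated before transmission; otherwise it passes through unchanged. The context labels, preference map and the 4x4 toy frame are illustrative assumptions.

```python
def pixelate(frame, block=2):
    """Replace each block of pixels with its average value (coarse pixelation)."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vals = [frame[y][x] for y in range(by, min(by + block, h))
                                for x in range(bx, min(bx + block, w))]
            avg = sum(vals) // len(vals)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = avg
    return out

def protect(frame, context, preferences):
    """Apply the protection level the user demands for this context."""
    if preferences.get(context) == "pixelate":
        return pixelate(frame)
    return frame

frame = [[0, 10, 100, 110],
         [20, 30, 120, 130],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
protected = protect(frame, "undressed", {"undressed": "pixelate"})
```

A real system would substitute the context check with a computer-vision classifier and apply the filter only to the regions of the frame containing the person.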
|
7 |
OSPREY: Person Re-Identification in the sport of Padel: Utilizing One-Shot Person Re-identification with locally aware transformers to improve tracking. Svensson, Måns; Hult, Jim. January 2022 (has links)
This thesis is concerned with the topic of person re-identification. Many tracking algorithms today cannot keep track of players re-entering the scene at different angles and times. Therefore, in this thesis, current literature is explored to gather information about the topic, and a current state-of-the-art model is tested. The person re-identification techniques are applied to Padel games due to the collaboration with PadelPlay AB. The purpose of the thesis is to keep track of players during full matches of Padel with correct identities. To this end, a current state-of-the-art model is applied to an existing tracking algorithm to enhance its capabilities. Furthermore, the purpose is broken down into two research questions. Firstly, how well does an existing person re-id model perform on Padel matches when it comes to keeping a consistent and accurate id on all players? Secondly, how can this model be improved upon to perform better in the new domain, the sport of Padel? To be able to answer the research questions, a Padel dataset is created for benchmarking purposes. The state-of-the-art model is tested on the new dataset to see how it handles a new domain. Additionally, the same state-of-the-art model is retrained on the Padel dataset to answer the second research question. The results show that the state-of-the-art model previously trained on the Market-1501 dataset is highly generalizable to the Padel dataset and performs closely to the new model trained purely on the Padel dataset. Although they perform alike, the new model trained on the Padel dataset is slightly better, as seen through both the quantitative and qualitative evaluations. Furthermore, the application of re-identification technology to keep track of players yielded significantly better results than conventional solutions such as YOLOv5 with DeepSORT.
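Keeping a consistent identity on each player, as described above, can be sketched as matching each new detection's re-id embedding to the closest stored player embedding, falling back to a new identity when nothing is close enough. The embeddings, the threshold and the player names below are illustrative assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_identity(embedding, gallery, threshold=0.5):
    """Return the matching player id, or register a new id if nothing is close."""
    best_id, best_dist = None, float("inf")
    for pid, ref in gallery.items():
        d = euclidean(embedding, ref)
        if d < best_dist:
            best_id, best_dist = pid, d
    if best_id is not None and best_dist <= threshold:
        return best_id
    new_id = f"player_{len(gallery) + 1}"
    gallery[new_id] = embedding
    return new_id

gallery = {"player_1": [0.0, 0.0], "player_2": [1.0, 1.0]}
who = assign_identity([0.1, 0.0], gallery)        # close to player_1
newcomer = assign_identity([5.0, 5.0], gallery)   # no close match, new identity
```

In the Padel setting the gallery would hold one averaged embedding per player, letting players re-enter the scene from any camera angle and still be matched.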
|
8 |
Pedestrian Tracking by using Deep Neural Networks / Spårning av fotgängare med hjälp av Deep Neural Network. Peng, Zeng. January 2021 (has links)
This project aims at using deep learning to solve the pedestrian tracking problem for autonomous driving. The research area is in the domain of computer vision and deep learning. Multi-Object Tracking (MOT) aims at tracking multiple targets simultaneously in video data. The main application scenarios of MOT are security monitoring and autonomous driving. In these scenarios, we often need to track many targets at the same time, which is not possible with object detection or single-object tracking algorithms alone, owing to their lack of stability and usability; therefore we need to explore the area of multiple object tracking. The proposed method breaks MOT into different stages and utilises the motion and appearance information of targets to track them in the video data. We used three different object detectors to detect the pedestrians in frames, a person re-identification model as the appearance feature extractor, and a Kalman filter as the motion predictor. Our proposed model achieves 47.6% MOT accuracy and a 53.2% IDF1 score, while the results obtained by the model without the person re-identification module are only 44.8% and 45.8%, respectively. Our experimental results indicate that a robust multiple-object tracking algorithm can be achieved by splitting the task into stages and improved by representative DNN-based appearance features.
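The association step in such a pipeline can be sketched minimally: the cost of matching a track to a detection combines a motion term (distance to the Kalman-predicted position) and an appearance term (re-id embedding distance), and pairs are matched greedily by lowest cost. The weighting, the greedy matcher and all numbers below are illustrative assumptions; real trackers typically use the Hungarian algorithm.

```python
import math

def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def association_cost(track, detection, w_motion=0.5):
    """Weighted sum of motion distance and appearance-embedding distance."""
    motion = dist(track["predicted_pos"], detection["pos"])
    appearance = dist(track["embedding"], detection["embedding"])
    return w_motion * motion + (1 - w_motion) * appearance

def greedy_match(tracks, detections):
    """Greedily pair tracks with detections by ascending combined cost."""
    pairs = sorted((association_cost(t, d), ti, di)
                   for ti, t in enumerate(tracks)
                   for di, d in enumerate(detections))
    used_t, used_d, matches = set(), set(), []
    for _, ti, di in pairs:
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = [{"predicted_pos": [0.0, 0.0], "embedding": [1.0, 0.0]},
          {"predicted_pos": [5.0, 5.0], "embedding": [0.0, 1.0]}]
detections = [{"pos": [5.1, 5.0], "embedding": [0.0, 1.0]},
              {"pos": [0.2, 0.0], "embedding": [1.0, 0.0]}]
matches = greedy_match(tracks, detections)
```

The appearance term is what lets the tracker recover an identity after an occlusion, when the motion prediction alone would be ambiguous.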
|
9 |
Closed and Open World Multi-shot Person Re-identification / Ré-identification de personnes à partir de multiples images dans le cadre de bases d'identités fermées et ouvertes. Chan-Lang, Solène. 06 December 2017 (has links)
In this thesis we tackle the open-world person re-identification task, in which the people we want to re-identify (probes) might not appear in the database of known identities (gallery). For a given probe person, the goal is to find out whether they are present in the gallery or not and, if so, who they are. Our first contribution, COPReV, is based on a verification formulation of the problem: a linear transformation of the features is learnt so that the distances between features of the same person are below a threshold and those of distinct people are above that same threshold, making it easy to determine whether two sets of images represent the same person or not. Our other contributions are based on collaborative sparse representations. A usual way to use collaborative sparse representations for re-identification is to approximate the feature of a probe image by a sparse linear combination of gallery elements, where all the known identities collaborate but only the most similar elements are selected; gallery identities are then ranked according to how much they contributed to the approximation. We propose to enhance the collaborative aspect so that collaborative sparse representations can be used not only as a ranking tool but also as a detection tool which rejects wrong matches. A bidirectional variant gives even more robust results by taking into account the fact that a good match is one with a reciprocal relation, in which the probe and the gallery identity each consider the other a good match. COPReV shows average performance, but the bidirectional collaboration-enhanced sparse representation method outperforms state-of-the-art methods in open-world scenarios.
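The open-world behaviour described above can be sketched in a hypothetical form: gallery identities are ranked by distance to the probe, and the probe is rejected as unknown when even the best match is farther than a verification threshold. The features and the threshold are illustrative assumptions, not the thesis's learned models.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def open_world_match(probe, gallery, threshold=1.0):
    """Return (ranked identities, accepted match or None)."""
    ranked = sorted(gallery, key=lambda pid: euclidean(probe, gallery[pid]))
    best = ranked[0]
    if euclidean(probe, gallery[best]) <= threshold:
        return ranked, best
    return ranked, None          # probe is not in the gallery

gallery = {"id_a": [0.0, 0.0], "id_b": [3.0, 4.0]}
ranked, match = open_world_match([0.2, 0.0], gallery)     # close to id_a
_, rejected = open_world_match([10.0, 10.0], gallery)     # nobody is close
```

The thesis's sparse-representation methods replace the plain distance with a collaborative reconstruction score, but the ranking-plus-rejection structure is the same.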
|
10 |
Person Re-Identification in the wild: Evaluation and application for soccer games using Deep Learning. Karapoulios, Vasileios. January 2021 (has links)
Person Re-Identification (ReID) is the process of associating images of the same person taken from different angles, cameras and at different times. The task is very challenging, as a slight change in a person's appearance can cause trouble in identifying them. In this thesis, the re-identification task is applied in the context of soccer games. In soccer games, the players of the same team wear the same outfit and colors, so the task of re-identification is very hard. To address this problem, a state-of-the-art deep-neural-network-based model named AlignedReID and a variation of it called the Vanilla model are explored and compared to a baseline approach based on Euclidean distance in the image space. The AlignedReID model uses two feature-extractor branches, one global and one local. The Vanilla approach is a variation of AlignedReID which uses only the global feature-extractor branch. The models are trained using two different loss functions, the Batch Hard triplet loss and its soft-margin variation. For each loss calculation a triplet of images is used: an anchor, a positive example (coming from the same person) and a negative example (coming from a different person). By comparing the evaluation metrics, that is rank-1, rank-5, mean Average Precision (mAP) and the Area Under Curve (AUC), and by statistically comparing the mAPs, which is assumed to be the most important metric, the AlignedReID model using the Batch Hard loss function outperforms the rest of the models with a mAP of 81% and rank-1 and rank-5 above 98%. Also, a qualitative evaluation of the best model is presented using Grad-CAM, in order to see how the model decides which images are similar by investigating which parts of the images it focuses on to produce their embedding representations. It is observed that the model focuses on some discriminative features, such as face, legs and hands, beyond clothing color and outfit.
The empirical results suggest that AlignedReID is usable in real-world applications; however, further research into how well it generalises to different cameras, leagues and other factors that may affect appearance would be interesting.
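The Batch Hard triplet loss used above can be sketched, for precomputed pairwise distances, as picking the hardest positive (farthest same-identity image) and hardest negative (closest different-identity image) for each anchor inside the batch. The margin and the toy distance matrix are illustrative assumptions; real implementations operate on embedding tensors and backpropagate through the distance computation.

```python
def batch_hard_triplet(dists, labels, margin=0.3):
    """Mean over anchors of max(0, hardest_positive - hardest_negative + margin)."""
    n = len(labels)
    total = 0.0
    for a in range(n):
        pos = [dists[a][j] for j in range(n) if labels[j] == labels[a] and j != a]
        neg = [dists[a][j] for j in range(n) if labels[j] != labels[a]]
        total += max(0.0, max(pos) - min(neg) + margin)
    return total / n

# Toy symmetric distance matrix for a batch of 4 images, two identities:
dists = [[0.0, 0.8, 0.5, 1.2],
         [0.8, 0.0, 1.0, 1.1],
         [0.5, 1.0, 0.0, 0.4],
         [1.2, 1.1, 0.4, 0.0]]
labels = ["A", "A", "B", "B"]
loss = batch_hard_triplet(dists, labels)
```

The soft-margin variation mentioned above replaces the hinge `max(0, x + margin)` with the smooth `log(1 + exp(x))`.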
|