Global ETD Search

11	Multi-object detection and tracking in video sequences / Détection et suivi multi-objets dans des séquences vidéo Mhalla, Ala 04 April 2018 (has links) Le travail développé dans cette thèse porte sur l'analyse de séquences vidéo. Cette dernière est basée sur 3 taches principales : la détection, la catégorisation et le suivi des objets. Le développement de solutions fiables pour l'analyse de séquences vidéo ouvre de nouveaux horizons pour plusieurs applications telles que les systèmes de transport intelligents, la vidéosurveillance et la robotique. Dans cette thèse, nous avons mis en avant plusieurs contributions pour traiter les problèmes de détection et de suivi d'objets multiples sur des séquences vidéo. Les techniques proposées sont basées sur l’apprentissage profonds et des approches de transfert d'apprentissage. Dans une première contribution, nous abordons le problème de la détection multi-objets en proposant une nouvelle technique de transfert d’apprentissage basé sur le formalisme et la théorie du filtre SMC (Sequential Monte Carlo) afin de spécialiser automatiquement un détecteur de réseau de neurones convolutionnel profond (DCNN) vers une scène cible. Dans une deuxième contribution, nous proposons une nouvelle approche de suivi multi-objets original basé sur des stratégies spatio-temporelles (entrelacement / entrelacement inverse) et un détecteur profond entrelacé, qui améliore les performances des algorithmes de suivi par détection et permet de suivre des objets dans des environnements complexes (occlusion, intersection, fort mouvement). Dans une troisième contribution, nous fournissons un système de surveillance du trafic, qui intègre une extension du technique SMC afin d’améliorer la précision de la détection de jour et de nuit et de spécialiser tout détecteur DCNN pour les caméras fixes et mobiles. Tout au long de ce rapport, nous fournissons des résultats quantitatifs et qualitatifs. Sur plusieurs aspects liés à l’analyse de séquences vidéo, ces travaux surpassent les cadres de détection et de suivi de pointe. En outre, nous avons implémenté avec succès nos infrastructures dans une plate-forme matérielle intégrée pour la surveillance et la sécurité du trafic routier. / The work developed in this PhD thesis is focused on video sequence analysis. Thelatter consists of object detection, categorization and tracking. The development ofreliable solutions for the analysis of video sequences opens new horizons for severalapplications such as intelligent transport systems, video surveillance and robotics.In this thesis, we put forward several contributions to deal with the problems ofdetecting and tracking multi-objects on video sequences. The proposed frameworksare based on deep learning networks and transfer learning approaches.In a first contribution, we tackle the problem of multi-object detection by puttingforward a new transfer learning framework based on the formalism and the theoryof a Sequential Monte Carlo (SMC) filter to automatically specialize a Deep ConvolutionalNeural Network (DCNN) detector towards a target scene. The suggestedspecialization framework is used in order to transfer the knowledge from the sourceand the target domain to the target scene and to estimate the unknown target distributionas a specialized dataset composed of samples from the target domain. Thesesamples are selected according to the importance of their weights which reflectsthe likelihood that they belong to the target distribution. The obtained specializeddataset allows training a specialized DCNN detector to a target scene withouthuman intervention.In a second contribution, we propose an original multi-object tracking frameworkbased on spatio-temporal strategies (interlacing/inverse interlacing) and aninterlaced deep detector, which improves the performances of tracking-by-detectionalgorithms and helps to track objects in complex videos (occlusion, intersection,strong motion).In a third contribution, we provide an embedded system for traffic surveillance,which integrates an extension of the SMC framework so as to improve the detectionaccuracy in both day and night conditions and to specialize any DCNN detector forboth mobile and stationary cameras.Throughout this report, we provide both quantitative and qualitative results.On several aspects related to video sequence analysis, this work outperformsthe state-of-the-art detection and tracking frameworks. In addition, we havesuccessfully implemented our frameworks in an embedded hardware platform forroad traffic safety and monitoring. Intelligence artificielle Vision par ordinateur Transfert d'apprentissage Apprentissage profond Détection multi-objets Spécialisation Suivi par détection Suivi multi-objets Système embarqué Artificial intelligence Computer vision Transfer learning Deep learning Multiobject detection Specialization Tracking-by-detection Multi-object tracking Embedded system
12	Analyzing different approaches to Visual SLAM in dynamic environments : A comparative study with focus on strengths and weaknesses / Analys av olika metoder för Visual SLAM i dynamisk miljö : En jämförande studie med fokus på styrkor och svagheter Ólafsdóttir, Kristín Sól January 2023 (has links) Simultaneous Localization and Mapping (SLAM) is the crucial ability for many autonomous systems to operate in unknown environments. In recent years SLAM development has focused on achieving robustness regarding the challenges the field still faces e.g. dynamic environments. During this thesis work different existing approaches to tackle dynamics with Visual SLAM systems were analyzed by surveying the recent literature within the field. The goal was to define the advantages and drawbacks of the approaches to provide further insight into the field of dynamic SLAM. Furthermore, two methods of different approaches were chosen for experiments and their implementation was documented. Key conclusions from the literature survey and experiments are the following. The exclusion of dynamic objects with regard to camera pose estimation presents promising results. Tracking of dynamic objects provides valuable information when combining SLAM with other tasks e.g. path planning. Moreover, dynamic reconstruction with SLAM offers better scene understanding and analysis of objects’ behavior within an environment. Many solutions rely on pre-processing and heavy hardware requirements due to the nature of the object detection methods. Methods of motion confirmation of objects lack consideration of camera movement, resulting in static objects being excluded from feature extraction. Considerations for future work within the field include accounting for camera movement for motion confirmation and producing available benchmarks that offer evaluation of the SLAM result as well as the dynamic object detection i.e. ground truth for both camera and objects within the scene. / Simultaneous Localization and Mapping (SLAM) är för många autonoma system avgörande för deras förmåga att kunna verka i tidigare outforskade miljöer. Under de senaste åren har SLAM-utvecklingen fokuserat på att uppnå robusthet när det gäller de utmaningar som fältet fortfarande står inför, t.ex. dynamiska miljöer. I detta examensarbete analyserades befintliga metoder för att hantera dynamik med visuella SLAM-system genom att kartlägga den senaste litteraturen inom området. Målet var att definiera för- och nackdelar hos de olika tillvägagångssätten för att bidra med insikter till området dynamisk SLAM. Dessutom valdes två metoder från olika tillvägagångssätt ut för experiment och deras implementering dokumenterades. De viktigaste slutsatserna från litteraturstudien och experimenten är följande. Uteslutningen av dynamiska objekt vid uppskattning av kamerans position ger lovande resultat. Spårning av dynamiska objekt ger värdefull information när SLAM kombineras med andra uppgifter, t.ex. path planning. Dessutom ger dynamisk rekonstruktion med SLAM bättre förståelse om omgivningen och analys av objekts beteende i den kringliggande miljön. Många lösningar är beroende av förbehandling samt ställer höga hårdvarumässiga krav till följd av objektdetekteringsmetodernas natur. Metoder för rörelsebekräftelse av objekt tar inte hänsyn till kamerarörelser, vilket leder till att statiska objekt utesluts från funktionsextraktion. Uppmaningar för framtida studier inom området inkluderar att ta hänsyn till kamerarörelser under rörelsebekräftelse samt att ta ändamålsenliga riktmärken för att möjliggöra tydligare utvärdering av SLAM-resultat såväl som för dynamisk objektdetektion, dvs. referensvärden för både kamerans position såväl som för objekt i scenen. Visual SLAM RGB-D Vision Dynamic Objects Object Detection Multi-Object Tracking Image Segmentation Optical Flow Visual SLAM RGB-D Syn Dynamiska objekt Objektdetektering Multi-Objekt Spårning Bildsegmentation Optiskt Flöde Robotics Robotteknik och automation Computer and Information Sciences Data- och informationsvetenskap
13	Tracking with Joint-Embedding Predictive Architectures : Learning to track through representation learning / Spårning genom Prediktiva Arkitekturer med Gemensam Inbäddning : Att lära sig att spåra genom representations inlärning Maus, Rickard January 2024 (has links) Multi-object tracking is a classic engineering problem wherein a system must keep track of the identities of a set of a priori unknown objects through a sequence, for example video. Perfect execution of this task would mean no spurious or missed detections or identities, neither swapped identities. To measure performance of tracking systems, the Higher Order Tracking Accuracy metric is often used, which takes into account both detection and association accuracy. Prior work in monocular vision-based multi-object tracking has integrated deep learning to various degrees, with deep learning based detectors and visual feature extractors being commonplace alongside motion models of varying complexities. These methods have historically combined the usage of position and appearance in their association stage using hand-crafted heuristics, featuring increasingly complex algorithms to achieve higher performance tracking. With an interest in simplifying tracking algorithms, we turn to the field of representation learning. Presenting a novel method using a Joint-Embedding Predictive Architecture, trained through a contrastive objective, we learn object feature embeddings initialized by detections from a pre-trained detector. The results are features that fuse both positional and visual features. Comparing the performance of our method on the complex DanceTrack and relatively simpler MOT17 datasets to that of the most performant heuristic-based alternative, Deep OC-SORT, we see a significant improvement of 66.1 HOTA compared to the 61.3 HOTA of Deep OC-SORT on DanceTrack. On MOT17, which features less complex motion and less training data, heuristics-based methods outperform the proposed and prior learned tracking methods. While the method lags behind the state of the art in complex scenes, which follows the tracking-by-attention paradigm, it presents a novel approach and brings with it a new avenue of possible research. / Spårning av multipla objekt är ett typiskt ingenjörsproblem där ett system måste hålla reda på identiteterna hos en uppsättning på förhand okända objekt genom en sekvens, till exempel video. Att perfekt utföra denna uppgift skulle innebära inga felaktiga eller missade detektioner eller identiteter, inte heller utbytta identiteter. För att mäta prestanda hos spårningssystem används ofta metriken HOTA, som tar hänsyn till både detektions- och associationsnoggrannhet. Tidigare arbete inom monokulär vision-baserad flerobjektsspårning har integrerat djupinlärning i olika grad, med detektorer baserade på djupinlärning och visuella funktionsutdragare som är vanliga tillsammans med rörelsemodeller av varierande komplexitet. Dessa metoder har historiskt kombinerat användningen av position och utseende i deras associationsfas med hjälp av handgjorda heuristiker, med alltmer komplexa algoritmer för att uppnå högre prestanda i spårningen. Med ett intresse för att förenkla spårningsalgoritmer, vänder vi oss till fältet för representationsinlärning. Vi presenterar en ny metod som använder en prediktiv arkitektur med gemensam inbäddning, tränad genom ett kontrastivt mål, där vi lär oss objekt representationer initierade av detektioner från en förtränad detektor. Resultatet är en funktion som sammansmälter både position och visuel information. När vi jämför vår metod på det komplexa DanceTrack och det relativt enklare MOT17-datasetet med det mest presterande heuristikbaserade alternativet, Deep OC-SORT, ser vi en betydande förbättring på 66,1 HOTA jämfört med 61,3 HOTA för Deep OC-SORT på DanceTrack. På MOT17, som har mindre komplex rörelse och mindre träningsdata, presterar heuristikbaserade metoder bättre än den föreslagna och tidigare lärande spårningsmetoderna. Även om metoden ligger efter den senaste utvecklingen i komplexa scener, som följer paradigm för spårning-genom-uppmärksamhet, presenterar den ett nytt tillvägagångssätt och för med sig möjligheter för ny forskning. Contrastive learning Joint-Embedding predictive architectures Multi-object tracking Representation learning Kontrastiv inlärning Spårning av flera objekt Representationsinlärning Computer Sciences Datavetenskap (datalogi) Computer Engineering Datorteknik
14	Evaluation of Target Tracking Using Multiple Sensors and Non-Causal Algorithms Vestin, Albin, Strandberg, Gustav January 2019 (has links) Today, the main research field for the automotive industry is to find solutions for active safety. In order to perceive the surrounding environment, tracking nearby traffic objects plays an important role. Validation of the tracking performance is often done in staged traffic scenarios, where additional sensors, mounted on the vehicles, are used to obtain their true positions and velocities. The difficulty of evaluating the tracking performance complicates its development. An alternative approach studied in this thesis, is to record sequences and use non-causal algorithms, such as smoothing, instead of filtering to estimate the true target states. With this method, validation data for online, causal, target tracking algorithms can be obtained for all traffic scenarios without the need of extra sensors. We investigate how non-causal algorithms affects the target tracking performance using multiple sensors and dynamic models of different complexity. This is done to evaluate real-time methods against estimates obtained from non-causal filtering. Two different measurement units, a monocular camera and a LIDAR sensor, and two dynamic models are evaluated and compared using both causal and non-causal methods. The system is tested in two single object scenarios where ground truth is available and in three multi object scenarios without ground truth. Results from the two single object scenarios shows that tracking using only a monocular camera performs poorly since it is unable to measure the distance to objects. Here, a complementary LIDAR sensor improves the tracking performance significantly. The dynamic models are shown to have a small impact on the tracking performance, while the non-causal application gives a distinct improvement when tracking objects at large distances. Since the sequence can be reversed, the non-causal estimates are propagated from more certain states when the target is closer to the ego vehicle. For multiple object tracking, we find that correct associations between measurements and tracks are crucial for improving the tracking performance with non-causal algorithms. evaluation target tracking multiple sensors non-causal smoother smoothing tracking vehicle tracking camera lidar estimate estimation prediction vehicle dynamics sensor fusion real-time tracking extended kalman filter filter validation validation position estimation velocity estimation dynamic model model complexity multi object tracking multiple object tracking single object tracking data association tracking fundamentals iterated kalman filter track management gnn global nearest neighbour mahalanobis mahalanobis distance performance evaluation differential gps dgps roi ego several sensors sensors rmse root mean square error invertible motion anti-causal motion anti-causal tracking constant velocity gnn imu tfs two filter smoother ekf rts radar inertial measurement unit nonlinear nonlinear systems mono camera monocular camera noise model tracking performance fixed interval smoothing m/n logic centralized fusion non-causal object tracker car tracking car dynamics automotive active safety object tracking automotive industry thesis master reverse dynamics reverse tracking reverse sequence sequence tracking data propagation ground truth estimating ground truth additional sensors mounted sensors true estimates environment comparison algorithm independent targets overlapping measurements occluded track switch improve lower uncertainty more certain state process noise covariance sampling image sprt adas cnn cv pdf track target ego tracker tentative track observatiom online tracking offline tracking online offline recorded sequences robust self driving self-driving car traffic trajectory true state scenario scenarios future accurate output advanced driver assistance systems non-linear complex noise pedestrian truck bus maneuvering vehicles processed measurement frame state correction probability density function tuning likelihood transition measurement motion model recursion gaussian approximation distribution linear jacobian multiplicative noise ratio ad hoc ad hoc state space approach backward auction euclidean distance statistical threshold gating association margin normalize covariance matrix fusion confirmed rejected tentative history absolute error modular ego motion parameters variables logg hardware specification fused causal factorization independent uncorrelated transform moving rotation translation oncoming overtaking Control Engineering Reglerteknik

Page generated in 0.0619 seconds