201

Analyzing different approaches to Visual SLAM in dynamic environments : A comparative study with focus on strengths and weaknesses / Analys av olika metoder för Visual SLAM i dynamisk miljö : En jämförande studie med fokus på styrkor och svagheter

Ólafsdóttir, Kristín Sól January 2023 (has links)
Simultaneous Localization and Mapping (SLAM) is a crucial capability for many autonomous systems operating in unknown environments. In recent years, SLAM development has focused on achieving robustness to the challenges the field still faces, e.g. dynamic environments. In this thesis, different existing approaches to handling dynamics in Visual SLAM systems were analyzed by surveying the recent literature in the field. The goal was to identify the advantages and drawbacks of each approach and thereby provide further insight into the field of dynamic SLAM. Furthermore, two methods representing different approaches were chosen for experiments and their implementation was documented. The key conclusions from the literature survey and experiments are the following. Excluding dynamic objects from camera pose estimation yields promising results. Tracking dynamic objects provides valuable information when SLAM is combined with other tasks, e.g. path planning. Moreover, dynamic reconstruction with SLAM offers better scene understanding and analysis of objects' behavior within an environment. Many solutions rely on pre-processing and impose heavy hardware requirements due to the nature of the object detection methods. Methods for motion confirmation of objects lack consideration of camera movement, resulting in static objects being excluded from feature extraction. Directions for future work in the field include accounting for camera movement during motion confirmation and producing public benchmarks that allow evaluation of both the SLAM result and the dynamic object detection, i.e. ground truth for both the camera and the objects in the scene.
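As a rough illustration of the first conclusion above — excluding dynamic objects from camera pose estimation — the following Python sketch (not taken from the thesis) filters feature matches that fall inside detected dynamic-object boxes before estimating relative pose with OpenCV; the box format and RANSAC threshold are assumptions.

```python
# Minimal sketch (not from the thesis): drop feature matches that fall inside
# detected dynamic-object boxes before estimating relative camera pose.
# `boxes` is assumed to come from any object detector as (x1, y1, x2, y2) tuples.
import numpy as np
import cv2

def filter_dynamic(pts, boxes):
    """Keep only points that lie outside every dynamic-object box."""
    keep = np.ones(len(pts), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        inside = (pts[:, 0] >= x1) & (pts[:, 0] <= x2) & \
                 (pts[:, 1] >= y1) & (pts[:, 1] <= y2)
        keep &= ~inside
    return keep

def pose_from_static_matches(pts1, pts2, boxes, K):
    """Estimate relative camera pose from matched keypoints outside dynamic regions."""
    keep = filter_dynamic(pts1, boxes) & filter_dynamic(pts2, boxes)
    p1, p2 = pts1[keep], pts2[keep]
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    return R, t
```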
202

Privacy-preserving Building Occupancy Estimation via Low-Resolution Infrared Thermal Cameras

Zhu, Shuai January 2021 (has links)
Building occupancy estimation has become an important topic for sustainable buildings and has attracted more attention during the pandemic. Although computer vision has achieved breakthroughs in recent years, estimating building occupancy remains a considerable problem, because the machine learning algorithms behind it demand large datasets, which may contain users' private information, to train reliable models. As privacy issues pose a severe challenge in the field of machine learning, this work aims to develop a privacy-preserving, machine learning-based method for people counting using a low-resolution thermal camera with 32 × 24 pixels. The method is applicable to counting people in different scenarios: concretely, counting people in spaces smaller than the field of view (FoV) of the camera, as well as in large spaces exceeding the FoV of the camera. In the first scenario, counting people in small spaces, we count people directly within the FoV of the camera using Multiple Object Detection (MOD) techniques. Our MOD method achieves up to 56.8% mean average precision (mAP). In the second scenario, we use Multiple Object Tracking (MOT) techniques to track people entering and exiting the space. We record the number of people who entered and exited, and then calculate the number of people present based on the tracking results. The MOT method reaches 47.4% multiple object tracking accuracy (MOTA), 78.2% multiple object tracking precision (MOTP), and a 59.6% identification F-score (IDF1). Apart from the method, we create a novel thermal image dataset containing 1770 thermal images with proper annotation.
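To illustrate the second scenario — deriving an occupancy count from tracking results — here is a minimal sketch (not from the thesis); the virtual door line, the track data structure, and the downward-means-entering convention are assumptions.

```python
# Minimal sketch (not from the thesis): derive an occupancy count from tracks by
# checking when a track's centroid crosses a virtual doorway line y = door_y.
# Call once per frame, after appending each track's latest centroid to its history.
def update_occupancy(tracks, door_y, occupancy=0):
    """tracks: dict mapping track id -> list of (x, y) centroids over time."""
    for history in tracks.values():
        if len(history) < 2:
            continue
        (_, y_prev), (_, y_curr) = history[-2], history[-1]
        if y_prev < door_y <= y_curr:      # crossed the line moving into the room
            occupancy += 1
        elif y_prev >= door_y > y_curr:    # crossed the line moving out of the room
            occupancy = max(0, occupancy - 1)
    return occupancy
```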
203

Tracking Under Countermeasures Using Infrared Imagery

Modorato, Sara January 2022 (has links)
Object tracking can be done in numerous ways, where the goal is to track a target through all frames in a sequence, with the ground-truth bounding box used to initialize the tracking algorithm. Object tracking can be carried out on infrared imagery, which suits military applications because tracking can be executed even without illumination. Objects such as aircraft can deploy countermeasures to impede tracking. These countermeasures most often impact mainly one wavelength band; therefore, using two different wavelength bands for object tracking can counteract their effect. The dataset used in this thesis was created from simulations, and the countermeasures applied to it are flares and Directional Infrared Countermeasures (DIRCMs). Many existing object tracking algorithms are based on discriminative correlation filters (DCF). The thesis investigated the DCF-based trackers STRCF and ECO on the created dataset, analyzing both trackers using one and two wavelength bands. The following features were investigated for both trackers: grayscale, Histogram of Oriented Gradients (HOG), and pre-trained deep features. The results indicated that using two wavelength bands instead of one improved the performance of both the STRCF and ECO trackers on sequences with countermeasures. The use of HOG, deep features, or a combination of both improved the performance of the STRCF tracker using two wavelength bands. Likewise, the performance of the ECO tracker using two wavelength bands was improved by the use of deep features. However, the drawback of using two wavelength bands and introducing more features is a lower frame rate.
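As an illustration of combining two wavelength bands in a correlation-filter tracker, the sketch below (not from the thesis, and much simpler than STRCF or ECO) computes each band's correlation response in the Fourier domain and averages them; the equal weighting of the bands is an assumption.

```python
# Minimal sketch (not from the thesis): fuse correlation responses from two
# wavelength bands by averaging the response maps, in the spirit of DCF tracking.
# template_a/b and patch_a/b are same-sized grayscale crops from the two bands.
import numpy as np

def correlation_response(template, patch):
    """Circular cross-correlation of template and search patch via the FFT."""
    T = np.fft.fft2(template)
    P = np.fft.fft2(patch)
    return np.real(np.fft.ifft2(np.conj(T) * P))

def fused_peak(template_a, patch_a, template_b, patch_b):
    """Average the per-band responses and return the location of the strongest peak."""
    response = 0.5 * (correlation_response(template_a, patch_a) +
                      correlation_response(template_b, patch_b))
    return np.unravel_index(np.argmax(response), response.shape)
```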
204

Tracking with Joint-Embedding Predictive Architectures : Learning to track through representation learning / Spårning genom Prediktiva Arkitekturer med Gemensam Inbäddning : Att lära sig att spåra genom representations inlärning

Maus, Rickard January 2024 (has links)
Multi-object tracking is a classic engineering problem in which a system must keep track of the identities of a set of a priori unknown objects through a sequence, for example video. Perfect execution of this task would mean no spurious or missed detections or identities, and no swapped identities. To measure the performance of tracking systems, the Higher Order Tracking Accuracy (HOTA) metric is often used, which takes into account both detection and association accuracy. Prior work in monocular vision-based multi-object tracking has integrated deep learning to varying degrees, with deep learning-based detectors and visual feature extractors being commonplace alongside motion models of varying complexity. These methods have historically combined position and appearance in their association stage using hand-crafted heuristics, featuring increasingly complex algorithms to achieve higher tracking performance. With an interest in simplifying tracking algorithms, we turn to the field of representation learning. Presenting a novel method based on a Joint-Embedding Predictive Architecture, trained through a contrastive objective, we learn object feature embeddings initialized from the detections of a pre-trained detector. The result is a feature representation that fuses positional and visual information. Comparing the performance of our method on the complex DanceTrack dataset and the relatively simpler MOT17 dataset to that of the best-performing heuristic-based alternative, Deep OC-SORT, we see a significant improvement of 66.1 HOTA compared to the 61.3 HOTA of Deep OC-SORT on DanceTrack. On MOT17, which features less complex motion and less training data, heuristics-based methods outperform the proposed and prior learned tracking methods. While the method lags behind the state of the art in complex scenes, which follows the tracking-by-attention paradigm, it presents a novel approach and opens a new avenue of possible research.
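To make the association step concrete, the following sketch (not the thesis implementation) matches detections to tracks by cosine similarity between embeddings and solves the assignment with the Hungarian algorithm; the embedding source and the 0.3 similarity gate are assumptions.

```python
# Minimal sketch (not from the thesis): associate detections to existing tracks by
# cosine similarity between learned embeddings, solved as an assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embs, det_embs, min_sim=0.3):
    """Return (track_idx, det_idx) pairs whose cosine similarity clears the gate."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    sim = t @ d.T                                   # cosine similarity matrix
    rows, cols = linear_sum_assignment(-sim)        # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]
```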
205

Expressing emotions through vibration for perception and control / Expressing emotions through vibration

ur Réhman, Shafiq January 2010 (has links)
This thesis addresses a challenging problem: how to let the visually impaired "see" others' emotions. We, human beings, are heavily dependent on facial expressions to express ourselves. A smile shows that the person you are talking to is pleased, amused, relieved, etc. People use emotional information from facial expressions to switch between conversation topics and to determine the attitudes of individuals. Missing the emotional information carried by facial expressions and head gestures makes it extremely difficult for the visually impaired to interact with others in social events. To enhance the social interaction abilities of the visually impaired, this thesis works on the scientific topic of expressing human emotions through vibrotactile patterns. Delivering human emotions through touch is quite challenging since our touch channel is very limited. We first investigated how to render emotions through a vibrator. We developed a real-time "lipless" tracking system to extract dynamic emotions from the mouth and employed mobile phones as a platform for the visually impaired to perceive primary emotion types. Later on, we extended the system to render more general dynamic media signals: for example, rendering live football games through vibration on the mobile phone to improve the communication and entertainment experience of mobile users. To display more natural emotions (i.e. emotion type plus emotion intensity), we developed technology that enables the visually impaired to directly interpret human emotions. This was achieved by means of machine vision techniques and a vibrotactile display. The display comprises a matrix of vibration actuators mounted on the back of a chair, and the actuators are activated sequentially to provide dynamic emotional information. The research focus has been on finding a global, analytical, and semantic representation of facial expressions to replace the state-of-the-art facial action coding system (FACS) approach. We proposed using the manifold of facial expressions to characterize dynamic emotions. The basic emotional expressions with increasing intensity become curves on the manifold extending from its center. Blends of emotions lie between those curves and can be defined analytically by the positions of the main curves. The manifold is the "Braille code" of emotions. The developed methodology and technology have been extended to build assistive wheelchair systems that aid a specific group of disabled people, cerebral palsy or stroke patients (i.e. people lacking fine motor control skills), who cannot access and control a wheelchair by conventional means such as a joystick or chin stick. The solution is to extract the manifold of head or tongue gestures for controlling the wheelchair. The manifold is rendered by a 2D vibration array to provide the wheelchair user with action information derived from gestures as well as system status information, which is very important for enhancing the usability of such an assistive system. The current research work not only provides a foundation stone for vibrotactile rendering systems based on object localization but also a concrete step towards a new dimension of human-machine interaction. / Taktil Video
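As a loose illustration of how an emotion type and intensity could drive a 2D actuator matrix, the sketch below (not the thesis implementation) maps each emotion to a direction from the grid centre and lets the intensity decide how far along that direction actuators are fired in sequence; the grid size, emotion set, and angle table are assumptions.

```python
# Minimal sketch (not from the thesis): map an emotion label and an intensity to a
# sequence of actuators on a 2D vibration grid, radiating out from the centre cell.
import math

GRID = 8                      # assumed 8 x 8 actuator matrix, centre cell at (4, 4)
ANGLES = {"happy": 0.0, "surprised": 90.0, "sad": 180.0, "angry": 270.0}

def actuator_sequence(emotion, intensity):
    """Return (row, col) cells to fire in order; intensity in [0, 1] sets the reach."""
    angle = math.radians(ANGLES[emotion])
    centre = GRID // 2
    reach = max(1, round(intensity * (centre - 1)))
    seq = []
    for step in range(reach + 1):
        r = centre + round(step * math.sin(angle))
        c = centre + round(step * math.cos(angle))
        if 0 <= r < GRID and 0 <= c < GRID:
            seq.append((r, c))
    return seq

# Example: a full-intensity "happy" signal sweeps from the centre towards one edge.
print(actuator_sequence("happy", 1.0))   # [(4, 4), (4, 5), (4, 6), (4, 7)]
```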
206

Arcabouço para análise de eventos em vídeos. / Framework for analyzing events in videos.

SILVA, Adson Diego Dionisio da. 07 May 2018 (has links)
Automatic recognition of relevant events in videos, involving sets of actions or interactions between objects, can add value to surveillance systems, smart-city applications, the monitoring of people with physical or mental disabilities, and more. However, designing a framework that can be adapted to diverse situations without requiring an expert in the underlying technologies remains a challenge in the field. In this context, this work is based on the creation of a generic, rule-based framework for event detection in video. To create the rules, users form logical expressions using first-order logic (FOL) and relate the terms with Allen's interval algebra, thereby adding a temporal context to the rules. Being a framework, it is extensible and can receive additional modules that perform new detections and inferences. An experimental evaluation was performed using test videos available on YouTube involving a traffic scenario, with red-light-running events, and videos obtained from a live camera on the Camerite website, containing events of cars parking. The focus of the work was not to create object detectors (e.g. for cars or people) better than those in the state of the art, but to propose and develop a generic, reusable framework that integrates different computer vision techniques. The accuracy in detecting the events was within the range of 83.82% to 90.08% with 95% confidence. Maximum accuracy (100%) was obtained when the object detectors were replaced by manually assigned labels, which indicates the effectiveness of the inference engine developed for the framework.
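To illustrate the kind of temporal rule the framework supports, the sketch below (not from the dissertation) encodes two Allen interval relations and a red-light-running rule over event intervals; the event names and the example intervals are assumptions.

```python
# Minimal sketch (not from the dissertation): a temporal rule built from two of
# Allen's interval relations, applied to (start, end) event intervals.
def during(a, b):
    """Allen's 'during': interval a lies strictly inside interval b."""
    return b[0] < a[0] and a[1] < b[1]

def overlaps(a, b):
    """Allen's 'overlaps': a starts before b, they intersect, and a ends inside b."""
    return a[0] < b[0] < a[1] < b[1]

def red_light_violation(car_in_intersection, light_is_red):
    """Rule: the car occupies the intersection during, or overlapping, the red phase."""
    return during(car_in_intersection, light_is_red) or \
           overlaps(light_is_red, car_in_intersection)

# Example: red phase from t=10 to t=40, car crosses the intersection from t=15 to t=25.
print(red_light_violation((15, 25), (10, 40)))   # True
```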
207

Vizuální sledování objektu v reálném čase / Real-Time Object Tracking in Video

Šimon, Martin January 2015 (has links)
This thesis focuses on real-time visual object tracking, with emphasis on the problems caused by long-term tracking. These problems primarily include occlusion, both partial and full, and changes in the object's appearance during tracking. The work is also concerned with tracking objects of very limited size and with unsteady camera movement; these two problems are relatively common when tracking distant objects. Part of this work is a summary of related work and a proposal for a system with high qualitative stability and robustness to the problems mentioned. The proposed system was implemented, and the evaluation demonstrated that it is capable of partially solving these problems.
208

Učení detektorů pomocí sledování objektů / Learning Detectors by Tracking

Buchtela, Radim January 2013 (has links)
This thesis is devoted to learning detectors by tracking objects in video sequences. We discuss methods for object tracking, object detection, and online learning, and the possibilities of using them in sophisticated techniques that combine object tracking with the online learning of detectors.
209

Detekce a sledování objektů pomocí význačných bodů / Object Detection and Tracking Using Interest Points

Bílý, Vojtěch January 2012 (has links)
This paper deals with object detection and tracking using interest points. Existing approaches are described, and an improved method based on the Generalized Hough transform and iterative Hough-space search is proposed. The generality of the proposed detector is demonstrated on various types of objects. Object tracking is designed as frame-by-frame detection.
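A minimal sketch of the idea (not the paper's implementation): ORB interest points vote for the object centre through offsets learned from a model image, in the spirit of the Generalized Hough transform; the use of ORB and the vote-cell size are assumptions, and scale and rotation changes are ignored.

```python
# Minimal sketch (not from the paper): interest points cast Generalized-Hough-style
# votes for the object centre using offsets learned from a model image.
import numpy as np
import cv2

orb = cv2.ORB_create(500)
bf = cv2.BFMatcher(cv2.NORM_HAMMING)

def build_r_table(model_img):
    """Store, per model keypoint, its offset from the model image centre."""
    kps, desc = orb.detectAndCompute(model_img, None)
    centre = np.array(model_img.shape[:2][::-1], dtype=np.float32) / 2.0
    offsets = np.array([centre - np.array(kp.pt, dtype=np.float32) for kp in kps])
    return desc, offsets

def detect_centre(scene_img, model_desc, offsets, cell=8):
    """Each matched scene keypoint votes for the object centre in a coarse grid."""
    kps, desc = orb.detectAndCompute(scene_img, None)
    matches = bf.match(model_desc, desc)
    h, w = scene_img.shape[:2]
    acc = np.zeros((h // cell + 1, w // cell + 1), dtype=np.int32)
    for m in matches:
        x, y = np.array(kps[m.trainIdx].pt) + offsets[m.queryIdx]
        if 0 <= x < w and 0 <= y < h:
            acc[int(y) // cell, int(x) // cell] += 1
    iy, ix = np.unravel_index(np.argmax(acc), acc.shape)
    return (ix * cell + cell // 2, iy * cell + cell // 2)   # voted centre (x, y)
```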
210

Sledování více osob ve videu z jedné kamery / Multi-Person Tracking in Video from Mono-Camera

Vojvoda, Jakub January 2016 (has links)
Multi-person detection and tracking is a challenging problem with high application potential. Its difficulty is caused mainly by the complexity of the scene and the large variations in the articulation and appearance of people. The aim of this work is to design and implement a system capable of detecting and tracking people in video from a static mono-camera. For this purpose, an online tracking method based on the tracking-by-detection approach has been proposed. The method combines detection, tracking, and fusion of responses to achieve accurate results. The implementation was evaluated on an available dataset, and the results show that it is suitable for this task. A method for motion segmentation was proposed and implemented to improve the tracking results. Furthermore, the detector based on the histogram of oriented gradients was accelerated by taking advantage of the graphics processing unit (GPU).
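For context on the detection side of such a tracking-by-detection pipeline, the sketch below uses OpenCV's stock HOG pedestrian detector on a video stream; it is not the thesis implementation or its GPU-accelerated variant, and the video path is an illustrative assumption.

```python
# Minimal sketch (not from the thesis): person detection with OpenCV's built-in
# HOG + linear SVM pedestrian detector, the kind of detector a tracking-by-detection
# pipeline consumes frame by frame.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("camera.mp4")   # assumed input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```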
