  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
181

Сравнительный анализ МЛ-систем извлечения ключевых точек для видеозаписей жестового языка : магистерская диссертация / Comparative analysis of ML-based keypoint extraction systems for sign language videos

Саенко, Л. Г., Saenko, L. G. January 2024 (has links)
The object of the research is ML systems for extracting keypoints from video recordings. The aim of the work is to analyze existing ML systems for extracting keypoints of sign language from video recordings and to find the best model. The research methods are based on data analysis, the theory of keypoint extraction from images, and experiments in which the obtained values are measured and compared in order to evaluate the models. The scientific novelty of the study lies in solving the practical problem of evaluating ML-based keypoint extraction systems for the sign language task using modern technologies. The result of the work is a comparative analysis of ML systems for keypoint extraction from sign language video recordings, which identified the best models in terms of metrics and efficiency; in addition, a gloss-gluing algorithm was developed that combines individual glosses into a single gesture.
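Comparisons of keypoint models are typically scored with a metric such as PCK (Percentage of Correct Keypoints). The abstract does not name the metrics actually used, so the following is an illustrative sketch with made-up coordinates:

```python
import math

def pck(predicted, ground_truth, threshold):
    """Percentage of Correct Keypoints: a predicted keypoint counts as
    correct if it lies within `threshold` pixels of the ground truth."""
    assert len(predicted) == len(ground_truth)
    correct = sum(
        1 for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(predicted)

# Hypothetical outputs of two keypoint models on the same frame.
truth   = [(100, 100), (150, 120), (200, 90)]
model_a = [(102, 101), (149, 118), (230, 95)]  # one large miss
model_b = [(104, 103), (151, 121), (201, 91)]  # all within 10 px

print(pck(model_a, truth, threshold=10))  # 2 of 3 correct
print(pck(model_b, truth, threshold=10))  # 1.0
```

Averaging this score over many frames gives one simple per-model figure that the gloss-level comparison can build on.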
182

Stereo Camera Calibration Accuracy in Real-time Car Angles Estimation for Vision Driver Assistance and Autonomous Driving

Algers, Björn January 2018 (has links)
The automotive safety company Veoneer produces high-end driver visual assistance systems, but knowledge about the absolute accuracy of their dynamic calibration algorithms, which estimate the vehicle's orientation, is limited. In this thesis, a novel measurement system is proposed for gathering reference data on a vehicle's orientation while it is in motion, more specifically the pitch and roll angles of the vehicle. The focus has been on estimating how the uncertainty of the measurement system is affected by errors introduced during its construction, and on evaluating its potential as a viable tool for gathering reference data for algorithm performance evaluation. The system consisted of three laser distance sensors mounted on the body of the vehicle, and a range of data acquisition sequences with different perturbations was performed by driving along a stretch of road in Linköping with weights loaded in the vehicle. The reference data were compared to camera system data, where the bias of the calculated angles was estimated along with the dynamic behaviour of the camera system algorithms. The experimental results showed that the accuracy of the system exceeded 0.1 degrees for both pitch and roll, but no conclusions about the bias of the algorithms could be drawn, as there were systematic errors present in the measurements.
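The three-sensor setup lends itself to a simple geometric reconstruction: the three measured ground points span a plane whose normal encodes the vehicle's pitch and roll. A minimal sketch, where the coordinate conventions, sign choices, and sensor positions are assumptions rather than details taken from the thesis:

```python
import math

def vehicle_pitch_roll(p1, p2, p3):
    """Estimate pitch and roll (degrees) from three ground points measured
    in the vehicle frame (x forward, y left, z up). The points span the
    ground plane; its normal gives the vehicle's tilt relative to the road."""
    # Two in-plane vectors and their cross product (the plane normal).
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz < 0:  # orient the normal upward
        nx, ny, nz = -nx, -ny, -nz
    pitch = math.degrees(math.atan2(nx, nz))  # tilt about the lateral axis
    roll = math.degrees(math.atan2(ny, nz))   # tilt about the longitudinal axis
    return pitch, roll

# Sensors 2 m apart; the front sensor reads 1 cm closer to the ground,
# which corresponds to about 0.29 degrees of pitch (sign is convention).
front = (2.0, 0.0, -0.49)
rear_left = (0.0, 0.5, -0.50)
rear_right = (0.0, -0.5, -0.50)
print(vehicle_pitch_roll(front, rear_left, rear_right))
```

The toy numbers show why sub-0.1-degree accuracy is demanding: a 0.29-degree pitch corresponds to only a centimetre of height difference over a two-metre baseline.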
183

Presence through actions : theories, concepts, and implementations

Khan, Muhammad Sikandar Lal January 2017 (has links)
During face-to-face meetings, humans use multimodal information, including verbal information, visual information, body language, facial expressions, and other non-verbal gestures. In contrast, during computer-mediated-communication (CMC), humans rely either on mono-modal information such as text-only, voice-only, or video-only or on bi-modal information by using audiovisual modalities such as video teleconferencing. Psychologically, the difference between the two lies in the level of the subjective experience of presence, where people perceive a reduced feeling of presence in the case of CMC. Despite the current advancements in CMC, it is still far from face-to-face communication, especially in terms of the experience of presence. This thesis aims to introduce new concepts, theories, and technologies for presence design where the core is actions for creating presence. Thus, the contribution of the thesis can be divided into a technical contribution and a knowledge contribution. Technically, this thesis details novel technologies for improving presence experience during mediated communication (video teleconferencing). The proposed technologies include action robots (including a telepresence mechatronic robot (TEBoT) and a face robot), embodied control techniques (head orientation modeling and virtual reality headset based collaboration), and face reconstruction/retrieval algorithms. The introduced technologies enable action possibilities and embodied interactions that improve the presence experience between the distantly located participants. The novel setups were put into real experimental scenarios, and the well-known social, spatial, and gaze related problems were analyzed. The developed technologies and the results of the experiments led to the knowledge contribution of this thesis. In terms of knowledge contribution, this thesis presents a more general theoretical conceptual framework for mediated communication technologies. 
This conceptual framework can guide telepresence researchers toward the development of appropriate technologies for mediated communication applications. Furthermore, this thesis presents a novel strong concept, presence through actions, that brings in philosophical understandings for developing presence-related technologies. The strong concept of presence through actions is intermediate-level knowledge that proposes a new way of creating and developing future 'presence artifacts'. Presence through actions is an action-oriented phenomenological approach to presence that differs from traditional immersive presence approaches, which are based (implicitly) on rationalist, internalist views.
184

Detekce a sledování polohy hlavy v obraze / Head Pose Estimation and Tracking

Pospíšil, Aleš January 2011 (has links)
This thesis focuses on head pose detection and tracking in images as one way to improve the possibilities of human-computer interaction. Its main contribution is the use of innovative hardware and software technologies such as Microsoft Kinect, the Point Cloud Library, and the CImg Library. It opens with a summary of previous work on the topic, followed by a description of the database created for the purposes of the thesis. The developed system for head pose detection and tracking is based on the acquisition of 3D image data and the Iterative Closest Point registration algorithm. The thesis concludes with an evaluation of the resulting system and proposals for its future improvement.
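The core of each Iterative Closest Point iteration, once correspondences are fixed, is a closed-form rigid alignment. A minimal 2D illustration of that alignment step (the thesis works with 3D point clouds; this sketch only shows the least-squares core):

```python
import math

def align_2d(src, dst):
    """One ICP-style alignment step in 2D: given corresponding point pairs,
    find the rotation angle and translation that best map src onto dst
    (closed-form least squares, the inner step of each ICP iteration)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    # Optimal rotation from the centered correspondences.
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        num += ax * by - ay * bx
        den += ax * bx + ay * by
    theta = math.atan2(num, den)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = cdx - (csx * math.cos(theta) - csy * math.sin(theta))
    ty = cdy - (csx * math.sin(theta) + csy * math.cos(theta))
    return theta, tx, ty

# Recover a known 30-degree rotation plus a translation of (0.5, -1.0).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
a = math.radians(30)
dst = [(x * math.cos(a) - y * math.sin(a) + 0.5,
        x * math.sin(a) + y * math.cos(a) - 1.0) for x, y in src]
theta, tx, ty = align_2d(src, dst)
print(round(math.degrees(theta), 3), round(tx, 3), round(ty, 3))  # 30.0 0.5 -1.0
```

Full ICP alternates this step with re-estimating correspondences (each source point paired with its nearest target point) until the transform converges.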
185

3D Rekonstrukce historických míst z obrázků na Flickru / 3D Reconstruction of Historic Landmarks from Flickr Pictures

Šimetka, Vojtěch January 2015 (has links)
This thesis describes the design and development of an application for reconstructing 3D models from 2D image data, referred to as bundle adjustment. The work analyzes the 3D reconstruction process and describes its individual steps in detail. The first step is the automated acquisition of an image set from the internet: a set of scripts for bulk downloading of images from Flickr and Google Images is presented, together with a summary of the requirements these images must meet for the best possible 3D reconstruction. The thesis then describes various detectors, extractors, and matching algorithms for image keypoints, with the goal of finding the most suitable combination for reconstructing buildings. Next, the process of reconstructing the 3D structure and its optimization are explained, along with how this is implemented in our program. Finally, the results obtained from the implemented program are tested on several different datasets and compared with the results of other similar programs introduced at the beginning of the thesis.
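A standard building block of such keypoint matching pipelines is nearest-neighbour matching with Lowe's ratio test, which discards ambiguous matches before the geometry stage. A pure-Python sketch on toy descriptors (the detector/extractor combinations evaluated in the thesis are not reproduced here):

```python
def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour matching with Lowe's ratio test: accept a match
    only if the best candidate is clearly better than the second best."""
    def dist2(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist2(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        # Squared distances, hence ratio squared.
        if dist2(d, desc_b[best]) < (ratio ** 2) * dist2(d, desc_b[second]):
            matches.append((i, best))
    return matches

# Toy 2-D "descriptors": the first two have clear counterparts, while the
# third is nearly equidistant to two candidates and gets filtered out.
a = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
b = [(0.1, 0.0), (9.9, 0.1), (5.0, 4.9), (5.1, 5.0)]
print(match_descriptors(a, b))  # [(0, 0), (1, 1)]
```

Filtering out ambiguous matches this way is what keeps repetitive building facades, a common failure case, from polluting the reconstruction with false correspondences.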
186

Extraction d’une image dans une vidéo en vue de la reconnaissance du visage / Extraction of an image in order to apply face recognition methods

Pyun, Nam Jun 09 November 2015 (has links)
The aim of this thesis is to create a methodology for extracting one or a few representative face images from a video sequence, with a view to subsequently applying a face recognition algorithm. A video is a particularly rich medium, and among all the objects present in it, human faces are surely the most salient, the ones that most attract a viewer's attention. Consider a video sequence in which each frame contains a face of the same person. The primary assumption of this thesis is that some samples of this face are better than others in terms of face recognition. A face is a non-rigid 3D object that is projected onto a plane to form an image; hence, the face's appearance changes with the relative positions of the camera and the face. Many works in the field of face recognition require faces that are as frontal as possible. To extract the most frontal face samples, on the one hand, we have to estimate the head pose; on the other hand, tracking the face throughout the sequence is also essential, since otherwise extracting representative samples of a face is meaningless. This thesis contains three main parts. First, once a face has been detected in a sequence, we extract the positions and sizes of the eyes, the nose, and the mouth. Our approach is based on local energy maps, mainly with a horizontal direction. In the second part, we estimate the head pose using the relative positions and sizes of the salient elements detected in the first part. A 3D face has three degrees of freedom: roll, yaw, and pitch. The roll is estimated by maximizing a global energy function computed on the whole face. Since the roll corresponds to the rotation parallel to the image plane, it can be corrected to obtain a face with zero roll, contrary to the other rotations. In the last part, we propose a face tracking algorithm based on tracking the region containing both eyes, which relies on maximizing a similarity measure between two consecutive frames (the correlation of binarized energy maps and the tracking of the connected components of the binary map). Together, these three methods make it possible to estimate the pose of the face in a given frame, link all the faces of the same person across a video sequence, and finally extract several samples of the face in order to submit them to a face recognition algorithm.
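The special status of roll, the one rotation that can be undone in the image plane, can be illustrated geometrically: for an upright face the line through the eyes is horizontal, so its angle gives the in-plane rotation directly. A simplified sketch (the thesis estimates roll by maximizing an energy function, not by this eye-line shortcut):

```python
import math

def face_roll_degrees(left_eye, right_eye):
    """Roll is the in-plane rotation of the face: for an upright face the
    eyes lie on a horizontal line, so the angle of the eye-to-eye line
    gives the roll directly. The image can then be rotated by the negative
    of this angle to obtain a zero-roll face."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# Eyes detected at these (x, y) pixel positions; y grows downward.
print(face_roll_degrees((100, 120), (160, 110)))  # about -9.46 degrees
print(face_roll_degrees((100, 100), (160, 100)))  # 0.0 (upright face)
```

Yaw and pitch, by contrast, move the face out of the image plane and cannot be corrected by any 2D rotation, which is why the thesis instead selects the most frontal samples for those axes.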
187

Learning Sampling-Based 6D Object Pose Estimation

Krull, Alexander 31 August 2018 (has links)
The task of 6D object pose estimation, i.e. estimating an object's position (three degrees of freedom) and orientation (three degrees of freedom) from images, is an essential building block of many modern applications, such as robotic grasping, autonomous driving, or augmented reality. Automatic pose estimation systems have to overcome a variety of visual ambiguities, including texture-less objects, clutter, and occlusion. Since many applications demand real-time performance, the efficient use of computational resources is an additional challenge. In this thesis, we take a probabilistic stance on overcoming these issues. We build on a highly successful automatic pose estimation framework based on predicting pixel-wise correspondences between the camera coordinate system and the local coordinate system of the object. These dense correspondences are used to generate a pool of hypotheses, which in turn serve as a starting point in a final search procedure. We will present three systems that each use probabilistic modeling and sampling to improve upon different aspects of the framework. The goal of the first system, System I, is to enable pose tracking, i.e. estimating the pose of an object in a sequence of frames instead of a single image. By including information from previous frames, tracking systems can resolve many visual ambiguities and reduce computation time. System I is a particle filter (PF) approach. The PF represents its belief about the pose in each frame by propagating a set of samples through time. Our system uses the process of hypothesis generation from the original framework as part of a proposal distribution that efficiently concentrates samples in the appropriate areas. In System II, we focus on the problem of evaluating the quality of pose hypotheses. This task plays an essential role in the final search procedure of the original framework. 
We use a convolutional neural network (CNN) to assess the quality of a hypothesis by comparing rendered and observed images. To train the CNN, we view it as part of an energy-based probability distribution in pose space. This probabilistic perspective allows us to train the system under the maximum likelihood paradigm, and we use a sampling approach to approximate the required gradients. The resulting system for pose estimation yields superior results, in particular for highly occluded objects. In System III, we take the idea of machine learning a step further. Instead of learning to predict a hypothesis quality measure to be used in a search procedure, we present a way of learning the search procedure itself. We train a reinforcement learning (RL) agent, termed PoseAgent, to steer the search process and make optimal use of a given computational budget. PoseAgent dynamically decides which hypothesis should be refined next, and which one should ultimately be output as the final estimate. Since the search procedure includes discrete non-differentiable choices, training the system via gradient descent is not easily possible. To solve this problem, we model the behavior of PoseAgent as a stochastic policy, which is ultimately governed by a CNN. This allows us to use a sampling-based stochastic policy gradient training procedure. We believe that some of the ideas developed in this thesis, such as the sampling-driven, probabilistically motivated training of a CNN for the comparison of images, or the search procedure implemented by PoseAgent, have the potential to be applied in fields beyond pose estimation as well.
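The stochastic policy gradient training mentioned above can be illustrated with the classic REINFORCE update for a softmax policy over discrete choices, such as which hypothesis to refine next. This is a generic sketch, not PoseAgent's actual CNN-based architecture:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, action, reward, lr=0.1):
    """One REINFORCE update for a softmax policy over discrete choices:
    the gradient of log pi(action) w.r.t. the logits is
    onehot(action) - pi, and it is scaled by the received reward."""
    pi = softmax(logits)
    return [l + lr * reward * ((1.0 if i == action else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, pi))]

# Rewarding the choice of hypothesis 2 shifts probability mass toward it.
logits = [0.0, 0.0, 0.0]
before = softmax(logits)[2]
after = softmax(policy_gradient_step(logits, action=2, reward=1.0))[2]
print(before < after)  # True
```

Because the update only needs sampled actions and their rewards, no gradient has to flow through the discrete, non-differentiable search decisions themselves, which is exactly why this family of methods fits the PoseAgent setting.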
188

Through the Blur with Deep Learning : A Comparative Study Assessing Robustness in Visual Odometry Techniques

Berglund, Alexander January 2023 (has links)
In this thesis, the robustness of deep learning techniques in the field of visual odometry is investigated, with a specific focus on the impact of motion blur. A comparative study is conducted, evaluating the performance of state-of-the-art deep convolutional neural network methods, namely DF-VO and DytanVO, against ORB-SLAM3, a well-established non-deep-learning technique for visual simultaneous localization and mapping. The objective is to quantitatively assess the performance of these models as a function of motion blur. The evaluation is carried out on a custom synthetic dataset, which simulates a camera navigating through a forest environment. The dataset includes trajectories with varying degrees of motion blur, caused by camera translation, and optionally, pitch and yaw rotational noise. The results demonstrate that the deep learning-based methods maintained robust performance despite the challenging conditions presented in the test data, while excessive blur led to tracking failures in the geometric model. This suggests that the ability of deep neural network architectures to automatically learn hierarchical feature representations and capture complex, abstract features may enhance the robustness of deep learning-based visual odometry techniques in challenging conditions, compared to their geometric counterparts.
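Quantitative assessment of visual odometry against a reference trajectory is commonly done with a metric such as the absolute trajectory error (ATE). The abstract does not name the exact metrics used, so the following is an illustrative sketch in 2D:

```python
import math

def absolute_trajectory_error(estimated, ground_truth):
    """Root-mean-square of per-frame position errors, a standard way to
    quantify visual odometry accuracy against a reference trajectory
    (alignment between the two coordinate frames is assumed done)."""
    assert len(estimated) == len(ground_truth)
    sq = sum((ex - gx) ** 2 + (ey - gy) ** 2
             for (ex, ey), (gx, gy) in zip(estimated, ground_truth))
    return math.sqrt(sq / len(estimated))

# A trajectory that drifts sideways as the camera moves forward.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
drifting = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2)]
print(round(absolute_trajectory_error(drifting, truth), 4))  # 0.1291
```

Computing this metric per blur level is one straightforward way to plot accuracy as a function of motion blur, as the study's objective describes.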
189

Modulating Depth Map Features to Estimate 3D Human Pose via Multi-Task Variational Autoencoders / Modulerande djupkartfunktioner för att uppskatta människans ställning i 3D med multi-task-variationsautoenkoder

Moerman, Kobe January 2023 (has links)
Human pose estimation (HPE) constitutes a fundamental problem within the domain of computer vision, finding applications in diverse fields like motion analysis and human-computer interaction. This paper introduces innovative methodologies aimed at enhancing the accuracy and robustness of 3D joint estimation. Through the integration of Variational Autoencoders (VAEs), pertinent information is extracted from depth maps, even in the presence of inevitable image-capturing inconsistencies. This concept is enhanced through the introduction of noise to the body or to specific regions surrounding key joints. The deliberate introduction of noise to these areas enables the VAE to acquire a robust representation that captures authentic pose-related patterns. Moreover, the introduction of a localised mask as a constraint in the loss function ensures the model predominantly relies on pose-related cues while disregarding potential confounding factors that may hinder the compact representation of accurate human pose information. Delving further into latent space modulation, a novel model architecture is devised, joining a VAE and a fully connected network into a multi-task joint training objective. In this framework, the VAE and the regressor harmoniously influence the latent representations for accurate joint detection and localisation. By combining the multi-task model with the loss function constraint, this study attains results that compete with state-of-the-art techniques. These findings underscore the significance of leveraging latent space modulation and customised loss functions to address challenging human poses. Additionally, these novel methodologies pave the way for future explorations and provide prospects for advancing HPE. Subsequent research endeavours may optimise these techniques, evaluate their performance across diverse datasets, and explore potential extensions to unravel further insights and advancements in the field.
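The VAE objective referred to above combines a reconstruction term with a KL divergence to the prior, which has a closed form for a diagonal-Gaussian encoder. A minimal sketch of that standard loss (the thesis's actual multi-task objective also includes the localised-mask constraint and the regressor's joint loss, omitted here):

```python
import math

def vae_loss(recon_error, mu, log_var, beta=1.0):
    """Reconstruction term plus the closed-form KL divergence between the
    encoder's diagonal Gaussian q(z|x) = N(mu, diag(exp(log_var))) and a
    standard-normal prior, the objective minimized when training a VAE."""
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, log_var))
    return recon_error + beta * kl

# A latent code that matches the prior exactly contributes zero KL;
# shifting one latent mean away from zero adds a penalty.
print(vae_loss(0.5, mu=[0.0, 0.0], log_var=[0.0, 0.0]))  # 0.5
print(vae_loss(0.5, mu=[1.0, 0.0], log_var=[0.0, 0.0]))  # 1.0
```

The `beta` weight trades reconstruction fidelity against how tightly the latent space is pulled toward the prior, which is one lever for shaping the latent representations that the multi-task regressor then consumes.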
190

Crime Detection From Pre-crime Video Analysis

Sedat Kilic (18363729) 03 June 2024 (has links)
<p dir="ltr">his research investigates the detection of pre-crime events, specifically targeting behaviors indicative of shoplifting, through the advanced analysis of CCTV video data. The study introduces an innovative approach that leverages augmented human pose and emotion information within individual frames, combined with the extraction of activity information across subsequent frames, to enhance the identification of potential shoplifting actions before they occur. Utilizing a diverse set of models including 3D Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs), and a specially developed transformer architecture, the research systematically explores the impact of integrating additional contextual information into video analysis.</p><p dir="ltr">By augmenting frame-level video data with detailed pose and emotion insights, and focusing on the temporal dynamics between frames, our methodology aims to capture the nuanced behavioral patterns that precede shoplifting events. The comprehensive experimental evaluation of our models across different configurations reveals a significant improvement in the accuracy of pre-crime detection. The findings underscore the crucial role of combining visual features with augmented data and the importance of analyzing activity patterns over time for a deeper understanding of pre-shoplifting behaviors.</p><p dir="ltr">The study’s contributions are multifaceted, including a detailed examination of pre-crime frames, strategic augmentation of video data with added contextual information, the creation of a novel transformer architecture customized for pre-crime analysis, and an extensive evaluation of various computational models to improve predictive accuracy.</p>
