Global ETD Search

91	A Composite Field-Based Learning Framework for Pose Estimation and Object Detection : Exploring Scale Variation Adaptations in Composite Field-Based Pose Estimation and Extending the Framework for Object Detection / En sammansatt fältbaserad inlärningsramverk för posuppskattning och objektdetektering : Utforskning av skalvariationsanpassningar i sammansatt fältbaserad posuppskattning och utvidgning av ramverket för objektdetektering
Guo, Jianting, January 2024
This thesis addresses multi-person 2D pose estimation and object detection within a unified bottom-up framework. Our starting point is OpenPifPaf, a recently proposed pose estimation framework built on composite fields. OpenPifPaf uses the Composite Intensity Field (CIF) to localize joints and the Composite Association Field (CAF) to connect them. To assess the model's robustness to scale variation, a Feature Pyramid Network (FPN) is incorporated into the baseline. Additionally, we present a variant of OpenPifPaf known as CifDet. CifDet uses the Composite Intensity Field to classify and detect object centers, and then regresses bounding boxes from these centers. Furthermore, we introduce CifCafDet, an extended version of CifDet tailored to object detection and designed to handle the challenges of detection tasks more effectively. The baseline OpenPifPaf model outperforms most existing bottom-up pose estimation methods and achieves results comparable with some state-of-the-art top-down methods on the COCO keypoint dataset. Its variant, CifDet, adapts OpenPifPaf's composite field-based architecture to object detection. Further modifications result in CifCafDet, which improves on CifDet on the MS COCO detection dataset, suggesting its viability as a multi-task framework.
Composite fields, pose estimation, object detection, computer vision, deep learning; Computer and Information Sciences
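The center-then-box idea described for CifDet in entry 91 can be illustrated with a small decoder. This is only a sketch under stated assumptions, not OpenPifPaf's actual decoder or API: the function `decode_boxes`, the per-cell `offsets` and `sizes` grids, the 3x3 local-maximum test, and the `stride` and `threshold` values are all hypothetical choices made for illustration.

```python
def decode_boxes(confidence, offsets, sizes, stride=8, threshold=0.5):
    """Turn a center-confidence grid into boxes (x, y, w, h, score).

    confidence[r][c] : score that an object center lies in cell (r, c)
    offsets[r][c]    : (dx, dy) sub-cell refinement of the center, in cells
    sizes[r][c]      : (w, h) regressed box size, in pixels
    All three grids and their layout are assumptions of this sketch.
    """
    rows, cols = len(confidence), len(confidence[0])
    boxes = []
    for r in range(rows):
        for c in range(cols):
            score = confidence[r][c]
            if score < threshold:
                continue
            # Keep a cell only if no 3x3 neighbour has a higher score.
            neighbours = [
                confidence[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if any(n > score for n in neighbours):
                continue
            dx, dy = offsets[r][c]
            w, h = sizes[r][c]
            # Map the refined cell position back to pixel coordinates.
            cx = (c + dx) * stride
            cy = (r + dy) * stride
            boxes.append((cx - w / 2, cy - h / 2, w, h, score))
    return boxes
```

Detecting centers first and regressing the box from each center keeps the decoder bottom-up: no region proposals are needed, only a pass over the field.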
92	Channel-Coded Feature Maps for Computer Vision and Machine Learning
Jonsson, Erik, January 2008
This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor, with the options of including more features and replacing the linear interpolation between bins by a more general basis function. The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth, such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated. This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented. Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose.
The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals, such as depth in stereo vision or color values in image restoration. Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This is achieved by finding a linear mapping between channel-coded input and output values. This thesis extends the method with an incremental version, and identifies and analyzes a key feature of the method: it can handle a learning situation where the correspondence structure between the input and output space is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically, and the effect of outliers and the convergence properties are analyzed. All presented methods have been evaluated experimentally. The work was conducted within the cognitive systems research project COSPAL, funded by EC FP6, and much of the content has been put to use in the final COSPAL demonstrator system.
computer vision, machine learning, object recognition, pose estimation; Image analysis
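The soft-histogram idea from entry 92 can be made concrete with a small sketch. It assumes channels centred on the integers and a cos² basis function of width 3, which is one smooth kernel often used for channel coding; the abstract itself mentions piecewise polynomial basis functions, so the kernel here is a stand-in. The names `channel_encode` and `channel_coded_feature_map` are hypothetical, and position is one-dimensional to keep the example short.

```python
import math


def channel_encode(x, n_channels):
    """Soft-histogram encoding of scalar x over channels centred at 0..n_channels-1."""
    values = []
    for center in range(n_channels):
        d = x - center
        # Assumed cos^2 kernel with support |d| < 1.5: smooth, overlapping bins.
        values.append(math.cos(math.pi * d / 3.0) ** 2 if abs(d) < 1.5 else 0.0)
    return values


def channel_coded_feature_map(samples, n_pos, n_feat):
    """Sum outer products of position and feature encodings over (pos, feat) samples."""
    fmap = [[0.0] * n_feat for _ in range(n_pos)]
    for pos, feat in samples:
        p = channel_encode(pos, n_pos)
        f = channel_encode(feat, n_feat)
        for i in range(n_pos):
            for j in range(n_feat):
                # Channel (i, j) measures co-occurrence of position i and feature j.
                fmap[i][j] += p[i] * f[j]
    return fmap
```

With this kernel, a value that lies well inside the channel range activates exactly three neighbouring channels whose responses sum to 1.5, so the encoding varies smoothly with the input instead of jumping between hard bins.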
93	Enhancing mobile camera pose estimation through the inclusion of sensors
Hughes, Lloyd Haydn, 12 1900
Thesis (MSc)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: Monocular structure from motion (SfM) is a widely researched problem; however, many of the existing approaches prove to be too computationally expensive for use on mobile devices. In this thesis we investigate how inertial sensors can be used to increase the performance of SfM algorithms on mobile devices. Making use of the low-cost inertial sensors found on most mobile devices, we design and implement an extended Kalman filter (EKF) that exploits their complementary nature in order to produce an accurate estimate of the attitude of the device. We use a quaternion-based system model in order to linearise the measurement stage of the EKF, thus reducing its computational complexity. We use this attitude estimate to enhance the feature tracking and camera localisation stages in our SfM pipeline. To perform feature tracking, we implement a hybrid tracking algorithm which uses Harris corners and an approximate nearest neighbour search to reduce the search space for possible correspondences. We increase the robustness of this approach by using inertial information to compensate for inter-frame camera rotation. We further develop an efficient bundle adjustment algorithm which optimises only the pose of the previous three key frames and the 3D map points common to at least two of these frames. We implement an optimisation-based localisation algorithm which uses our EKF attitude estimate and the tracked features to estimate the pose of the device relative to the 3D map points. This optimisation is performed in two steps: the first optimises only the translation, and the second optimises the full pose. We integrate these three sub-systems into an inertial-assisted pose estimation pipeline.
We evaluate our algorithms using datasets captured on an iPhone 5 in the presence of a Vicon motion capture system, which provides ground truth data. We find that our EKF can estimate the device's attitude with an average dynamic accuracy of ±5°. Furthermore, we find that including sensors in the visual pose estimation pipeline can improve the robustness and computational efficiency of the algorithms and is unlikely to negatively affect the accuracy of such a system. Even though we managed to reduce execution time dramatically compared to typical existing techniques, our full system is still too computationally expensive for real-time performance and currently runs at 3 frames per second. However, the ever-improving computational power of mobile devices and our described future work will lead to improved performance. From this study we conclude that inertial sensors make a valuable addition to a visual pose estimation pipeline implemented on a mobile device.
Computer vision, Motion detectors, Extended Kalman filter (EKF), Pose estimation, Mobile cameras, UCTD
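The role of the gyroscope in entry 93 can be illustrated by propagating a quaternion attitude from angular-rate samples, which is the kind of rotation estimate that can then be used to compensate for inter-frame camera rotation. This is a minimal sketch of a prediction step only, not the thesis's EKF: there is no covariance, no accelerometer or magnetometer update, and the function names and the (w, x, y, z) quaternion convention are assumptions.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_from_gyro(omega, dt):
    """Unit quaternion for rotating at angular rate omega (rad/s) for dt seconds."""
    wx, wy, wz = omega
    norm = math.sqrt(wx * wx + wy * wy + wz * wz)
    if norm < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    half = 0.5 * norm * dt
    s = math.sin(half) / norm
    return (math.cos(half), wx * s, wy * s, wz * s)


def propagate(q, omega, dt):
    """Apply one body-frame gyroscope sample to attitude q and renormalise."""
    w, x, y, z = quat_multiply(q, quat_from_gyro(omega, dt))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    return (w / n, x / n, y / n, z / n)


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q, computed as q * (0, v) * conj(q)."""
    w, x, y, z = q
    r = quat_multiply(quat_multiply(q, (0.0, v[0], v[1], v[2])), (w, -x, -y, -z))
    return (r[1], r[2], r[3])
```

Integrating the gyroscope between two frames gives a relative rotation that can be used to predict where a tracked feature direction should appear in the next frame, which narrows the correspondence search.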
94	Estimation de pose et asservissement de robot par vision omnidirectionnelle / Pose estimation and robot visual servoing using omnidirectional vision
Caron, Guillaume, 30 November 2010
This work is part of a research programme on monocular and stereoscopic omnidirectional vision. Estimating a robot's position and orientation with this kind of machine vision rests on the same basic formalism as visual servoing, a technique that controls the motion of a robot using the visual information provided by one or more cameras. In pose estimation, this motion is virtual. Exploiting the wide field of view of omnidirectional vision generally benefits the robot's behaviour, but the visual information, and in particular its representation, strongly affects that behaviour. A poorly chosen representation can cause the servoing or the pose computation to fail. This thesis studies and compares different visual features and their representations in omnidirectional vision. Although a stereographic projection model involving a sphere has been defined for central omnidirectional cameras, a point, a line or a plane is almost always represented in the image plane. Some work has formulated pose estimation or visual servoing from features defined on the sphere, but without showing, in practice, which domain of definition is better: the image plane or the sphere. This is one of the motivations of this study. This work also proposes using the pixel intensities of an omnidirectional image as the visual feature for robot servoing, which removes the need for the image processing otherwise required to detect points, lines, and so on. Experimental validation shows very high positioning accuracy.
This observation served as the basis for using photometric neighbourhoods, either contained in a region for plane tracking or surrounding a segment for tracking vertical lines. The latter is one of the most important contributions of this thesis, because it allows a vertical line to be reconstructed by photometric fitting of the neighbourhoods of its projections in the omnidirectional stereo images. Moreover, this can be extended to motion estimation and to any geometric primitive.
Omnidirectional vision, stereoscopy, pose estimation, visual servoing, tracking
95	Face pose estimation in monocular images
Shafi, Muhammad, January 2010
People use the orientation of their faces to convey rich interpersonal information. For example, a person will turn their face to indicate who the intended target of the conversation is. Similarly, in a conversation, face orientation is a non-verbal cue that tells the listener when to switch roles and start speaking, and a nod indicates that a person understands, or agrees with, what is being said. Furthermore, face pose estimation plays an important role in human-computer interaction, virtual reality applications, human behaviour analysis, pose-independent face recognition, driver's vigilance assessment, gaze estimation, etc. Robust face recognition has been a focus of research in the computer vision community for more than two decades. Although substantial research has been done and numerous methods have been proposed for face recognition, challenges remain in this field. One of these is face recognition under varying poses, which is why face pose estimation is still an important research area. In computer vision, face pose estimation is the process of inferring the face orientation from digital imagery. It requires a series of image processing steps to transform a pixel-based representation of a human face into a high-level concept of direction. An ideal face pose estimator should be invariant to a variety of image-changing factors such as camera distortion, lighting conditions, skin colour, projective geometry, facial hair, facial expressions, and the presence of accessories like glasses and hats. Face pose estimation has been a focus of research for about two decades, and numerous research contributions have been presented in this field.
Face pose estimation techniques in the literature still have shortcomings and limitations in terms of accuracy, applicability to monocular images, autonomy, identity and lighting variations, image resolution variations, range of face motion, computational expense, presence of facial hair, presence of accessories like glasses and hats, etc. These shortcomings of existing face pose estimation techniques motivated the research work presented in this thesis. The main focus of this research is to design and develop novel face pose estimation algorithms that improve automatic face pose estimation in terms of processing time, computational expense, and invariance to different conditions.
005.3
96	Ernst Ludwig Kirchner (1880-1938): Early Female Nudes in Landscapes
Rogge, Kathryn, 17 November 2010
This thesis examines how Ernst Ludwig Kirchner reconceived the female nude within the two contexts of Expressionism and the German nudist movement. In particular, it looks to Kirchner's early paintings of female nudes in landscape settings, executed between 1909 and 1914, to determine how Kirchner operated within and departed from the conventions of the female nude. This thesis challenges the feminist critique of Expressionist painting and Kirchner's female nudes. It also examines how Kirchner's female nudes in landscapes are complicated by the early twentieth-century development of German nudism. While these paintings are often categorized as bathers following nineteenth-century French precedent, they are in fact unique products of die Brücke philosophy.
Kirchner, Expressionism, Germany, nudism, Duncan, pose, feminism, die Brücke; Arts and Humanities
97	Diskrepance záklonových poloh užívaných ve fyzioterapii / Discrepancy of extension positions used in physiotherapy
Krátká, Gabriela, January 2017
Title: Discrepancy of extension positions used in physiotherapy. Objectives: The aim of this thesis is to describe in detail selected extension positions (bhudžangásána, úrdhvamukhašvanásána, McKenzie extension, extension from developmental kinesiology) and then to use surface electromyography to verify the similarities and differences in the involvement of selected muscles in these positions. From these findings, the thesis concludes whether these positions, which are often confused in practice, have the same or a different therapeutic effect. Methods: This is quantitative research of a theoretical and empirical character. The research method is observation (intra experiment). Surface electromyography was used to record the electrical activity of m. trapezius (superior et inferior), m. erector spinae, mm. oblique, m. gluteus maximus and m. semitendinosus in 7 subjects while they performed the extension positions. Results: It was confirmed that a different one of the measured muscles was dominant in each of these positions. It was confirmed that the positions bhudžangásána and úrdhvamukhašvanásána are described in the literature differently from what the practical experiment showed. The other positions did not confirm this statement. It was confirmed that each of the mentioned positions has a different therapeutic effect. Keywords: extension position,...
98	Tělo jako živý obraz / Body as a moving picture
Zemanová, Manuela, January 2018
The theoretical part of the master's thesis "Body as a Moving Picture" focuses on the phenomena of gesture, expression, pose, body language and pathos as found in works ranging from classical authors to contemporary art. The first chapter of the practical part describes the pedagogical and psychological resources necessary for the realization of art projects. It aims to explore different forms of action art and to test in practice their applicability with students in Art Education lessons. The second chapter is dedicated to the realization of the art projects. These projects focus on a dialogue with chosen works of art, on the use of artephiletic techniques in practice, and on their carry-over into the students' artistic creations. KEYWORDS: gesture, expression, pose, pathos, performance art, actionism
99	Efficient Factor Graph Fusion for Multi-robot Mapping
Natarajan, Ramkumar, 12 June 2017
This work presents a novel method to efficiently factorize the combination of multiple factor graphs that share common estimation variables. Fast-paced innovation in algebraic graph theory has enabled new state estimation tools such as factor graphs. Recent factor graph formulations for Simultaneous Localization and Mapping (SLAM), such as Incremental Smoothing and Mapping using the Bayes tree (ISAM2), have been very successful and garnered much attention. Variable ordering, a well-known technique in linear algebra, is employed for solving the factor graph. Our primary contribution in this work is to reuse the variable ordering of the graphs being combined to find the ordering of the fused graph. In mapping, multiple robots provide a great advantage over a single robot by offering faster map coverage and better estimation quality. This, coupled with an inevitable increase in the number of robots around us, produces a demand for faster algorithms. For example, a city full of self-driving cars could pool their observation measurements rapidly to plan traffic-free navigation. By reusing the variable ordering of the parent graphs, we were able to produce an order-of-magnitude difference in the time required for solving the fused graph. We also provide a formal verification to show that the proposed strategy does not violate any of the relevant standards. A common problem in multi-robot SLAM is relative pose graph initialization to produce a globally consistent map. Our other contribution addresses this by minimizing a specially formulated error function as part of solving the factor graph. The performance is illustrated on a publicly available SuiteSparse dataset and the multi-robot AP Hill dataset.
sam, isam, variable ordering, SLAM, multi-robot graph SLAM, pose graph initialization, smoothing and mapping
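Why variable ordering matters when solving a factor graph, as entry 99 states, can be seen by counting the fill-in edges that symbolic elimination creates under different orders. The sketch below is a generic textbook illustration on a toy graph, not the ordering-reuse method proposed in the thesis; the function `count_fill_in` and the variable names `x1`..`x4` and `l` are hypothetical.

```python
def count_fill_in(edges, order):
    """Count fill-in edges created when eliminating variables in the given order.

    edges : iterable of (a, b) pairs, each meaning some factor connects a and b
    order : elimination order containing every variable exactly once
    """
    adj = {v: set() for v in order}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    fill = 0
    for v in order:
        nbrs = sorted(adj[v])
        # Eliminating v couples all of its remaining neighbours pairwise.
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
        for n in nbrs:
            adj[n].discard(v)
        del adj[v]
    return fill


# Toy graph: one shared variable "l" connected to four others.
toy_edges = [("l", "x1"), ("l", "x2"), ("l", "x3"), ("l", "x4")]
shared_first = count_fill_in(toy_edges, ["l", "x1", "x2", "x3", "x4"])
shared_last = count_fill_in(toy_edges, ["x1", "x2", "x3", "x4", "l"])
```

Eliminating the shared variable first connects all four remaining variables to each other, while eliminating it last creates no new edges. Fewer fill-in edges mean a sparser factorization, which is why a good ordering can change solve time substantially.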
100	Suivi multi-caméras de personnes dans un environnement contraint / Multi-camera tracking of people in a constrained environment
Aziz, Kheir Eddine, 11 May 2012
Consumption is regarded as one of the simple forms of everyday life. The evolution of modern society has produced an environment heavily loaded with objects, signs and interactions based on commercial transactions. Added to this are the accelerating turnover of the available offering and purchasing power, which is a growing concern for most consumers and for which price inflation is a recurring topic. Given this complexity and such substantial economic stakes, modelling consumers' purchasing behaviour in the various sectors of activity is an essential step for major economic players and analysts. In 2008, the company Cliris launched a project on multi-camera tracking of customer trajectories. The project is based on developing an automatic multi-stream analysis system built on multi-camera customer tracking. This system makes it possible to analyse customer attendance and customer paths in large retail spaces. In this CIFRE thesis, we addressed the whole process of multi-camera people tracking, with an emphasis on the applied side of the problem, by contributing answers to the following questions:
1. How can an individual be tracked from a single-camera video stream while handling occlusions?
2. How can people be counted in dense areas?
3. How can an individual be recognised at different points in the store from multi-camera video streams, so that their path can be followed?
/ ...
Head detection, 2D skeleton, people counting, appearance classification, silhouette segmentation, pose estimation, person re-identification ...

Search results