Global ETD Search

221	3D monitor pomocí detekce pozice hlavy / 3D Monitor Based on Head Pose Detection Zivčák, Jan January 2011 Advances in image processing and stereoscopy, falling webcam prices, and growing computing power have created an opportunity to enhance the experience of working with 3D programs. The pose of the user's head can be estimated from a webcam image, and the view of the 3D scene can be changed according to that pose. When the user moves their head, the monitor then feels like a window through which the scene behind it can be seen. The system resulting from this project makes it possible to add this behaviour to any 3D application easily and cheaply.
222	Interpretable Fine-Grained Visual Categorization Guo, Pei 16 June 2021 Not all categories are created equal in object recognition. Fine-grained visual categorization (FGVC) is a branch of visual object recognition that aims to distinguish subordinate categories within a basic-level category. Examples include classifying an image of a bird into specific species like "Western Gull" or "California Gull". Such subordinate categories exhibit characteristics like small inter-category variation and large intra-class variation, making distinguishing them extremely difficult. To address such challenges, an algorithm should be able to focus on object parts and be invariant to object pose. Like many other computer vision tasks, FGVC has witnessed phenomenal advancement following the resurgence of deep neural networks. However, the proposed deep models are usually treated as black boxes. Network interpretation and understanding aims to unveil the features learned by neural networks and explain the reason behind network decisions. It is not only a necessary component for building trust between humans and algorithms, but also an essential step towards continuous improvement in this field. This dissertation is a collection of papers that contribute to FGVC and neural network interpretation and understanding. Our first contribution is an algorithm named Pose and Appearance Integration for Recognizing Subcategories (PAIRS) which performs pose estimation and generates a unified object representation as the concatenation of pose-aligned region features. As the second contribution, we propose the task of semantic network interpretation. For filter interpretation, we represent the concepts a filter detects using an attribute probability density function. 
We propose the task of semantic attribution using textual summarization that generates an explanatory sentence consisting of the most important visual attributes for decision-making, as found by a general Bayesian inference algorithm. Pooling has been a key component in convolutional neural networks and is of special interest in FGVC. Our third contribution is an empirical and experimental study towards a thorough yet intuitive understanding and extensive benchmark of popular pooling approaches. Our fourth contribution is a novel LMPNet for weakly-supervised keypoint discovery. A novel leaky max pooling layer is proposed to explicitly encourage sparse feature maps to be learned. A learnable clustering layer is proposed to group the keypoint proposals into final keypoint predictions. 2020 marks the 10th year since the beginning of fine-grained visual categorization. It is of great importance to summarize the representative works in this domain. Our last contribution is a comprehensive survey of FGVC containing nearly 200 relevant papers that cover 7 common themes. fine-grained visual categorization; pose-aligned representation; global pooling; network interpretation; network understanding; weakly-supervised keypoint discovery; Physical Sciences and Mathematics
223	Facial Feature Tracking and Head Pose Tracking as Input for Platform Games Andersson, Anders Tobias January 2016 Modern facial feature tracking techniques can automatically extract and accurately track multiple facial landmark points from faces in video streams in real time. Facial landmark points are defined as points distributed on a face with respect to certain facial features, such as eye corners and the face contour. This opens up the possibility of using facial feature movements as a hands-free human-computer interaction technique. These alternatives to traditional input devices can give a more interesting gaming experience. They also allow for more intuitive controls and can possibly give greater access to computers and video game consoles for certain disabled users who have difficulty using their arms and/or fingers. This research explores using facial feature tracking to control a character's movements in a platform game. The aim is to interpret facial feature tracker data and convert facial feature movements to game input controls. The facial feature input is compared with other hands-free input methods, as well as traditional keyboard input. The other hands-free input methods that are explored are head pose estimation and a hybrid between the facial feature and head pose estimation input. Head pose estimation is a method where the application extracts the angles at which the user's head is tilted. The hybrid input method utilises both head pose estimation and facial feature tracking. The input methods are evaluated by user performance and subjective ratings from voluntary participants playing a platform game using the input methods. Performance is measured by the time, the number of jumps and the number of turns it takes for a user to complete a platform level. Jumping is an essential part of platform games. To reach the goal, the player has to jump between platforms. An inefficient input method might make this a difficult task. 
Turning is the action of changing the direction of the player character from facing left to facing right or vice versa. This measurement is intended to pick up difficulties in controlling the character's movements. If the player makes many turns, it is an indication that it is difficult to use the input method to control the character movements efficiently. The results suggest that keyboard input is the most effective input method, while it is also the least entertaining of the input methods. There is no significant difference in performance between facial feature input and head pose input. The hybrid input version has the best results overall of the alternative input methods. The hybrid input method got significantly better performance results than the head pose input and facial feature input methods, while its results showed no statistically significant difference from those of the keyboard input method. Keywords: Computer Vision, Facial Feature Tracking, Head Pose Tracking, Game Control / Modern techniques can automatically extract and accurately track multiple landmarks from faces in video streams. Facial landmarks are defined as points placed on the face along facial features such as the eyes or the face contour. This opens up the use of facial feature movements as a technique for hands-free human-computer interaction. These alternatives to traditional keyboards and game controllers can be used to make computers and game consoles more accessible to certain users with motor impairments. This thesis explores the usability of facial feature tracking for controlling a character in a platform game. The aim is to interpret data from an application that tracks facial features and translate the movements of the facial features into game controller input. The facial feature input is compared with input based on head pose estimation, a hybrid of facial feature tracking and head pose estimation, and traditional keyboard controls. 
Head pose estimation is a technique in which the application extracts the angles at which the user's head is tilted. The hybrid method uses both facial feature tracking and head pose estimation. The input methods are examined by measuring efficiency in terms of time, number of jumps and number of turns, together with subjective ratings from volunteer test users who play a platform game with the different input methods. Jumping is important in a platform game. To reach the goal, the player must jump between platforms. An inefficient input method can make this difficult. A turn is when the player character changes direction from facing right to facing left or vice versa. A high number of turns can indicate that it is difficult to control the player character's movements efficiently. The results suggest that keyboard input is the most efficient method for controlling platform games. At the same time, this method received the lowest ratings for how much fun the user had while playing. There was no statistically significant difference between head pose input and facial feature input. The hybrid of facial feature input and head pose input achieved the best overall results of the alternative input methods. Keywords: Computer Vision, Facial Feature Tracking, Head Tracking, Game Input facial feature tracking; head pose tracking; alternative interface; real-time; hci; human computer interface; Interaction Technologies; Interaktionsteknik
224	Evaluation of 3D motion capture data from a deep neural network combined with a biomechanical model Rydén, Anna, Martinsson, Amanda January 2021 Interest in motion capture has grown in recent years in many fields, from the game industry to sports analysis. The need for reflective markers and multi-camera systems limits its use, since they are costly and time-consuming. One solution to this could be a deep neural network trained to extract 3D joint estimations from a 2D video captured with a smartphone. This master thesis project has investigated the accuracy of a trained convolutional neural network, MargiPose, that estimates 25 joint positions in 3D from a 2D video, against a gold standard, a multi-camera Vicon system. The project has also investigated whether the data from the deep neural network can be connected to a biomechanical modelling software, AnyBody, for further analysis. The final intention of this project was to analyze how accurate such a combination could be in golf swing analysis. The accuracy of the deep neural network has been evaluated with three parameters: marker position, angular velocity and kinetic energy for different segments of the human body. MargiPose delivers results with high accuracy (Mean Per Joint Position Error (MPJPE) = 1.52 cm) for a simpler movement, but for a more advanced motion such as a golf swing it is less accurate in marker distance (MPJPE = 3.47 cm). The mean difference in angular velocity shows that MargiPose has difficulties following segments that are occluded or that move more, such as the wrists in a golf swing, which both move fast and are occluded by other body segments. The conclusion of this research is that it is possible to connect data from a trained CNN with a biomechanical modelling software. Whether the accuracy of the network is adequate depends heavily on the intended use of the data. 
For the purpose of golf swing analysis, this could be a great and cost-effective solution which could enable motion analysis for professionals but also for interested beginners. MargiPose shows a high accuracy when evaluating simple movements. However, when using it with the intention of analyzing a golf swing in a biomechanical modelling software, the outcome might be beyond the bounds of reliable results. Human pose estimation; motion capture; deep neural network; CNN; MargiPose; biomechanical modelling; AnyBody modelling system; Other Medical Engineering; Annan medicinteknik
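The MPJPE figure quoted in entry 224 is, in its usual definition, the mean Euclidean distance between estimated and reference joint positions. A minimal sketch of that calculation for a single frame follows; the function name and the joint coordinates are hypothetical illustrations, not code or data from the thesis, which may also average over frames or align the skeletons first.

```python
import math


def mpjpe(predicted, reference):
    """Mean Euclidean distance between corresponding 3D joint positions."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Hypothetical joint positions in centimetres (illustrative values only).
reference = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 20.0, 0.0)]
predicted = [(0.0, 0.0, 3.0), (10.0, 4.0, 0.0), (10.0, 20.0, 5.0)]

error_cm = mpjpe(predicted, reference)
print(f"MPJPE = {error_cm:.2f} cm")  # per-joint errors are 3, 4 and 5 cm
```

Because the metric is an average, a small MPJPE can still hide a large error on one fast or occluded joint, so per-joint errors are worth inspecting alongside it.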
225	Learning to Predict Dense Correspondences for 6D Pose Estimation Brachmann, Eric 17 January 2018 Object pose estimation is an important problem in computer vision with applications in robotics, augmented reality and many other areas. An established strategy for object pose estimation consists of, firstly, finding correspondences between the image and the object’s reference frame, and, secondly, estimating the pose from outlier-free correspondences using Random Sample Consensus (RANSAC). The first step, namely finding correspondences, is difficult because object appearance varies depending on perspective, lighting and many other factors. Traditionally, correspondences have been established using handcrafted methods like sparse feature pipelines. In this thesis, we introduce a dense correspondence representation for objects, called object coordinates, which can be learned. By learning object coordinates, our pose estimation pipeline adapts to various aspects of the task at hand. It works well for diverse object types, from small objects to entire rooms, varying object attributes, like textured or texture-less objects, and different input modalities, like RGB-D or RGB images. The concept of object coordinates allows us to easily model and exploit uncertainty as part of the pipeline such that even repeating structures or areas with little texture can contribute to a good solution. Although we can train object coordinate predictors independently of the full pipeline and achieve good results, training the pipeline in an end-to-end fashion is desirable. It enables the object coordinate predictor to adapt its output to the specifics of the subsequent steps in the pose estimation pipeline. Unfortunately, the RANSAC component of the pipeline is non-differentiable, which prohibits end-to-end training. 
Adopting techniques from reinforcement learning, we introduce Differentiable Sample Consensus (DSAC), a formulation of RANSAC which allows us to train the pose estimation pipeline in an end-to-end fashion by minimizing the expectation of the final pose error. info:eu-repo/classification/ddc/004 ddc:004
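The hypothesize-and-verify idea behind RANSAC, as named in entry 225, can be illustrated on a much simpler problem than 6D pose. The sketch below fits a 2D line to points contaminated with outliers; it is a toy example under assumed parameters (iteration count, tolerance, seed, sample data) and is neither a pose solver nor the DSAC formulation from the thesis.

```python
import random


def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0, unit normal."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: identical points
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Return the line hypothesis with the most inliers, and those inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Hypothesize: a minimal sample of two points defines one line.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # Verify: count points whose distance to the line is within tolerance.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Hypothetical data: ten points on y = 2x + 1 plus three gross outliers.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outliers = [(2.0, 30.0), (5.0, -20.0), (8.0, 40.0)]
model, inliers = ransac_line(line_points + outliers)
print(len(inliers), "inliers")
```

The selection step (keeping the hypothesis with the largest inlier count) is a hard argmax, which is the non-differentiable part that the abstract says prevents end-to-end training.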
226	Hypothesis Generation for Object Pose Estimation: From local sampling to global reasoning Michel, Frank 14 February 2019 Pose estimation has been studied since the early days of computer vision. The task of object pose estimation is to determine the transformation that maps an object from its inherent coordinate system into the camera-centric coordinate system. This transformation describes the translation of the object relative to the camera and the orientation of the object in three-dimensional space. The knowledge of an object's pose is a key ingredient in many application scenarios like robotic grasping, augmented reality, autonomous navigation and surveillance. A general estimation pipeline consists of the following four steps: extraction of distinctive points, creation of a hypothesis pool, hypothesis verification and, finally, hypothesis refinement. In this work, we focus on the hypothesis generation process. We show that it is beneficial to utilize geometric knowledge in this process. We address the problem of hypothesis generation for articulated objects. Instead of considering each object part individually, we model the object as a kinematic chain. This enables us to use the inner-part relationships when sampling pose hypotheses. Thereby we only need K correspondences for objects consisting of K parts. We show that applying geometric knowledge about part relationships improves estimation accuracy under severe self-occlusion and low-quality correspondence predictions. In an extension, we employ global reasoning within the hypothesis generation process instead of sampling 6D pose hypotheses locally. We therefore formulate a Conditional Random Field (CRF) that operates on the image as a whole and infers those pixels that are consistent with the 6D pose. Within the CRF we use a strong geometric check that is able to assess the quality of correspondence pairs. 
We show that our global geometric check improves the accuracy of pose estimation under heavy occlusion. info:eu-repo/classification/ddc/004 ddc:004
227	Pose Estimation using Implicit Functions and Uncertainty in 3D Blomstedt, Frida January 2023 Human pose estimation in 3D is a large area within computer vision, with many application areas. A common approach is to first estimate the pose in 2D, resulting in a confidence heatmap, and then estimate the 3D pose using the most likely estimations in 2D. This may, however, cause problems in cases where pose estimates are more uncertain and the estimation of one point is far from the true position, for example when a limb is occluded. This thesis adapts the method Neural Radiance Fields (NeRF) to 2D confidence heatmaps in order to create an implicit representation of the uncertainty in 3D, thus attempting to make use of as much information in 2D as possible. The adapted method was evaluated on the Human3.6M dataset, and results show that this method outperforms a simple triangulation baseline, especially when the estimation in 2D is far from the true pose. pose estimation; neural radiance fields; nerf; computer vision; machine learning
228	3D-Reconstruction of the Common Murre / 3D-Rekonstruering av Sillgrissla Hägerlind, Johannes January 2023 Automatic 3D reconstruction of birds can aid researchers in studying their behavior. Recently there has been an attempt to reconstruct a variety of birds from single-view images. However, the common murre's appearance is different from that of the birds that have been studied. Moreover, recent studies have focused on side views. This thesis studies the 3D reconstruction of the common murre from single-view top-view images. A template mesh is first optimized to fit a 3D scan. Then the result is used to optimize a species-specific mean from side-view images annotated with keypoints and silhouettes. The resulting mean mesh is used to initialize the optimization for top-down images. Using a mask loss, a pose prior loss, and a bone length loss that uses a mean vector from the side-view images improves the 3D reconstruction as rated by humans. Furthermore, the intersection over union (IoU) and percentage of correct keypoints (PCK), although used by other authors, are insufficient as evaluation metrics in a single-view top-view setting. pose estimation; shape estimation; 3D reconstruction; articulated mesh; bird; common murre
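Both metrics named in entry 228 are computed in the 2D image plane, which a short sketch makes concrete. The code below is an illustration under assumptions: masks are plain nested lists of 0/1, PCK uses an absolute pixel threshold (published variants usually normalise the threshold by object or bounding-box size), and all values are hypothetical rather than taken from the thesis.

```python
import math


def mask_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (2D lists)."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 1.0


def pck(predicted, annotated, threshold):
    """Fraction of keypoints lying within `threshold` pixels of the annotation."""
    hits = sum(math.dist(p, a) <= threshold for p, a in zip(predicted, annotated))
    return hits / len(annotated)


# Hypothetical silhouettes: rendered mesh mask versus annotated mask.
rendered = [[1, 1, 0],
            [1, 1, 0]]
annotated_mask = [[0, 1, 1],
                  [0, 1, 1]]
iou = mask_iou(rendered, annotated_mask)

# Hypothetical 2D keypoints in pixels.
predicted_kp = [(10.0, 10.0), (20.0, 22.0), (40.0, 40.0)]
annotated_kp = [(10.0, 12.0), (20.0, 20.0), (30.0, 30.0)]
score = pck(predicted_kp, annotated_kp, threshold=5.0)

print(f"IoU = {iou:.3f}, PCK = {score:.3f}")
```

Since neither function sees depth, two different 3D reconstructions that project to the same silhouette and keypoints receive identical scores.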
229	Polarimetric Imagery for Object Pose Estimation Siefring, Matthew D. 15 May 2023 No description available. Electrical Engineering; Optics; Polarimetric Imagery; visible-spectrum; deep-learning; object pose estimation; CNN; late-fusion; Stokes-products; dataset
230	Scanning Laser Registration and Structural Energy Density Based Active Structural Acoustic Control Manwill, Daniel Alan 17 December 2010 To simplify the measurement of energy-based structural metrics, a general registration process for the scanning laser Doppler vibrometer (SLDV) has been developed. Existing registration techniques, also known as pose estimation or position registration, suffer from mathematical complexity, instrument specificity, and the need for correct optimization initialization. These difficulties have been addressed through development of a general linear laser model and hybrid registration algorithm. These are applicable to any SLDV and allow the registration problem to be solved using straightforward mathematics. Additionally, the hybrid registration algorithm eliminates the need for correct optimization initialization by separating the optimization process from solution selection. The effectiveness of this approach is demonstrated through simulated application and by validation measurements performed on a specially prepared pipe. To increase understanding of the relationships between structural energy metrics and the acoustic response, the use of structural energy density (SED) in active structural acoustic control (ASAC) has also been studied. A genetic algorithm and other simulations were used to determine achievable reduction in acoustic radiation, characterize control system design, and compare SED-based control with the simpler velocity-based control. Using optimized sensor and actuator placements at optimally excited modal frequencies, attenuation of net acoustic intensity was proportional to attenuation of SED. At modal and non-modal frequencies, optimal SED-based ASAC system design is guided by establishing general symmetry between the structural disturbing force and the SED sensor and control actuator. 
Using fixed sensor and actuator placement, SED-based control has been found to provide superior performance to single-point velocity control and very comparable performance to two-point velocity control. Its greatest strength is that it rarely causes unwanted amplifications of large amplitude when properly designed. Genetic algorithm simulations of SED-based ASAC indicated that optimal control effectiveness is obtained when sensors and actuators function in more than one role. For example, an actuator can be placed to simultaneously reduce structural vibration amplitude and reshape the response such that it radiates less efficiently. These principles can be applied to the design of any type of ASAC system. Daniel Manwill; scanning laser doppler vibrometer; registration; pose estimation; active structural acoustic control; structural vibration; acoustics; genetic algorithm; Mechanical Engineering
