• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 118
  • 11
  • 7
  • 5
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 182
  • 182
  • 91
  • 66
  • 50
  • 34
  • 33
  • 32
  • 31
  • 30
  • 28
  • 27
  • 26
  • 26
  • 25
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Vision-Based Rendering: Using Computational Stereo to Actualize IBR View Synthesis

Steele, Kevin L. 14 August 2006 (has links) (PDF)
Computer graphics imagery (CGI) has enabled many useful applications in training, defense, and entertainment. One such application, CGI simulation, is a real-time system that allows users to navigate through and interact with a virtual rendition of an existing environment. Creating such systems is difficult, but particularly burdensome is the task of designing and constructing the internal representation of the simulation content. Authoring this content on a computer usually requires great expertise and many man-hours of labor. Computational stereo and image-based rendering offer possibilities to automatically create simulation content without user assistance. However, these technologies have largely been limited to creating content from only a few photographs, severely limiting the simulation experience. The purpose of this dissertation is to enable the process of automated content creation for large numbers of photographs. The workflow goal consists of a user photographing any real-world environment intended for simulation, and then loading the photographs into the computer. The theoretical and algorithmic contributions of the dissertation are then used to transform the photographs into the data required for real-time exploration of the photographed locale. This permits a rich simulation experience without the laborious effort required to author the content manually. To approach this goal we make four contributions to the fields of computer vision and image-based rendering: an improved point correspondence methodology, an adjacency graph construction algorithm for unordered photographs, a pose estimation ordering for unordered image sets, and an image-based rendering algorithm that interpolates omnidirectional images to synthesize novel views. We encapsulate our contributions into a working system that we call Vision-Based Rendering (VBR). With our VBR system we are able to automatically create simulation content from a large unordered collection of input photographs. However, there are severe restrictions in the type of image content our present system can accurately simulate. Photographs containing large regions of high frequency detail are incorporated very accurately, but images with smooth color gradations, including most indoor photographs, create distracting artifacts in the final simulation. Thus our system is a significant and functional step toward the ultimate goal of simulating any real-world environment.
62

Light-weighted Deep Learning for LiDAR and Visual Odometry Fusion in Autonomous Driving

Zhang, Dingnan 20 December 2022 (has links)
No description available.
63

Automated Implementation of the Edinburgh Visual Gait Score (EVGS)

Ramesh, Shri Harini 14 July 2023 (has links)
Analyzing a person's gait is important in determining their physical and neurological health. However, typical motion analysis laboratories are only in urban specialty care facilities and can be expensive due to the specialized personnel and technology needed for these examinations. Many patients, especially those who reside in underdeveloped or isolated locations, find it impractical to go to such facilities. With the help of recent developments in high-performance computing and artificial intelligence models, it is now feasible to evaluate human movement using digital video. Over the past 20 years, various visual gait analysis tools and scales have been developed. A study of the literature and discussions with physicians who are domain experts revealed that the Edinburgh Visual Gait Score (EVGS) is one of the most effective scales currently available. Clinical implementations of EVGS currently rely on human scoring of videos. In this thesis, an algorithmic implementation of EVGS scoring based on hand-held smart phone video was implemented. Walking gait was recorded using a handheld smartphone at 60Hz as participants walked along a hallway. Body keypoints representing joints and limb segments were then identified using the OpenPose - Body 25 pose estimation model. A new algorithm was developed to identify foot events and strides from the keypoints and determine EVGS parameters at relevant strides. The stride identification results were compared with ground truth foot events that were manually labeled through direct observation, and the EVGS results were compared with evaluations by human scorers. Stride detection was accurate within 2 to 5 frames. The level of agreement between the scorers and the algorithmic EVGS score was strong for 14 of 17 parameters. The algorithm EVGS results were highly correlated to scorers' scores (r>0.80) for eight of the 17 factors. Smartphone-based remote motion analysis with automated implementation of the EVGS may be employed in a patient's neighborhood, eliminating the need to travel. These results demonstrated the viability of automated EVGS for remote human motion analysis.
64

Reconstructing 3D Humans From Visual Data

Zheng, Ce 01 January 2023 (has links) (PDF)
Understanding humans in visual content is fundamental for numerous computer vision applications. Extensive research has been conducted in the field of human pose estimation (HPE) to accurately locate joints and construct body representations from images and videos. Expanding on HPE, human mesh recovery (HMR) addresses the more complex task of estimating the 3D pose and shape of the entire human body. HPE and HMR have gained significant attention due to their applications in areas such as digital human avatar modeling, AI coaching, and virtual reality [135]. However, HPE and HMR come with notable challenges, including intricate body articulation, occlusion, depth ambiguity, and the limited availability of annotated 3D data. Despite the progress made so far, the research community continues to strive for robust, accurate, and efficient solutions in HPE and HMR, advancing us closer to the ultimate goals in the field. This dissertation tackles various challenges in the domains of HPE and HMR. The initial focus is on video-based HPE, where we proposed a transformer architecture named PoseFormer [136] to leverage to capture the spatial relationships between body joints and temporal correlations across frames. This approach effectively harnesses the comprehensive connectivity and expressive power of transformers, leading to improved pose estimation accuracy in video sequences. Building upon this, the dissertation addresses the heavy computational and memory burden associated with image-based HMR. Our proposed Feater Map-based Transformer method (FeatER [133]) and a Pooling attention transformer method (POTTER[130]), demonstrate superior performance while significantly reducing computational and memory requirements compared to existing state-of-the-art techniques. Furthermore, a diffusion-based framework (DiffMesh[134]) is proposed for reconstructing high-quality human mesh outputs given input video sequences. These achievements provide practical and efficient solutions that cater to the demands of real-world applications in HPE and HMR. In this dissertation, our contributions advance the fields of HPE and HMR, bringing us closer to accurate and efficient solutions for understanding humans in visual content.
65

Rigorous Model of Panoramic Cameras

Shin, Sung Woong 31 March 2003 (has links)
No description available.
66

Deep Learning for estimation of fingertip location in 3-dimensional point clouds : An investigation of deep learning models for estimating fingertips in a 3D point cloud and its predictive uncertainty

Hölscher, Phillip January 2021 (has links)
Sensor technology is rapidly developing and, consequently, the generation of point cloud data is constantly increasing. Since the recent release of PointNet, it is possible to process this unordered 3-dimensional data directly in a neural network. The company TLT Screen AB, which develops cutting-edge tracking technology, seeks to optimize the localization of the fingertips of a hand in a point cloud. To do so, the identification of relevant 3D neural network models for modeling hands and detection of fingertips in various hand orientations is essential. The Hand PointNet processes point clouds of hands directly and generate estimations of fixed points (joints), including fingertips, of the hands. Therefore, this model was selected to optimize the localization of fingertips for TLT Screen AB and forms the subject of this research. The model has advantages over conventional convolutional neural networks (CNN). First of all, in contrast to the 2D CNN, the Hand PointNet can use the full 3-dimensional spatial information. Compared to the 3D CNN, moreover, it avoids unnecessarily voluminous data and enables more efficient learning. The model was trained and evaluated on the public dataset MRSA Hand. In contrast to previously published work, the main object of this investigation is the estimation of only 5 joints, for the fingertips. The behavior of the model with a reduction from the usual 21 to 11 and only 5 joints are examined. It is found that the reduction of joints contributed to an increase in the mean error of the estimated joints. Furthermore, the examination of the distribution of the residuals of the estimate for fingertips is found to be less dense. MC dropout to study the prediction uncertainty for the fingertips has shown that the uncertainty increases when the joints are decreased. Finally, the results show that the uncertainty is greatest for the prediction of the thumb tip. Starting from the tip of the thumb, it is observed that the uncertainty of the estimates decreases with each additional fingertip.
67

Learned structural and temporal context for dynamic 3D pose optimization and tracking

Patel, Mahir 30 September 2022 (has links)
Accurate 3D tracking of animals from video recordings is critical for many behavioral studies. However, other than for humans, there is a lack of publicly available datasets of videos of animals that the computer vision community could use for model development. Furthermore, due to occlusion and the uncontrollable nature of the animals, existing pose estimation models suffer from inadequate precision. People rely on biomechanical expertise to design mathematical models to optimize poses to mitigate this issue at the cost of generalization. We propose OptiPose, a generalizable attention-based deep learning pose optimization model, as a part of a post-processing pipeline for refining 3D poses estimated by pre-existing systems. Our experiments show how OptiPose is highly robust to noise and occlusion and can be used to optimize pose sequences provided by state-of-the-art models for animal pose estimation. Furthermore, we will make Rodent3D, a multimodal (RGB, Thermal, and Depth) dataset for rats, publicly available.
68

3-D Face Modeling from a 2-D Image with Shape and Head Pose Estimation

Oyini Mbouna, Ralph January 2014 (has links)
This paper presents 3-D face modeling with head pose and depth information estimated from a 2-D query face image. Many recent approaches to 3-D face modeling are based on a 3-D morphable model that separately encodes the shape and texture in a parameterized model. The model parameters are often obtained by applying statistical analysis to a set of scanned 3-D faces. Such approaches tend to depend on the number and quality of scanned 3-D faces, which are difficult to obtain and computationally intensive. To overcome the limitations of 3-D morphable models, several modeling techniques from 2-D images have been proposed. We propose a novel framework for depth estimation from a single 2-D image with an arbitrary pose. The proposed scheme uses a set of facial features in a query face image and a reference 3-D face model to estimate the head pose angles of the face. The depth information of the subject at each feature point is represented by the depth information of the reference 3-D face model multiplied by a vector of scale factors. We use the positions of a set of facial feature points on the query 2-D image to deform the reference face dense model into a person specific 3-D face by minimizing an objective function. The objective function is defined as the feature disparity between the facial features in the face image and the corresponding 3-D facial features on the rotated reference model projected onto 2-D space. The pose and depth parameters are iteratively refined until stopping criteria are reached. The proposed method requires only a face image of arbitrary pose for the reconstruction of the corresponding 3-D face dense model with texture. Experiment results with USF Human-ID and Pointing'04 databases show that the proposed approach is effective to estimate depth and head pose information with a single 2-D image. / Electrical and Computer Engineering
69

Design of Viewpoint-Equivariant Networks to Improve Human Pose Estimation

Garau, Nicola 31 May 2022 (has links)
Human pose estimation (HPE) is an ever-growing research field, with an increasing number of publications in the computer vision and deep learning fields and it covers a multitude of practical scenarios, from sports to entertainment and from surveillance to medical applications. Despite the impressive results that can be obtained with HPE, there are still many problems that need to be tackled when dealing with real-world applications. Most of the issues are linked to a poor or completely wrong detection of the pose that emerges from the inability of the network to model the viewpoint. This thesis shows how designing viewpoint-equivariant neural networks can lead to substantial improvements in the field of human pose estimation, both in terms of state-of-the-art results and better real-world applications. By jointly learning how to build hierarchical human body poses together with the observer viewpoint, a network can learn to generalise its predictions when dealing with previously unseen viewpoints. As a result, the amount of training data needed can be drastically reduced, simultaneously leading to faster and more efficient training and more robust and interpretable real-world applications.
70

MORP: Monocular Orientation Regression Pipeline

Gunderson, Jacob 01 June 2024 (has links) (PDF)
Orientation estimation of objects plays a pivotal role in robotics, self-driving cars, and augmented reality. Beyond mere position, accurately determining the orientation of objects is essential for constructing precise models of the physical world. While 2D object detection has made significant strides, the field of orientation estimation still faces several challenges. Our research addresses these hurdles by proposing an efficient pipeline which facilitates rapid creation of labeled training data and enables direct regression of object orientation from a single image. We start by creating a digital twin of a physical object using an iPhone, followed by generating synthetic images using the Unity game engine and domain randomization. Our deep learning model, trained exclusively on these synthetic images, demonstrates promising results in estimating the orientations of common objects. Notably, our model achieves a median geodesic distance error of 3.9 degrees and operates at a brisk 15 frames per second.

Page generated in 0.0301 seconds