21 |
The Use of Contextual Clues in Reducing False Positives in an Efficient Vision-Based Head Gesture Recognition System
Blonski, Brian M., 01 June 2010 (has links) (PDF)
This thesis explores the use of head gesture recognition as an intuitive interface for computer interaction. It presents a novel vision-based head gesture recognition system that uses contextual clues to reduce false positives, applied as a computer interface for answering dialog boxes. This work seeks to validate similar research, but focuses on more efficient techniques running on everyday hardware. A survey of image processing techniques for recognizing and tracking facial features is presented, along with a comparison of several methods for tracking and identifying gestures over time. The design describes a reusable head gesture recognition system built on lightweight algorithms to minimize resource utilization. The research consists of a comparison between the base gesture recognition system and an optimized system that uses contextual clues to reduce false positives. The results confirm that simple contextual clues lead to a significant reduction in false positives: with contextual clues, the system achieves an overall accuracy of 96%. In addition, results from a usability study show that head gesture recognition is considered an intuitive interface and is preferred over conventional input for answering dialog boxes. By providing the detailed design and architecture of a head gesture recognition system using efficient techniques and simple hardware, this thesis demonstrates the feasibility of implementing head gesture recognition as an intuitive form of interaction using preexisting infrastructure, and provides evidence that such a system is desirable.
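The abstract does not spell out how the contextual clues are applied. A minimal sketch of one plausible gating strategy, in Python, might look like the following; all names and thresholds here are hypothetical illustrations, not the thesis's implementation:

```python
# A sketch of the context-gating idea, assuming a simple strategy:
# head nods and shakes are only interpreted while a dialog box is
# actually awaiting an answer. All names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DialogContext:
    dialog_open: bool       # a dialog box is currently on screen
    time_since_open: float  # seconds since the dialog appeared

def interpret_gesture(raw_gesture: str, ctx: DialogContext) -> Optional[str]:
    """Map a detected head gesture to a dialog answer, gated on context."""
    if not ctx.dialog_open:
        return None         # no dialog awaiting input: ignore all detections
    if ctx.time_since_open < 0.5:
        return None         # too soon after the dialog appeared: likely noise
    if raw_gesture == "nod":
        return "yes"
    if raw_gesture == "shake":
        return "no"
    return None             # anything else is suppressed as a non-gesture

print(interpret_gesture("nod", DialogContext(True, 1.2)))   # -> yes
```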
|
22 |
Statistical Modeling of Video Event Mining
Ma, Limin, 13 September 2006 (has links)
No description available.
|
23 |
Enhancing Surgical Gesture Recognition Using Bidirectional LSTM and Evolutionary Computation: A Machine Learning Approach to Improving Robotic-Assisted Surgery / BiLSTM and Evolutionary Computation for Surgical Gesture Recognition
Zhang, Yifei, January 2024 (has links)
The integration of artificial intelligence (AI) and machine learning in the medical field has led to significant advancements in surgical robotics, particularly in enhancing the precision and efficiency of surgical procedures. This thesis investigates the application of a single-layer bidirectional Long Short-Term Memory (BiLSTM) model to the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset, aiming to improve the recognition and classification of surgical gestures. The BiLSTM model, with its capability to process data in both forward and backward directions, offers a comprehensive analysis of temporal sequences, capturing intricate patterns within surgical motion data. This research explores the potential of BiLSTM models to outperform traditional unidirectional models in the context of robotic surgery.
In addition to the core model development, this study employs evolutionary computation techniques for hyperparameter tuning, systematically searching for optimal configurations to enhance model performance. The evaluation metrics include training and validation loss, accuracy, confusion matrices, prediction time, and model size. The results demonstrate that the BiLSTM model with evolutionary hyperparameter tuning achieves superior performance in recognizing surgical gestures compared to standard LSTM models.
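As a concrete illustration of the model class described above, a single-layer BiLSTM sequence classifier can be sketched in a few lines of PyTorch. The dimensions below (76 kinematic variables and 15 gesture classes, figures commonly cited for JIGSAWS) and the use of the last time step for classification are illustrative assumptions, not the thesis's exact architecture:

```python
# Minimal sketch of a single-layer BiLSTM gesture classifier in PyTorch.
# The 76 input features and 15 gesture classes are illustrative figures
# for JIGSAWS kinematics; the thesis's exact architecture is not shown.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_features=76, hidden=128, n_gestures=15):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_gestures)  # forward + backward

    def forward(self, x):                 # x: (batch, time, n_features)
        out, _ = self.lstm(x)             # (batch, time, 2 * hidden)
        return self.head(out[:, -1, :])   # simplification: last time step

model = BiLSTMClassifier()
logits = model(torch.randn(4, 200, 76))  # 4 sequences of 200 frames
print(logits.shape)                      # torch.Size([4, 15])
```

An evolutionary search of the kind described would then treat quantities like `hidden`, the learning rate, and dropout as genes, scoring candidate configurations by validation loss.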
The findings of this thesis contribute to the broader field of surgical robotics and human-AI partnership by providing a robust method for accurate gesture recognition, which is crucial for assessing and training surgeons and advancing automated and assistive technologies in surgical procedures. The improved model performance underscores the importance of sophisticated hyperparameter optimization in developing high-performing deep learning models for complex sequential data analysis. / Thesis / Master of Applied Science (MASc) / Advancements in artificial intelligence (AI) are transforming medicine, particularly in robotic surgery. This thesis focuses on improving how robots recognize and classify surgeons' movements during operations. Using a special AI model called a bidirectional Long Short-Term Memory (BiLSTM) network, which looks at data both forwards and backwards, the study aims to better understand and predict surgical gestures.
By applying this model to a dataset of surgical tasks, specifically suturing, and optimizing its settings with advanced techniques, the research shows significant improvements in accuracy and efficiency over traditional methods. The enhanced model is not only more accurate but also smaller and faster.
These improvements can help train surgeons more effectively and advance robotic assistance in surgeries, leading to safer and more precise operations, ultimately benefiting both surgeons and patients.
|
24 |
An Accelerometer-based Gesture Recognition System for a Tactical Communications Application
Tidwell, Robert S., Jr., 12 1900 (has links)
In modern society, computers are interacted with primarily via keyboards, touch screens, voice recognition, and video analysis. For certain applications, these methods may be the most efficient interface. However, there are conceivable applications where a more natural interface would connect humans and computers in a more intuitive way. Gesture recognition systems are such applications, ranging from the interpretation of sign language by a computer to virtual reality control. This thesis proposes a gesture recognition system that primarily uses accelerometers to capture gestures from a tactical communications application. A segmentation algorithm is developed based on accelerometer energy to segment gestures from an input sequence. Using signal processing and machine learning techniques, the segments are reduced to mathematical features and classified with support vector machines. Experimental results show that the system achieves an overall gesture recognition accuracy of 98.9%. Additional methods, such as non-gesture recognition and suppression, are also proposed and tested.
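A rough sketch of this pipeline, assuming windowed-energy thresholding and simple per-axis statistics as features (the thesis's exact thresholds and feature set are not given in the abstract), could look like:

```python
# Sketch of the described pipeline: energy-based segmentation of a 3-axis
# accelerometer stream, simple per-axis statistics as features, and an SVM
# for classification. Thresholds and features are illustrative choices.
import numpy as np

def segment_by_energy(acc, win=32, thresh=1.5):
    """Return (start, end) sample pairs where windowed energy exceeds thresh."""
    energy = np.convolve(np.sum(acc**2, axis=1), np.ones(win) / win, mode="same")
    active = np.r_[False, energy > thresh, False]     # pad for clean edges
    edges = np.flatnonzero(np.diff(active.astype(int)))
    return list(zip(edges[::2], edges[1::2]))         # rise/fall index pairs

def features(seg):
    """Reduce a variable-length segment to a fixed-length feature vector."""
    return np.concatenate([seg.mean(0), seg.std(0), seg.min(0), seg.max(0)])

rng = np.random.default_rng(0)
acc = rng.normal(0, 0.3, (1000, 3))                   # idle noise
acc[300:400] += rng.normal(0, 2.0, (100, 3))          # one synthetic gesture
segs = segment_by_energy(acc)
X = np.array([features(acc[s:e]) for s, e in segs])
print(segs, X.shape)

# With labelled training segments, classification reduces to, e.g.:
#   from sklearn.svm import SVC
#   clf = SVC(kernel="rbf").fit(X_train, y_train)
#   y_pred = clf.predict(X)
```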
|
25 |
Multi-Manifold learning and Voronoi region-based segmentation with an application in hand gesture recognition
Hettiarachchi, Randima, 12 1900 (has links)
A computer vision system consists of many stages, depending on its application. Feature extraction and segmentation are two key stages of a typical computer vision system and hence developments in feature extraction and segmentation are significant in improving the overall performance of a computer vision system. There are many inherent problems associated with feature extraction and segmentation processes of a computer vision system. In this thesis, I propose novel solutions to some of these problems in feature extraction and segmentation.
First, I explore manifold learning, which is a non-linear dimensionality reduction technique for feature extraction in high dimensional data. Classical manifold learning techniques perform dimensionality reduction assuming that the original data lie on a single low dimensional manifold. However, in reality, data sets often consist of data belonging to multiple classes, each lying on its own manifold. Thus, I propose a multi-manifold learning technique to simultaneously learn the multiple manifolds present in a data set, which cannot be achieved through classical single-manifold learning techniques.
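One common baseline for this idea, shown here only as an illustrative sketch and not as the thesis's algorithm, is to first separate the data into clusters (one per presumed manifold) and then run a classical single-manifold learner on each cluster:

```python
# Illustrative baseline only, not the thesis's algorithm: cluster the data
# (one cluster per presumed manifold), then embed each cluster with a
# classical single-manifold learner (here, locally linear embedding).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import LocallyLinearEmbedding

def multi_manifold_embed(X, n_manifolds=2, n_components=1, n_neighbors=10):
    labels = KMeans(n_clusters=n_manifolds, n_init=10,
                    random_state=0).fit_predict(X)
    embeddings = {}
    for k in range(n_manifolds):
        lle = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                     n_components=n_components)
        embeddings[k] = lle.fit_transform(X[labels == k])
    return labels, embeddings

# Two noisy circles, each a one-dimensional manifold in the plane
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
X = np.vstack([np.c_[np.cos(t), np.sin(t)],
               np.c_[4 + np.cos(t), np.sin(t)]]) + 0.05 * rng.normal(size=(400, 2))
labels, emb = multi_manifold_embed(X)
print({k: v.shape for k, v in emb.items()})   # {0: (200, 1), 1: (200, 1)}
```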
Secondly, in image segmentation, when the number of segments in an image is not known, automatically determining the number of segments becomes a challenging problem. In this thesis, I propose an adaptive unsupervised image segmentation technique based on spatial and feature space Dirichlet tessellation as a solution to this problem. Skin segmentation is an important as well as challenging problem in computer vision applications. Thirdly, therefore, I propose a novel skin segmentation technique combining the multi-manifold learning-based feature extraction and Voronoï region-based image segmentation.
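The core of Voronoi-region-based segmentation can be sketched as assigning every pixel to its nearest seed in a joint spatial-and-feature space, here using scipy's cKDTree for the nearest-seed lookup; the seed choice and weighting below are illustrative placeholders for the adaptive scheme developed in the thesis:

```python
# Sketch of Voronoi-region-based segmentation: seed points tessellate the
# joint (spatial, colour) space and every pixel is assigned to its nearest
# seed. Seed selection and the spatial/feature weights are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def voronoi_segment(img, seeds_xy, w_spatial=1.0, w_feature=0.5):
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.c_[w_spatial * xs.ravel(), w_spatial * ys.ravel(),
                w_feature * img.reshape(-1, 3)]
    seed_feat = img[seeds_xy[:, 1], seeds_xy[:, 0]]   # colour at each seed
    seeds = np.c_[w_spatial * seeds_xy, w_feature * seed_feat]
    _, label = cKDTree(seeds).query(pix)              # nearest-seed labels
    return label.reshape(h, w)

img = np.random.rand(64, 64, 3).astype(np.float32)
seeds = np.array([[10, 10], [50, 20], [30, 50]])      # (x, y) seed points
print(np.bincount(voronoi_segment(img, seeds).ravel()))
```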
Finally, I explore hand gesture recognition, which is a prevalent topic in intelligent human-computer interaction, and demonstrate that the proposed improvements in the feature extraction and segmentation stages improve the overall recognition rates of the proposed hand gesture recognition framework. I use the proposed skin segmentation technique to segment the hand, the object of interest in hand gesture recognition, and manifold learning to automatically extract the salient features. Furthermore, in this thesis, I show that different instances of the same dynamic hand gesture have similar underlying manifolds, which allows manifold-matching based hand gesture recognition. / February 2017
|
26 |
Master's Programme in Information Technology: Using multiple Leap Motion sensors in Assembly workplace in Smart Factory
Karimi, Majid, January 2016 (has links)
The new industrial revolution is creating a vast transformation in manufacturing methods. Embedded intelligence and communication technologies facilitate the realization of the smart factory and can provide many features for strong customization of products. The assembly system is a critical segment of the smart factory. However, the complexity of production planning and the variety of products being manufactured push factories to use different methods to guide workers through unfamiliar tasks in the assembly section. Motion tracking is the process of capturing the movement of the human body or objects, and it has been used in different industrial systems. It can be integrated into a wide range of applications, such as interacting with computers, games and entertainment, and industry. Motion tracking can also be integrated into assembly systems, where it has the potential to create improvements, but its integration into industrial processes is still not widespread. This thesis work provides a fully automatic tracking solution for future systems in the manufacturing industry and other fields. In general, a configurable, flexible, and scalable motion tracking system is created in this thesis work to improve the tracking process. We compared different motion tracking methods and technologies, including the Kinect and the Leap Motion sensor, and selected the Leap Motion sensor as the most appropriate method because it fulfils the demands of this project. Multiple Leap Motion sensors are used in this work to cover areas of different sizes. Data fusion between multiple Leap Motion sensors can be considered another novel contribution of this thesis work: by combining and fusing data from several sensors, a motion tracking system with the higher accuracy needed for a practical industrial application is created.
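The fusion step might be sketched, under the assumption of a confidence-weighted average over sensors already calibrated into a common world frame, as follows (the actual Leap Motion SDK calls and the thesis's fusion rule are not reproduced here):

```python
# Sketch of the data-fusion idea: two sensors report a palm position in a
# common world frame, each with a confidence value; the fused estimate is
# a confidence-weighted average. Calibration between sensor frames and the
# real Leap Motion SDK are out of scope for this illustration.
import numpy as np

def fuse_positions(readings):
    """readings: list of (position_xyz, confidence) tuples, one per sensor."""
    pos = np.array([p for p, _ in readings], dtype=float)
    conf = np.array([c for _, c in readings], dtype=float)
    if conf.sum() == 0:
        return None                      # no sensor currently sees the hand
    return (pos * conf[:, None]).sum(0) / conf.sum()

sensor_a = (np.array([102.0, 240.0, -15.0]), 0.9)   # hand near sensor A
sensor_b = (np.array([110.0, 236.0, -12.0]), 0.4)   # edge of sensor B's view
print(fuse_positions([sensor_a, sensor_b]))
```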
|
27 |
Gestural musical interfaces using real time machine learning
Dasari, Sai Sandeep, January 1900 (has links)
Master of Science / Department of Computer Science / William H. Hsu / We present gestural music instruments and interfaces that aid musicians and audio engineers to express themselves efficiently. While we have mastered building a wide variety of physical instruments, the quest for virtual instruments and sound synthesis is on the rise. Virtual instruments are essentially software that enable musicians to interact with a sound module in the computer. Since the invention of MIDI (Musical Instrument Digital Interface), devices and interfaces to interact with sound modules like keyboards, drum machines, joysticks, mixing and mastering systems have been flooding the music industry.
Research in the past decade has gone one step further, interacting through simple musical gestures to create, shape, and arrange music in real time. Machine learning is a powerful tool that can be used to teach simple gestures to the interface. The ability to teach innovative gestures and shape the way a sound module behaves unleashes the untapped creativity of an artist. Timed music and multimedia programs such as Max/MSP/Jitter, along with machine learning techniques, open gateways to embodied musical experiences without physical touch. This master's report presents my research and observations on how this interdisciplinary field could be used to study wider neuroscience problems such as embodied music cognition and human-computer interaction.
|
28 |
Generalized Conditional Matching Algorithm for Ordered and Unordered Sets
Krishnan, Ravikiran, 13 November 2014 (links)
Designing generalized data-driven distance measures for both ordered and unordered set data is the core focus of the proposed work. An ordered set is a set in which the time-linear property is maintained when computing distances between pairs of temporal segments. One application involving ordered sets is human gesture analysis from RGBD data. Human gestures are fast becoming the natural form of human-computer interaction, which motivates the modeling, analysis, and recognition of gestures. The large number of gesture categories, such as sign language, traffic signals, and everyday actions, as well as subtle cultural variations between gesture classes, makes gesture recognition a challenging problem. As a generalization to unordered sets, an algorithm is also proposed for an overlapping speech detection application.
Any gesture recognition task involves comparing an incoming or query gesture against a training set of gestures. Having only one or a few samples per class deters class-statistic learning approaches to classification, as the full range of variation is not covered. Due to the large variability within gesture classes, temporally segmenting individual gestures also becomes hard. A matching algorithm in such scenarios needs to handle single-sample classes and be able to label multiple gestures without temporal segmentation.
Each gesture sequence is considered a class, and each class is a data point in an input space. The pattern of pair-wise distances between two gesture frame sequences, conditioned on a third (anchor) sequence, is considered and referred to as warp vectors; this process defines conditional distances. At the algorithmic core are two dynamic time warping processes: one to compute the warp vectors against the anchor sequences, and the other to compare these warp vectors. We show that a class-dependent distance function can disambiguate the classification process when samples of different classes are close to each other. When the model base is large (the number of classes is also large), the disadvantage of such a distance is its computational cost; a distributed version combined with sub-sampling of anchor gestures is proposed as a speedup strategy. To label multiple connected gestures in a query, we use a simultaneous segmentation and recognition matching algorithm called the level building algorithm, in its dynamic programming implementation. The core of this algorithm depends on a distance function that compares two gesture sequences; we propose replacing this distance function with conditional distances, yielding a version we call conditional level building (CLB). We present results on a large dataset of 8000 RGBD sequences spanning over 200 gesture classes, extracted from the ChaLearn Gesture Challenge dataset. The results show a significant improvement of conditional distances over the underlying distance used to compute them.
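A compact sketch of the conditional-distance computation as described above, using plain dynamic time warping both to extract warp vectors against the anchor and to compare them (step patterns and other details are simplified):

```python
# Sketch of conditional distances: align each sequence to an "anchor" with
# DTW, record the local costs along the warp path ("warp vector"), then
# compare the two warp vectors with DTW again. Simplified illustration.
import numpy as np

def dtw_matrix(a, b):
    """Classic DTW; returns local cost matrix and cumulative cost matrix."""
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i-1, j-1] + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    return cost, D

def warp_vector(seq, anchor):
    """Backtrack the optimal path, collecting the local cost at every step."""
    cost, D = dtw_matrix(seq, anchor)
    i, j, wv = len(seq), len(anchor), []
    while i > 0 and j > 0:
        wv.append(cost[i-1, j-1])
        step = np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return np.array(wv[::-1]).reshape(-1, 1)

def conditional_distance(seq1, seq2, anchor):
    wv1, wv2 = warp_vector(seq1, anchor), warp_vector(seq2, anchor)
    _, D = dtw_matrix(wv1, wv2)      # compare warp vectors, again with DTW
    return D[-1, -1]

rng = np.random.default_rng(1)
s1, s2, anchor = (rng.normal(size=(40, 6)) for _ in range(3))
print(conditional_distance(s1, s2, anchor))
```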
As an application to unordered sets and non-visual data, an overlapping speech segment detection algorithm is proposed. Speech recognition systems have a vast variety of applications, but fail when overlapping speech is involved, especially in a meeting-room setting. The ability to recognize a speaker and localize them in the room is an important step towards a higher-level representation of the meeting dynamics. As in gesture recognition, a new distance function is defined; it serves as the core of the algorithm to distinguish between individual-speech and overlapping-speech temporal segments. The overlapping speech detection problem is framed as an outlier detection problem. An incoming audio stream is broken into temporal segments based on the Bayesian Information Criterion (BIC). Each of these segments is considered a node, and conditional distances between the nodes are determined. The underlying distance for the triples used in conditional distances is the symmetric KL distance: as each node is modeled as a Gaussian, the distance between two segments (nodes) is given by a Monte-Carlo estimation of the KL distance. An MDS-based global embedding is created from the pairwise distances between the nodes, and RANSAC is applied to detect the outliers. The NIST meeting room dataset is used for the overlapping speech detection experiments. An improvement of more than 20% is achieved with the conditional-distance-based approach when compared to a KL-distance-based approach.
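The segment-comparison step lends itself to a short sketch: each BIC segment is modeled as a Gaussian, and the symmetric KL distance between two segments is estimated by Monte-Carlo sampling. The dimensions and parameters below are illustrative:

```python
# Sketch of the segment comparison: two "speech segments" modelled as
# Gaussians (e.g., over MFCC features), symmetric KL distance estimated
# by Monte-Carlo sampling, as described in the abstract.
import numpy as np
from scipy.stats import multivariate_normal

def mc_symmetric_kl(g1, g2, n_samples=5000, rng=None):
    rng = rng or np.random.default_rng(0)
    x1 = g1.rvs(n_samples, random_state=rng)   # samples from p
    x2 = g2.rvs(n_samples, random_state=rng)   # samples from q
    kl_12 = np.mean(g1.logpdf(x1) - g2.logpdf(x1))   # E_p[log p - log q]
    kl_21 = np.mean(g2.logpdf(x2) - g1.logpdf(x2))   # E_q[log q - log p]
    return kl_12 + kl_21

g1 = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]])
g2 = multivariate_normal(mean=[1.5, 0], cov=[[1.2, 0], [0, 0.8]])
print(mc_symmetric_kl(g1, g2))
```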
|
29 |
A Framework for Mobile Paper-based Computing
Sylverberg, Tomas, January 2007 (links)
Military work-practice is a difficult area of research where paper-based approaches are still widespread. This thesis proposes a solution which permits the digitalization of information while work-practice remains unaltered for soldiers working with maps in the field. For this purpose, a mobile interactive paper-based platform has been developed which permits the users to maintain their current work-flow. The solution is based on a system consisting of a prepared paper map, a cellular phone, a desktop computer, and a digital pen with a Bluetooth connection. The underlying idea is to permit soldiers to take advantage of the information a computerized system can offer while minimizing the overhead it incurs. On one hand this implies that the solution must be light-weight; on the other, it must retain current working procedures as far as possible. The desktop computer is used to develop new paper-driven applications through the application provided in the development framework, thus allowing applications to be tailored to the changing needs of military operations. One major component in the application suite is a symbol recognizer capable of recognizing symbols based on a template which can be created in one of the applications. This component permits the digitalization of information in the battlefield by drawing on the paper map. The proposed solution has been found to be viable, but there is a need for further development. Furthermore, the existing hardware needs to be adapted to the requirements of the military to make it usable in a real-world situation.
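As an illustration of what template-based symbol recognition for pen strokes can look like, here is a hedged sketch in the spirit of simple resample-and-compare matchers; the thesis's recognizer may differ in both representation and matching rule:

```python
# Sketch of template-based stroke recognition: resample each stroke to a
# fixed number of points, normalise position and scale, then score a
# candidate against stored templates by mean point-to-point distance.
# This is an assumed baseline, not the thesis's recognizer.
import numpy as np

def resample(stroke, n=32):
    stroke = np.asarray(stroke, dtype=float)
    d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(stroke, axis=0), axis=1))]
    t = np.linspace(0, d[-1], n)               # equal arc-length spacing
    return np.c_[np.interp(t, d, stroke[:, 0]), np.interp(t, d, stroke[:, 1])]

def normalise(pts):
    pts = pts - pts.mean(0)                    # translation invariance
    scale = np.abs(pts).max() or 1.0           # scale invariance
    return pts / scale

def match(stroke, templates):
    q = normalise(resample(stroke))
    scores = {name: np.linalg.norm(q - normalise(resample(t)), axis=1).mean()
              for name, t in templates.items()}
    return min(scores, key=scores.get)         # best-matching template name

templates = {"line": [(0, 0), (10, 10)],
             "hook": [(0, 0), (10, 0), (10, 10)]}
print(match([(1, 1), (9, 9.5)], templates))    # -> line
```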
|
30 |
Mixed reality interactive storytelling : acting with gestures and facial expressions
Martin, Olivier, 04 May 2007 (links)
This thesis aims to answer the following question: “How could gestures and facial expressions be used to control the behavior of an interactive entertaining application?”. An answer to this question is presented and illustrated in the context of mixed reality interactive storytelling.
The first part focuses on the description of the Artificial Intelligence (AI) mechanisms that are used to model and control the behavior of the application. We present an efficient real-time hierarchical planning engine, and show how active modalities (such as intentional gestures) and passive modalities (such as facial expressions) can be integrated into the planning algorithm, in such a way that the narrative (driven by the behavior of the virtual characters inside the virtual world) can effectively evolve in accordance with user interactions.
The second part is devoted to the automatic recognition of user interactions. After briefly describing the implementation of a simple but robust rule-based gesture recognition system, the emphasis is placed on facial expression recognition. A complete solution integrating state-of-the-art techniques along with original contributions is presented, including face detection, facial feature extraction, and analysis. The proposed approach combines statistical learning and probabilistic reasoning in order to deal with the uncertainty associated with the process of modeling facial expressions.
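The face-detection stage can be illustrated with a standard OpenCV Haar-cascade detector, one widely used technique of the era; this is an assumption for illustration, and the thesis's exact detector, feature extractor, and probabilistic expression model are not reproduced:

```python
# Minimal sketch of the first pipeline stage (face detection) using an
# OpenCV Haar cascade; the later stages (feature extraction, probabilistic
# expression modelling) are not shown.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

cap = cv2.VideoCapture(0)                 # default webcam, if available
ok, frame = cap.read()
if ok:
    for (x, y, w, h) in detect_faces(frame):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```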
|