Global ETD Search

1	Auditory-based processing of communication sounds Walters, Thomas C. January 2011 This thesis examines the possible benefits of adapting a biologically-inspired model of human auditory processing as part of a machine-hearing system. Features were generated by an auditory model, and used as input to machine learning systems to determine the content of the sound. Features were generated using the auditory image model (AIM) and were used for speech recognition and audio search. AIM comprises processing to simulate the human cochlea, and a 'strobed temporal integration' process which generates a stabilised auditory image (SAI) from the input sound. The communication sounds which are produced by humans, other animals, and many musical instruments take the form of a pulse-resonance signal: pulses excite resonances in the body, and the resonance following each pulse contains information both about the type of object producing the sound and its size. In the case of humans, vocal tract length (VTL) determines the size properties of the resonance. In the speech recognition experiments, an auditory filterbank was combined with a Gaussian fitting procedure to produce features which are invariant to changes in speaker VTL. These features were compared against standard mel-frequency cepstral coefficients (MFCCs) in a size-invariant syllable recognition task. The VTL-invariant representation was found to produce better results than MFCCs when the system was trained on syllables from simulated talkers of one range of VTLs and tested on those from simulated talkers with a different range of VTLs. The image stabilisation process of strobed temporal integration was analysed. Based on the properties of the auditory filterbank being used, theoretical constraints were placed on the properties of the dynamic thresholding function used to perform strobe detection. These constraints were used to specify a simple, yet robust, strobe detection algorithm.
The syllable recognition system described above was then extended to produce features from profiles of the SAI and tested with the same syllable database as before. For clean speech, performance of the features was comparable to that of those generated from the filterbank output. However, when pink noise was added to the stimuli, performance dropped more slowly as a function of signal-to-noise ratio when using the SAI-based AIM features than when using either the filterbank-based features or the MFCCs, demonstrating the noise-robustness properties of the SAI representation. The properties of the auditory filterbank in AIM were also analysed. Three models of the cochlea were considered: the static gammatone filterbank, the dynamic compressive gammachirp (dcGC) and the pole-zero filter cascade (PZFC). The dcGC and gammatone are standard filterbank models, whereas the PZFC is a filter cascade, which more accurately models signal propagation in the cochlea. However, while the architecture of the filterbanks is different, they have all been successfully fitted to psychophysical masking data from humans. The abilities of the filterbanks to measure pitch strength were assessed, using stimuli which evoke a weak pitch percept in humans, in order to ascertain whether there is any benefit in the use of the more computationally efficient PZFC. Finally, a complete sound effects search system using auditory features was constructed in collaboration with Google Research. Features were computed from the SAI by sampling the SAI space with boxes of different scales. Vector quantization (VQ) was used to convert this multi-scale representation to a sparse code. The 'passive-aggressive model for image retrieval' (PAMIR) was used to learn the relationships between dictionary words and these auditory codewords. These auditory sparse codes were compared against sparse codes generated from MFCCs, and the best performance was found when using the auditory features. 006.3
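The vector quantization step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one small codebook per sampling box and a one-hot code per box, concatenated into a single sparse vector; the function names, codebooks, and feature values are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def nearest_codeword(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def sparse_code(box_features: Sequence[Sequence[float]],
                codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """One-hot encode each box against its own codebook and concatenate the results."""
    code: List[int] = []
    for vec, codebook in zip(box_features, codebooks):
        one_hot = [0] * len(codebook)
        one_hot[nearest_codeword(vec, codebook)] = 1
        code.extend(one_hot)
    return code


# Hypothetical toy data: two boxes, each with its own small codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
]
box_features = [[0.9, 0.8], [0.4, 0.6]]
example_code = sparse_code(box_features, codebooks)
```

In this toy case the code has five entries, exactly one of which is set per box, which is what makes the representation sparse as the number of boxes and codewords grows.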
2	Deep Learning for Computer Vision and it's Application to Machine Perception of Hand and Object Sangpil Kim (9745326) 15 December 2020 The advances in computing power and artificial intelligence have made applications such as augmented reality/virtual reality (AR/VR) and smart factories possible. In smart factories, robots interact with workers, and AR/VR devices are used for skill transfer. In order to enable these types of applications, a computer needs to recognize the user’s hand and body movement with objects and their interactions. In this regard, machine perception of hands and objects is the first step for human and computer integration. This is because personal activity is represented by the interaction of objects and hands. For machine perception of objects and hands, vision sensors are used in a wide range of industrial applications since visual information provides non-contact input signals. For these reasons, computer vision-oriented machine perception has been researched extensively. However, due to the complexity of object space and hand movement, machine perception of hands and objects remains a challenging problem.
Recently, deep learning has been introduced with groundbreaking results in the computer vision domain, addressing many challenging problems and significantly improving the performance of AI in many tasks. The success of deep learning algorithms depends on the learning strategy and the quality and quantity of the training data. Therefore, in this thesis, we tackle machine perception of hands and objects from four aspects: learning the underlying structure of 2D data, fusing surface and volume content of a 3D object, developing an annotation tool for mechanical components, and using thermal information of bare hands. More broadly, we improve the machine perception of interacting hand and object by developing a learning strategy and framework for large-scale dataset creation.
For the learning strategy, we use a conditional generative model, which learns the conditional distribution of the dataset by minimizing the gap between data distribution and the model distribution for hands and objects. First, we propose an efficient conditional generative model for 2D images that can traverse the latent space given a conditional vector. Subsequently, we develop a conditional generative model for 3D space that fuses volume and surface representations and learns the association of functional parts. These methods improve machine perception of objects and hands not only in 2D images but also in 3D space. However, the performance of deep learning algorithms is positively correlated with the quality and quantity of datasets, which motivates us to develop a large-scale dataset creation framework.
In order to leverage the learning strategies of deep learning algorithms, we develop annotation tools that can establish a large-scale dataset for objects and hands and evaluate existing deep learning methods with extensive performance analysis. For the object dataset creation, we establish a taxonomy of mechanical components and a web-based annotation tool. With this framework, we create a large-scale mechanical components dataset. With the dataset, we benchmark seven different machine perception algorithms for 3D objects. For hand annotation, we propose a novel data curation method for pixel-wise hand segmentation dataset creation, which uses thermal information and hand geometry to identify and segment the hands from objects and backgrounds. Also, we introduce a data fusion method that fuses thermal information and RGB-D data for the machine perception of hands while interacting with objects. Computer Vision deep learning computer vision artificial intelligence machine perception
3	A Semantics-based Approach to Machine Perception Henson, Cory Andrew January 2013 No description available. Artificial Intelligence Computer Science Information Science Semantic Web Semantic Sensor Web Semantic Sensor Networks Semantic Perception Machine Perception
4	Laser-based detection and tracking of dynamic objects Wang, Zeng January 2014 In this thesis, we present three main contributions to laser-based detection and tracking of dynamic objects, from both a model-based point of view and a model-free point of view, with an emphasis on applications to autonomous driving. A segmentation-based detector is first proposed to provide an end-to-end detection of the classes car, pedestrian and bicyclist in 3D laser data amongst significant background clutter. We postulate that, for the particular classes considered, solving a binary classification task outperforms approaches that tackle the multi-class problem directly. This is confirmed using custom and third-party datasets gathered from urban street scenes. The sliding window approach to object detection, while ubiquitous in the Computer Vision community, is largely neglected in laser-based object detectors, possibly due to its perceived computational inefficiency. We give a second thought to this opinion in this thesis, and demonstrate that, by fully exploiting the sparsity of the problem, exhaustive window searching in 3D can be made efficient. We prove the mathematical equivalence between sparse convolution and voting, and devise an efficient algorithm to compute exactly the detection scores at all window locations, processing a complete Velodyne scan containing 100K points in less than half a second. Its superior performance is demonstrated on the KITTI dataset, and compares commensurably with state-of-the-art vision approaches. A new model-free approach to detection and tracking of moving objects with a 2D lidar is then proposed, aimed at detecting dynamic objects of arbitrary shapes and classes. Objects are modelled by a set of rigidly attached sample points along their boundaries whose positions are initialised with and updated by raw laser measurements, allowing a flexible, nonparametric representation.
Dealing with raw laser points poses a significant challenge to data association. We propose a hierarchical approach, and present a new variant of the well-known Joint Compatibility Branch and Bound algorithm to handle large numbers of measurements. The system is systematically calibrated on real world data containing 7.5K labelled object examples and validated on 6K test cases. Its performance is demonstrated over an existing industry standard targeted at the same problem domain as well as a classical approach to model-free tracking. 629.2
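The equivalence between sparse convolution and voting stated in the abstract above can be checked on a toy example. The sketch below assumes a 2D grid, a linear window score, and scalar cell features; the thesis works in 3D, and the function names and values here are hypothetical rather than the author's algorithm.

```python
from collections import defaultdict
from itertools import product


def dense_scores(features, weights, window, grid):
    """Slide the window over every location and sum weight * feature inside it."""
    scores = {}
    for px, py in product(range(grid[0]), range(grid[1])):
        total = 0.0
        for ox, oy in product(range(window[0]), range(window[1])):
            total += weights[(ox, oy)] * features.get((px + ox, py + oy), 0.0)
        scores[(px, py)] = total
    return scores


def voting_scores(features, weights, window, grid):
    """Let each occupied cell vote for every window location that contains it."""
    scores = defaultdict(float)
    for (cx, cy), value in features.items():
        for ox, oy in product(range(window[0]), range(window[1])):
            px, py = cx - ox, cy - oy
            if 0 <= px < grid[0] and 0 <= py < grid[1]:
                scores[(px, py)] += weights[(ox, oy)] * value
    return scores


# Hypothetical toy problem: a 4x4 grid with two occupied cells and a 2x2 window.
grid = (4, 4)
window = (2, 2)
weights = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
features = {(1, 1): 1.0, (3, 2): 2.0}

dense = dense_scores(features, weights, window, grid)
votes = voting_scores(features, weights, window, grid)
```

The dense version visits every window location and every offset, while the voting version only loops over occupied cells, so its cost scales with the number of non-empty cells rather than with the size of the grid.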
5	Material and mechanical emulation of the human hand Hockings, Nicholas January 2017 The hands and feet account for half of the complexity of the musculoskeletal system, while the skin of the hand is specialised with many important structures. Much of the subtlety of the mechanism of the hand lies in the soft tissues, and the tactile and proprioceptive sensitivity depends on the large number of mechanoreceptors embedded in specific structures of the soft tissues. This thesis investigates synthetic materials and manufacturing techniques to enable building robots that reproduce the biomechanics and tactile sensitivity of vertebrates – histomimetic robotics. The material and mechanical anatomy of the hand is reviewed, highlighting the difficulty of numerical measurement in soft-tissue anatomy, and the predictive nature of descriptive anatomical knowledge. The biomechanical mechanisms of the hand and their support of sensorimotor control are presented. A palette of materials and layup techniques is identified for emulating ligaments, joint surfaces, tendon networks, sheaths, soft matrices, and dermal structures. A method for thermoplastically drawing fine elastic fibres, with liquid metal amalgam cores, for connecting embedded sensors is demonstrated. The performance requirements of skeletal muscles are identified. Two classes of muscle-like bulk MEMS electrostatic actuators are shown theoretically to be capable of meeting these requirements. Means to manufacture them, and their additional application as mechanoreceptors, are described. A novel machine perception algorithm is outlined as a solution to the problem of measuring soft tissue anatomy, CAD/CAE/CNC for layup of histomimetic robots, and sensory perception by such robots.
The results of the work support the view that histomimetic robotics is a viable approach, and identify a number of areas for further investigation including: polymer modification by graft-polymerisation, automated layup tools, and machine perception. 617.5
