1

Auditory-based processing of communication sounds

Walters, Thomas C. January 2011
This thesis examines the possible benefits of adapting a biologically inspired model of human auditory processing as part of a machine-hearing system. Features were generated by an auditory model and used as input to machine learning systems to determine the content of the sound. Features were generated using the auditory image model (AIM) and were used for speech recognition and audio search. AIM comprises processing to simulate the human cochlea and a 'strobed temporal integration' process which generates a stabilised auditory image (SAI) from the input sound. The communication sounds produced by humans, other animals, and many musical instruments take the form of a pulse-resonance signal: pulses excite resonances in the body, and the resonance following each pulse carries information about both the type of object producing the sound and its size. In the case of humans, vocal tract length (VTL) determines the size properties of the resonance.

In the speech recognition experiments, an auditory filterbank was combined with a Gaussian fitting procedure to produce features which are invariant to changes in speaker VTL. These features were compared against standard mel-frequency cepstral coefficients (MFCCs) in a size-invariant syllable recognition task. The VTL-invariant representation was found to produce better results than MFCCs when the system was trained on syllables from simulated talkers of one range of VTLs and tested on those from simulated talkers with a different range of VTLs.

The image stabilisation process of strobed temporal integration was analysed. Based on the properties of the auditory filterbank being used, theoretical constraints were placed on the dynamic thresholding function used to perform strobe detection. These constraints were used to specify a simple, yet robust, strobe detection algorithm. The syllable recognition system described above was then extended to produce features from profiles of the SAI and tested on the same syllable database as before. For clean speech, performance of these features was comparable to that of the features generated from the filterbank output. However, when pink noise was added to the stimuli, performance dropped more slowly as a function of signal-to-noise ratio with the SAI-based AIM features than with either the filterbank-based features or the MFCCs, demonstrating the noise robustness of the SAI representation.

The properties of the auditory filterbank in AIM were also analysed. Three models of the cochlea were considered: the static gammatone filterbank, the dynamic compressive gammachirp (dcGC), and the pole-zero filter cascade (PZFC). The dcGC and gammatone are standard filterbank models, whereas the PZFC is a filter cascade which more accurately models signal propagation in the cochlea. While the architectures of the filterbanks differ, all have been successfully fitted to psychophysical masking data from humans. The abilities of the filterbanks to measure pitch strength were assessed, using stimuli which evoke a weak pitch percept in humans, in order to ascertain whether there is any benefit in using the more computationally efficient PZFC.

Finally, a complete sound-effects search system using auditory features was constructed in collaboration with Google Research. Features were computed from the SAI by sampling the SAI space with boxes of different scales. Vector quantization (VQ) was used to convert this multi-scale representation to a sparse code. The 'passive-aggressive model for image retrieval' (PAMIR) was used to learn the relationships between dictionary words and these auditory codewords. The auditory sparse codes were compared against sparse codes generated from MFCCs, and the best performance was found when using the auditory features.
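As an illustration of the strobe detection step described above, the following is a minimal sketch of dynamic-threshold strobe detection on a single filterbank channel. The half-wave rectification, the exponential decay constant and the reset-to-peak rule are illustrative assumptions, not the exact parameterisation derived in the thesis.

```python
import numpy as np

def find_strobes(channel, fs, decay_time=0.02):
    """Detect strobe points in one auditory filterbank channel.

    A strobe fires whenever the half-wave rectified signal exceeds a
    threshold that decays exponentially from the level of the previous
    strobe. The 20 ms decay time is an assumed, illustrative value.
    """
    rectified = np.maximum(channel, 0.0)
    decay = np.exp(-1.0 / (decay_time * fs))  # per-sample threshold decay factor
    threshold = 0.0
    strobes = []
    for n, x in enumerate(rectified):
        threshold *= decay
        if x > threshold:
            strobes.append(n)   # strobe fires at this sample
            threshold = x       # reset the threshold to the strobe level
    return strobes

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 0.1, 1.0 / fs)
    # Toy pulse-resonance signal: a decaying 1 kHz resonance re-excited every 10 ms.
    sig = np.zeros_like(t)
    for start in np.arange(0, 0.1, 0.01):
        idx = t >= start
        sig[idx] += np.exp(-(t[idx] - start) * 200) * np.sin(2 * np.pi * 1000 * (t[idx] - start))
    print(find_strobes(sig, fs)[:10])
```

In a full AIM pipeline, strobes detected in each channel trigger the temporal integration that builds up the stabilised auditory image.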
2

Logic-based modelling of musical harmony for automatic characterisation and classification

Anglade, Amélie January 2014
Harmony is the aspect of music concerned with the structure, progression, and relation of chords. In Western tonal music, each period had its own rules and practices of harmony. Similarly, some composers and musicians are recognised for characteristic harmonic patterns which differ from the chord sequences used by other musicians of the same period or genre. This thesis is concerned with the automatic induction of the harmony rules and patterns underlying a genre, a composer, or more generally a 'style'. Many existing approaches to music classification or pattern extraction rely on statistical methods, which present several limitations: typically they are black boxes, cannot be fed with background knowledge, do not take into account the intricate temporal dimension of the musical data, and ignore rare but informative events. To overcome these limitations we adopt first-order logic representations of chord sequences and Inductive Logic Programming techniques to infer models of style. We introduce a fixed-length representation of chord sequences, similar to n-grams but based on first-order logic, and use it to characterise symbolic corpora of pop and jazz music. We extend our knowledge representation scheme using context-free definite-clause grammars, which support chord sequences of any length and allow ornamental chords to be skipped, and test it on genre classification problems on both symbolic and audio data. Through these experiments we also compare various chord and harmony characteristics, such as degree, root note, intervals between root notes and chord labels, and assess their characterisation and classification accuracy, expressiveness, and computational cost. Moreover, we extend a state-of-the-art genre classifier based on low-level audio features with such harmony-based models and show that it can lead to statistically significant classification improvements. We show that our logic-based modelling approach can not only compete with and improve on statistical approaches, but also provides expressive, transparent and musicologically meaningful models of harmony, making it suitable for knowledge discovery purposes.
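To make the fixed-length, n-gram-like chord representation concrete, the sketch below encodes chord sequences as tuples of successive scale degrees and counts their occurrences in two toy corpora. The plain-tuple encoding and the example progressions are hypothetical stand-ins for illustration; the thesis itself uses first-order logic clauses and Inductive Logic Programming rather than simple counting.

```python
from collections import Counter

def degree_ngrams(chord_degrees, n=4):
    """Fixed-length, n-gram-like view of a chord sequence.

    Each item is a tuple of n successive scale degrees, analogous to a
    first-order-logic fact over consecutive chords. The tuple form is an
    illustrative simplification of the thesis's clause-based representation.
    """
    return [tuple(chord_degrees[i:i + n]) for i in range(len(chord_degrees) - n + 1)]

# Toy degree sequences (hypothetical data).
jazz_piece = [2, 5, 1, 6, 2, 5, 1]      # ii-V-I movement
pop_piece = [1, 5, 6, 4, 1, 5, 6, 4]    # I-V-vi-IV loop

print(Counter(degree_ngrams(jazz_piece)).most_common(3))
print(Counter(degree_ngrams(pop_piece)).most_common(3))
```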
3

An evolutionary AI-based decision support system for urban regeneration planning

Yusuf, Syed Adnan January 2010
The renewal of derelict inner-city districts suffering from high levels of socio-economic deprivation and sustainability problems is one of the key research areas in urban planning and regeneration. Subject to a wide range of social, economic and environmental factors, decision support for the optimal allocation of residential and service lots within such districts is a complex task. Pre-assessment of neighbourhood factors before the actual location-allocation of public services begins is considered paramount to the sustainable outcome of regeneration projects. Spatial assessment in such derelict built-up areas requires planning the assignment of residential lots so as to maximize accessibility to public services while minimizing the deprivation of the surrounding neighbourhood. However, predicting the impact of socio-economic deprivation on regeneration districts in order to optimize the location-allocation of public service infrastructure is difficult, largely because the requirements of the various services conflict with one another and with socio-economic and environmental factors. To address this problem, this thesis presents an evolutionary AI-based decision support system to assist planners with the assessment and optimization of regeneration districts. The work develops an Adaptive Network-Based Fuzzy Inference System (ANFIS) module to assess neighbourhood districts for various deprivation factors, and a genetic-algorithm-based solution to optimize urban regeneration layouts based upon the prior deprivation assessment model. The two-tiered framework first assesses socio-cultural deprivation levels of employment, health, crime and transport accessibility in neighbourhood areas and produces a deprivation impact matrix over the regeneration layout lots using the trained network-based fuzzy inference system. Based upon this impact matrix, a genetic algorithm is developed to optimize the placement of public services (shopping malls, primary schools, GPs and post offices) so as to maximize the accessibility of all services to regenerated residential units while helping to minimize the deprivation of surrounding neighbourhood areas. The approach is evaluated on two real-world case studies, producing highly coherent results. The work ultimately delivers a smart urban regeneration toolkit which provides designers and planners with decision support in the form of a simulation tool.
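The sketch below gives a highly simplified, mutation-only evolutionary search for placing a handful of services on a grid of lots, trading accessibility from residential lots against a deprivation penalty. The grid size, the random deprivation matrix (standing in for the ANFIS-derived impact matrix), the Manhattan-distance accessibility measure and the penalty weighting are all illustrative assumptions rather than the thesis's actual formulation.

```python
import random

# Toy 2-D grid of lots with a deprivation score per lot (hypothetical data,
# standing in for the ANFIS-derived deprivation impact matrix).
GRID = 10
deprivation = [[random.random() for _ in range(GRID)] for _ in range(GRID)]
residential = [(random.randrange(GRID), random.randrange(GRID)) for _ in range(30)]
N_SERVICES = 4  # e.g. shop, school, GP, post office

def fitness(layout):
    """Higher is better: short distances from homes to the nearest service,
    penalised by the deprivation score of the chosen service lots."""
    access = 0.0
    for rx, ry in residential:
        access -= min(abs(rx - sx) + abs(ry - sy) for sx, sy in layout)
    penalty = sum(deprivation[sx][sy] for sx, sy in layout)
    return access - 5.0 * penalty  # the weighting is an illustrative choice

def random_layout():
    return [(random.randrange(GRID), random.randrange(GRID)) for _ in range(N_SERVICES)]

def mutate(layout):
    child = list(layout)
    child[random.randrange(N_SERVICES)] = (random.randrange(GRID), random.randrange(GRID))
    return child

# Mutation-only evolutionary loop: keep the best layouts, mutate them.
population = [random_layout() for _ in range(50)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

best = max(population, key=fitness)
print("best layout:", best, "fitness:", round(fitness(best), 2))
```

A full genetic algorithm would also use crossover between parent layouts; it is omitted here to keep the sketch short.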
4

Unsupervised construction of 4D semantic maps in a long-term autonomy scenario

Ambrus, Rares January 2017
Robots are operating for longer times and collecting much more data than just a few years ago. In this setting we are interested in exploring ways of modeling the environment, segmenting out areas of interest and keeping track of the segmentations over time, with the purpose of building 4D models (i.e. space and time) of the relevant parts of the environment. Our approach relies on repeatedly observing the environment and creating local maps at specific locations. The first question we address is how to choose where to build these local maps. Traditionally, an operator defines a set of waypoints on a pre-built map of the environment which the robot visits autonomously. Instead, we propose a method to automatically extract semantically meaningful regions from a point cloud representation of the environment. The resulting segmentation is purely geometric, and in the context of mobile robots operating in human environments, the semantic label associated with each segment (e.g. kitchen, office) can be of interest for a variety of applications. We therefore also look at how to obtain per-pixel semantic labels given the geometric segmentation, by fusing probabilistic distributions over scene and object types in a Conditional Random Field. For most robotic systems, the elements of interest in the environment are the ones which exhibit some dynamic properties (such as people, chairs or cups), and the ability to detect and segment such elements provides a very useful initial segmentation of the scene. We propose a method to iteratively build a static map from observations of the same scene acquired at different points in time. Dynamic elements are obtained by computing the difference between the static map and new observations. We address the problem of clustering together dynamic elements which correspond to the same physical object, observed at different points in time and in significantly different circumstances. To address some of the inherent limitations in the sensors used, we autonomously plan, navigate around and obtain additional views of the segmented dynamic elements. We look at methods of fusing the additional data and we show that both a combined point cloud model and a fused mesh representation can be used to more robustly recognize the dynamic object in future observations. In the case of the mesh representation, we also show how a Convolutional Neural Network can be trained for recognition by using mesh renderings. Finally, we present a number of methods to analyse the data acquired by the mobile robot autonomously and over extended time periods. First, we look at how the dynamic segmentations can be used to derive a probabilistic prior which can be used in the mapping process to further improve and reinforce the segmentation accuracy. We also investigate how to leverage spatio-temporal constraints in order to cluster dynamic elements observed at different points in time and under different circumstances. We show that by making a few simple assumptions we can increase the clustering accuracy even when the object appearance varies significantly between observations. The result of the clustering is a spatio-temporal footprint of the dynamic object, defining an area where the object is likely to be observed spatially as well as a set of time stamps corresponding to when the object was previously observed. Using this data, predictive models can be created and used to infer future times when the object is more likely to be observed.
In an object search scenario, this model can be used to decrease the search time when looking for specific objects.
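As a rough illustration of how dynamic elements can be obtained by differencing a static map against a new observation, the sketch below flags observation points that have no static-map point within a small radius. The brute-force nearest-neighbour search, the radius value and the toy point clouds are assumptions for illustration, a far simpler stand-in for the iterative static-map construction described above.

```python
import numpy as np

def dynamic_points(static_map, observation, radius=0.05):
    """Return observation points with no static-map point within `radius`
    (metres); these are candidate dynamic elements.

    Brute-force nearest-neighbour check for clarity; a real system would
    use a k-d tree or voxel grid. The radius is an assumed value.
    """
    dyn = []
    for p in observation:
        dists = np.linalg.norm(static_map - p, axis=1)
        if dists.min() > radius:
            dyn.append(p)
    return np.array(dyn)

# Toy example: a flat patch as the static map, plus a small cluster
# (a "new object") hovering above it in the observation.
static_map = np.random.rand(500, 3) * [2.0, 2.0, 0.01]
new_object = np.random.rand(40, 3) * 0.1 + [1.0, 1.0, 0.3]
observation = np.vstack([static_map[:400], new_object])
print(dynamic_points(static_map, observation).shape)  # ~ (40, 3)
```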
