311

Sound visualisation as an aid for the deaf : a new approach

Soltani-Farani, A. A. January 1998 (has links)
Visual translation of speech as an aid for the deaf has long been a subject of electronic research and development. This thesis is concerned with a technique of sound visualisation based upon the theory of the primacy of dynamic, rather than static, information in the perception of speech sounds. The goal is the design and evaluation of a system that displays the perceptually important features of an input sound in a dynamic format as similar as possible to the auditory representation of that sound. The human auditory system, as the most effective system of sound representation, is first studied. Then, based on the latest theories of hearing and techniques of auditory modelling, a simplified model of the human ear is developed. In this model, the outer and middle ears together are simulated by a high-pass filter, and the inner ear is modelled by a bank of band-pass filters, the outputs of which, after rectification and compression, are applied to a visualiser block. To design an appropriate visualiser block, theories of sound and speech perception are reviewed. The perceptually important properties of sound, and their relations to the physical attributes of the sound pressure wave, are then considered in order to map the outputs of the auditory model onto an informative and recognisable running image, like the one known as a cochleagram. This conveyor-like image is then sampled by a window of 20 milliseconds' duration at a rate of 50 samples per second, so that a sequence of phase-locked, rectangular images is produced. Animation of these images results in a novel method of spectrography that displays both the time-varying and the time-independent information of the underlying sound at high resolution in real time. The resulting system translates a spoken word into a visual gesture, and displays a still picture when the input is a steady-state sound. Finally, the implementation of this visualiser system is evaluated through several experiments undertaken by normal-hearing subjects.
In these experiments, recognition of the gestures of a number of spoken words is examined through a set of two-word and multi-word forced-choice tests. The results of these preliminary experiments show a high recognition score (40-90 percent, where zero represents chance expectation) after only 10 learning trials. The results suggest quick learning of the gestures, language independence of the system, fidelity of the system in translating the auditory information, and persistence of the learned gestures in long-term memory. These findings are very promising and motivate further investigation.
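The processing chain the abstract describes (a high-pass filter for the outer and middle ears, a band-pass filterbank for the inner ear, rectification and compression, then 20 ms windows sampled 50 times per second) can be sketched roughly as follows. This is a minimal numpy illustration, not the thesis's implementation; the channel count, frequency range, and cube-root compression exponent are assumptions.

```python
import numpy as np

def auditory_frames(signal, fs, n_channels=16, frame_ms=20, rate_hz=50):
    """Crude ear-model sketch: band-pass channels, half-wave
    rectification and compression, then fixed-rate frame sampling."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Log-spaced channel edges between 100 Hz and Nyquist (assumed range).
    edges = np.geomspace(100.0, fs / 2.0, n_channels + 1)
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.fft.irfft(spectrum * ((freqs >= lo) & (freqs < hi)),
                            n=len(signal))
        # Half-wave rectify, then compress (cube root, an assumption).
        channels.append(np.maximum(band, 0.0) ** (1.0 / 3.0))
    cochleagram = np.stack(channels)               # (channels, samples)
    # Sample the running image: 20 ms windows at 50 frames per second.
    win, hop = int(fs * frame_ms / 1000), int(fs / rate_hz)
    n_frames = 1 + (cochleagram.shape[1] - win) // hop
    return np.stack([cochleagram[:, i * hop:i * hop + win].mean(axis=1)
                     for i in range(n_frames)])    # (frames, channels)

fs = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 s test tone
frames = auditory_frames(tone, fs)
```

Animating the per-frame channel vectors as columns of a scrolling image would give the conveyor-like display the abstract describes.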
312

Investigating audio classification to automate the trimming of recorded lectures

Govender, Devandran 01 February 2018 (has links)
With the demand for recorded lectures to be made available as soon as possible, the University of Cape Town (UCT) needs to find innovative ways of removing bottlenecks in the lecture-capture workflow, thereby improving turn-around times from capture to publication. UCT uses Opencast, an open-source system that manages all the steps in the lecture-capture process. One of these steps involves manually trimming unwanted segments, which generally contain student chatter, from the beginning and end of a video before it is published. This trimming step has been identified as a bottleneck because it depends on staff availability. In this study, we investigate the potential of audio classification to automate the step. A classification model was trained to detect two classes: speech and non-speech. Speech represents a single dominant voice, for example the lecturer, and non-speech represents student chatter, silence and other environmental sounds. Using the classification model, the first and last instances of the speech class are detected together with their timestamps, which are then used to predict the start and end trim points for the recorded lecture. The classification model achieved a 97.8% accuracy rate at distinguishing speech from non-speech. The start trim point predictions were very positive, with an average difference of -11.22 s from gold-standard data. End trim point predictions showed a much greater deviation, with an average difference of 145.16 s from gold-standard data; discussions between the lecturer and students after the lecture were predominantly the reason for this discrepancy.
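The trim-point rule described in the abstract, taking the first and last segments labelled speech and converting them to timestamps, reduces to a few lines once a classifier has produced per-segment labels. The hop size and label names below are assumptions for the sketch, not details from the dissertation:

```python
def predict_trim_points(labels, hop_s=1.0):
    """Given per-segment speech/non-speech labels (one label per hop_s
    seconds of audio), return (start, end) trim times in seconds taken
    from the first and last 'speech' segments, or None if no speech."""
    speech_idx = [i for i, lab in enumerate(labels) if lab == "speech"]
    if not speech_idx:
        return None
    start = speech_idx[0] * hop_s            # start of first speech segment
    end = (speech_idx[-1] + 1) * hop_s       # end of last speech segment
    return start, end

# Toy lecture: 30 s of chatter, 45 min of lecturing, 2 min of chatter.
labels = ["non-speech"] * 30 + ["speech"] * 2700 + ["non-speech"] * 120
start, end = predict_trim_points(labels)
```

In practice the real system's accuracy hinges on the upstream classifier; this post-processing step itself is deliberately simple.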
313

Wavelet-based techniques for speech recognition

Farooq, Omar January 2002 (has links)
In this thesis, new wavelet-based techniques have been developed for extracting features from speech signals for the purpose of automatic speech recognition (ASR). One advantage of the wavelet transform over the short-time Fourier transform (STFT) is its ability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for the time-frequency transformation of these signals. In addition, it has compactly supported basis functions, which reduce the amount of computation compared with the STFT, where an overlapping window is needed.
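As a hedged illustration of the idea (not the wavelet family or feature set actually used in the thesis), the sketch below runs a plain Haar decomposition over one speech frame and summarises each sub-band by its log energy, the kind of compact feature vector an ASR front end might consume:

```python
import numpy as np

def haar_dwt_features(frame, levels=3):
    """Decompose a frame with the Haar wavelet and return one
    log-energy value per sub-band (details at each level + final
    approximation). Illustrative only; real systems tune the
    wavelet, depth, and feature summary."""
    approx = np.asarray(frame, dtype=float)
    feats = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # high-pass half-band
        approx = (even + odd) / np.sqrt(2.0)   # low-pass half-band
        feats.append(np.log(np.sum(detail ** 2) + 1e-10))
    feats.append(np.log(np.sum(approx ** 2) + 1e-10))
    return np.array(feats)

frame = np.sin(2 * np.pi * np.arange(256) / 16)  # toy 256-sample frame
feats = haar_dwt_features(frame)
```

Because each Haar filter touches only two samples, the whole decomposition is linear in the frame length, which is the computational advantage the abstract contrasts with the overlapping-window STFT.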
314

Lexical stress and lexical access : effects in read and spontaneous speech

McAllister, Janice Margaret January 1989 (has links)
This thesis examines three issues which are of importance in the study of auditory word recognition: the phonological unit which is used to access representations in the mental lexicon; the extent to which hearers can rely on words being identified before their acoustic offsets; and the role of context in auditory word recognition. Three hypotheses which are based on the predictions of the Cohort Model (Marslen-Wilson and Tyler 1980) are tested experimentally using the gating paradigm. First, the phonological access hypothesis claims that word onsets, rather than any other part of the word, are used to access representations in the mental lexicon. An alternative candidate which has been proposed as the initiator of lexical access is the stressed syllable. Second, the early recognition hypothesis states that polysyllabic words, and the majority of words heard in context, will be recognised before their acoustic offsets. Finally, the context-free hypothesis predicts that during the initial stages of the processing of words, no effects of context will be discernible. Experiment 1 tests all three predictions by manipulating aspects of carefully articulated, read speech. First, examination of the gating responses from three context conditions offers no support for the context-free hypothesis. Second, the high number of words which are identified before their acoustic offsets is consistent with the early recognition hypothesis. Finally, the phonological access hypothesis is tested by manipulation of the stress patterns of stimuli. The dependent variables which are examined relate to the processes of lexical access and lexical retrieval; stress differences are found on access measures but not on those relating to retrieval. When the experiment is replicated with a group of subjects whose level of literacy is lower than that of the undergraduates who took part in the original experiment, differences are found in measures relating to contextual processing. 
Experiment 2 continues to examine the phonological access hypothesis, by manipulating speech style (read versus conversational) as well as stress pattern. Gated words, excised from the speech of six speakers, are presented in isolation. Words excised from read speech and words stressed on the first syllable elicit a greater number of responses which match the stimuli than conversational tokens and words with unstressed initial syllables. Intelligibility differences among the four conditions are also reported. Experiment 3 aims to investigate the processing of read and spontaneous tokens heard in context, while maintaining the manipulation of stress pattern. A subset of the words from Experiment 2 are presented in their original sentence contexts: the test words themselves, plus up to three subsequent words, are gated. Although the presence of preceding context generally enhances intelligibility, some words remain unrecognised by the end of the third subsequent word. An interaction between stress and speech style may be explained in terms of the unintelligibility of the preceding context. Several issues arising from Experiments 1, 2 and 3 are considered further. The characteristics of words which fail to be recognised before their offsets are examined using the statistical technique of regression; the contributions of phonetic and phonological aspects of stressed syllables are assessed; and a further experiment is reported which explores top-down processing in spontaneous speech, and which offers support for the interpretation of the results of Experiment 3 offered earlier.
315

Investigating and assessing comprehension ability

Spooner, Alice L. R. January 2001 (has links)
No description available.
316

The extraction and recognition of text from multimedia document images

Smith, R. W. January 1987 (has links)
No description available.
317

From structure to function in biorganic molecules

Nedderman, Angus N. R. January 1991 (has links)
No description available.
318

Image-based face recognition under varying pose and illumination conditions

Du, Shan 05 1900 (has links)
Image-based face recognition has attained wide application over the past decades in commerce and law enforcement, for example in mug shot database matching, identity authentication, and access control. Existing face recognition techniques (e.g., Eigenface, Fisherface, and Elastic Bunch Graph Matching), however, do not perform well in a case that inevitably arises: owing to variations in imaging conditions, such as pose and illumination changes, face images of the same person often have different appearances. These variations make face recognition much more challenging. With this concern in mind, the objective of my research is to develop face recognition techniques that are robust to such variations. This thesis addresses the two main variation problems in face recognition: pose and illumination. To improve the performance of face recognition systems, the following methods are proposed: (1) a face feature extraction and representation method using non-uniformly selected Gabor convolution features; (2) an illumination normalization method using adaptive region-based image enhancement for face recognition under variable illumination conditions; (3) an eye detection method for gray-scale face images under various illumination conditions; and (4) a virtual pose generation method for pose-invariant face recognition. The details of these proposed methods are explained in this thesis. In addition, we conduct a comprehensive survey of existing face recognition methods and point out future research directions. / Faculty of Applied Science / Department of Electrical and Computer Engineering / Graduate
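Method (1), Gabor convolution features sampled only at selected points, can be illustrated as follows. The kernel size, wavelength, orientations, and sample points below are all assumptions made for the sketch, not values from the thesis:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma=None):
    """Build one real (cosine-phase) Gabor kernel of shape
    (size, size) at orientation theta."""
    sigma = sigma or 0.5 * wavelength
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / wavelength))

def gabor_features(image, points, size=15):
    """Evaluate Gabor responses only at the given (row, col) points,
    mirroring the abstract's non-uniform selection of convolution
    features instead of filtering the whole image."""
    feats = []
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        k = gabor_kernel(size, wavelength=8.0, theta=theta)
        half = size // 2
        for (r, c) in points:
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            feats.append(float(np.sum(patch * k)))
    return np.array(feats)

img = np.random.default_rng(0).random((64, 64))      # stand-in face image
feats = gabor_features(img, points=[(20, 20), (40, 40)])
```

Sampling responses at a handful of informative facial locations, rather than densely, keeps the feature vector small while retaining the orientation- and scale-selective information Gabor filters provide.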
319

Activity recognition in a smart home setting

Fiklík, Vladimír January 2015 (has links)
The aim of this work was to implement and compare several activity recognition algorithms that could be used in a smart home environment to determine the current activity of an observed subject (a virtual agent) using only data gathered from elementary observations of the environment. Such algorithms are useful in several areas, for example to improve the behaviour of virtual agents by making them more aware of the actions of other agents. The algorithms used in this thesis are based on Dynamic Bayesian Networks and are able to determine whether an observed activity has been completed or merely interrupted. An easily extensible 3D interactive simulator of a smart home environment was created to meet the needs of activity recognition, and was used to gather data for the learning and testing phases of the algorithms. The test subjects were human-controlled virtual agents.
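At its core, filtering in a Dynamic Bayesian Network repeatedly predicts the hidden activity through a transition model and reweights by the likelihood of the latest sensor reading. The toy two-activity model below, with made-up probabilities and a single binary stove sensor, shows that recursion; it is not the network structure used in the thesis:

```python
import numpy as np

activities = ["cooking", "sleeping"]
T = np.array([[0.9, 0.1],        # P(next activity | current activity)
              [0.1, 0.9]])
O = np.array([[0.2, 0.8],        # P(stove sensor = 0, 1 | cooking)
              [0.95, 0.05]])     # P(stove sensor = 0, 1 | sleeping)

def filter_step(belief, obs):
    """One forward-filtering step: predict with T, weight by the
    observation likelihood O[:, obs], renormalise."""
    updated = (belief @ T) * O[:, obs]
    return updated / updated.sum()

belief = np.array([0.5, 0.5])            # uniform prior over activities
for obs in [1, 1, 1, 0, 0, 0, 0]:        # stove on for a while, then off
    belief = filter_step(belief, obs)
guess = activities[int(np.argmax(belief))]
```

A real smart-home network would add more hidden variables (e.g. an "activity completed vs. interrupted" flag, as the abstract mentions) and many sensors, but the predict-and-update recursion is the same.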
320

A multi-agent planner for modelling dialogue

Taylor, J. A. January 1994 (has links)
No description available.