1 |
Robust order N wavelet filterbanks to perform 2-D numerical integration directly from partial difference or gradient measurements
Hampton, Peter John 14 June 2010
In this dissertation, a new method for the numerical integration of two-dimensional partial differences is presented. The approach is based on obtaining an estimate of the 2-D Haar wavelet decomposition of the integrated differences by filtering and down-sampling the partial difference measurement data as an intermediate step. Then, this decomposition estimate is synthesized into an estimate of the integrated differences.
The filterbanks required for estimating this decomposition are derived directly from the 2-D Haar Wavelet Analysis Filterbank. The order of operations of this process is manipulated in a novel way so that gradient or partial difference data can be used as input to the filterbank instead of the image data. The original data can then be obtained from this decomposition estimate using unmodified 2-D Haar Wavelet Synthesis Filterbanks. This use of the wavelet decomposition reduces the computational complexity to fewer than 10 operations per pixel of the result.
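The idea of feeding differences rather than image data into a Haar analysis stage can be illustrated at the first decomposition level. The following is a minimal sketch, not the dissertation's filterbank: it assumes an image with even dimensions, forward differences on a full rectangular grid, and an orthonormal 2x2 Haar convention. The function names are hypothetical. It shows only that the three level-1 detail subbands are linear combinations of the partial differences; the coarser levels and the handling of arbitrary convex supports are not reproduced here.

```python
import numpy as np

def haar_details_from_image(x):
    """Level-1 2-D Haar detail subbands computed from the image itself."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    horiz = (a - b + c - d) / 2.0        # variation across columns
    vert = (a + b - c - d) / 2.0         # variation across rows
    diag = (a - b - c + d) / 2.0         # diagonal variation
    return horiz, vert, diag

def haar_details_from_differences(dx, dy):
    """Same subbands using only partial differences (assumed conventions):
    dx[i, j] = x[i, j+1] - x[i, j], shape H x (W-1)
    dy[i, j] = x[i+1, j] - x[i, j], shape (H-1) x W
    """
    dx_top, dx_bot = dx[0::2, 0::2], dx[1::2, 0::2]     # b - a, d - c
    dy_left, dy_right = dy[0::2, 0::2], dy[0::2, 1::2]  # c - a, d - b
    horiz = -(dx_top + dx_bot) / 2.0
    vert = -(dy_left + dy_right) / 2.0
    diag = (dx_bot - dx_top) / 2.0
    return horiz, vert, diag

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
dx = np.diff(image, axis=1)  # stand-in for measured horizontal differences
dy = np.diff(image, axis=0)  # stand-in for measured vertical differences

from_image = haar_details_from_image(image)
from_diffs = haar_details_from_differences(dx, dy)
print(all(np.allclose(p, q) for p, q in zip(from_image, from_diffs)))
```

The image mean cannot be recovered from differences alone, so any integration of this kind is determined only up to an additive constant.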
This dissertation shows that the data used for this algorithm may be calculated partial differences or discretely sampled gradient data measurements. This data set may have any-sized convex area of support as long as it is on a Cartesian grid. The method is stable as a component of a closed loop system as shown by simulations of a recently developed woofer-tweeter adaptive optics control system.
|
2 |
Auditory-based processing of communication sounds
Walters, Thomas C. January 2011
This thesis examines the possible benefits of adapting a biologically-inspired model of human auditory processing as part of a machine-hearing system. Features were generated using the auditory image model (AIM) and used as input to machine learning systems to determine the content of the sound, in both speech recognition and audio search tasks. AIM comprises processing to simulate the human cochlea, and a 'strobed temporal integration' process which generates a stabilised auditory image (SAI) from the input sound. The communication sounds which are produced by humans, other animals, and many musical instruments take the form of a pulse-resonance signal: pulses excite resonances in the body, and the resonance following each pulse contains information both about the type of object producing the sound and its size. In the case of humans, vocal tract length (VTL) determines the size properties of the resonance. In the speech recognition experiments, an auditory filterbank was combined with a Gaussian fitting procedure to produce features which are invariant to changes in speaker VTL. These features were compared against standard mel-frequency cepstral coefficients (MFCCs) in a size-invariant syllable recognition task. The VTL-invariant representation was found to produce better results than MFCCs when the system was trained on syllables from simulated talkers of one range of VTLs and tested on those from simulated talkers with a different range of VTLs. The image stabilisation process of strobed temporal integration was analysed. Based on the properties of the auditory filterbank being used, theoretical constraints were placed on the properties of the dynamic thresholding function used to perform strobe detection. These constraints were used to specify a simple, yet robust, strobe detection algorithm.
The syllable recognition system described above was then extended to produce features from profiles of the SAI and tested with the same syllable database as before. For clean speech, performance of the features was comparable to that of those generated from the filterbank output. However, when pink noise was added to the stimuli, performance dropped more slowly as a function of signal-to-noise ratio when using the SAI-based AIM features than when using either the filterbank-based features or the MFCCs, demonstrating the noise-robustness properties of the SAI representation.
The properties of the auditory filterbank in AIM were also analysed. Three models of the cochlea were considered: the static gammatone filterbank, dynamic compressive gammachirp (dcGC) and the pole-zero filter cascade (PZFC). The dcGC and gammatone are standard filterbank models, whereas the PZFC is a filter cascade, which more accurately models signal propagation in the cochlea. However, while the architecture of the filterbanks is different, they have all been successfully fitted to psychophysical masking data from humans. The abilities of the filterbanks to measure pitch strength were assessed, using stimuli which evoke a weak pitch percept in humans, in order to ascertain whether there is any benefit in the use of the more computationally efficient PZFC.
Finally, a complete sound effects search system using auditory features was constructed in collaboration with Google Research. Features were computed from the SAI by sampling the SAI space with boxes of different scales. Vector quantization (VQ) was used to convert this multi-scale representation to a sparse code. The 'passive-aggressive model for image retrieval' (PAMIR) was used to learn the relationships between dictionary words and these auditory codewords. These auditory sparse codes were compared against sparse codes generated from MFCCs, and the best performance was found when using the auditory features.
|
3 |
Spectral And Temporal Zero-Crossings-Based Signal Analysis
Shenoy, Ravi R 01 1900
We consider real zero-crossing analysis of the real/imaginary parts of the spectrum, namely, spectral zero-crossings (SZCs). The two major contributions are to show that: (i) SZCs enable temporal localization of transients; and (ii) SZCs are suitable for modeling transient signals. We develop a spectral dual of Kedem’s result linking temporal zero-crossing rate (ZCR) to the spectral centroid. The key requirement is stationarity, which we achieve through random-phase modulations of the time-domain signal. Transient signals are not amenable to modelling in the time domain since they are bursts of energy localized in time and lack structure. We show that the spectrum of a transient signal has a rich modulation structure, which leads to an amplitude-modulation – frequency-modulation (AM-FM) model of the spectrum.
We generalize Kedem’s arc-cosine formula to lags greater than one. For the specific case of a sinusoid in white Gaussian noise, He and Kedem devised an iterative filtering algorithm, which leads to a contraction mapping. An autoregressive filter of order one is employed, and the location of its pole is the parameter that is updated based on the filtered output. We use the higher-order property, which relates the lagged ZCR of the filtered process to its higher-lag autocorrelation, to develop an iterative higher-order autoregressive-filtering scheme, which stabilizes the ZCR and consequently provides robust estimates of the autocorrelation at higher lags.
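For orientation, the lag-one form of the cosine formula can be checked numerically. The sketch below is an illustration under stated assumptions rather than the thesis's higher-lag scheme: it assumes a zero-mean stationary Gaussian process, for which the lag-one autocorrelation equals cos(pi * expected ZCR), and it uses a simulated AR(1) process whose true lag-one autocorrelation is the pole location `phi`. All function names are hypothetical.

```python
import math
import numpy as np

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    s = np.signbit(x)
    return np.count_nonzero(s[1:] != s[:-1]) / (len(x) - 1)

def lag1_autocorr_from_zcr(x):
    """Lag-one cosine formula: rho(1) = cos(pi * ZCR), Gaussian assumption."""
    return math.cos(math.pi * zero_crossing_rate(x))

def simulate_ar1(phi, n, seed=0):
    """Zero-mean Gaussian AR(1) process x[t] = phi * x[t-1] + e[t]."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / math.sqrt(1.0 - phi**2)  # draw the start from the stationary law
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

phi = 0.6
x = simulate_ar1(phi, n=100_000)
rho_hat = lag1_autocorr_from_zcr(x)
print(f"true rho(1) = {phi}, ZCR-based estimate = {rho_hat:.3f}")
```

The estimate uses only sign information, which is why ZCR-based estimators are insensitive to amplitude scaling of the observed signal.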
Next, we investigate ZC properties of critically sampled outputs of a maximally decimated M-channel power complementary analysis filterbank (PCAF) and express the ZCR of the input Gaussian process at lags that are integer multiples of M in terms of the subband ZCRs. Based on this result, we propose a robust autocorrelation estimator for a signal consisting of a sum of sinusoids of fixed amplitudes and uniformly distributed random phases. Robust subband ZCRs are obtained through iterative filtering and the subband variances are estimated using the method-of-moments estimator. We compare the performance of the proposed estimator with the sample autocorrelation estimate in terms of bias, variance, and mean-squared error, and show through simulations that the proposed estimator outperforms the sample autocorrelation for medium to low SNR.
We then consider the ZC statistics of the real/imaginary parts of the discrete Fourier spectrum. We introduce the notion of the spectral zero-crossing rate (SZCR) and show that, for transients, it gives information regarding the location of the transient. We also demonstrate the utility of SZCR to estimate interaural time delay between the left and right head-related impulse responses. The accuracy of interaural time delay plays a vital role in binaural synthesis, and a comparison of the performance of the SZCR estimates with that of the cross-correlation estimates illustrates that spectral zeros alone contain enough information for accurately estimating interaural time delay. We provide a mathematical formalism for establishing the dual of the link between zero-crossing rate and spectral centroid. Specifically, we show that the expected SZCR of a stationary spectrum is a temporal centroid. For a deterministic sequence, we obtain the stationary spectrum by modulating the sequence with a random-phase, unit-amplitude sequence and then computing the spectrum. The notion of a stationary spectrum is necessary for deriving counterparts of the results available in the temporal zero-crossings literature. The robustness of the location information embedded in SZCR is analyzed in the presence of a second transient within the observation window, and also in the presence of additive white Gaussian noise. A spectral-domain iterative filtering scheme based on autoregressive filters is presented and improvement in the robustness of the location estimates is demonstrated. As an application, we consider epoch estimation in voiced speech signals and show that the location information is estimated more accurately using spectral zeros than with other techniques.
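The link between spectral zero crossings and transient location can be seen in the simplest idealized case. The sketch below is an assumption-laden illustration, not the thesis's estimator: it takes a single unit impulse at sample `n0` in a noise-free window of length `N`, so that the real part of the DFT is cos(2*pi*k*n0/N), and counts its sign changes over the bins. `N` is chosen prime here so that no bin lands exactly on a zero, and `n0` is kept below N/2 so that the count is unambiguous. The names are hypothetical.

```python
import numpy as np

def spectral_zero_crossings(x):
    """Count sign changes of Re{DFT(x)} across bins k = 0 .. N-1."""
    re = np.fft.fft(x).real
    s = np.signbit(re)
    return int(np.count_nonzero(s[1:] != s[:-1]))

N, n0 = 257, 37           # window length and true impulse location (assumed values)
x = np.zeros(N)
x[n0] = 1.0               # idealized transient: a single unit impulse

count = spectral_zero_crossings(x)
n0_hat = count / 2        # cos(2*pi*k*n0/N) crosses zero about 2*n0 times over the bins
print(count, n0_hat)      # 74 sign changes, giving a location estimate of 37.0
```

For real transients with extended support or added noise, the count no longer maps to a single sample index this cleanly, which is where the stationarity argument and the iterative filtering described above come in.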
The relationship between temporal centroid and SZCR also finds applications in frequency-domain linear prediction (FDLP), which is used in audio compression. The prediction coefficients are estimated by solving the Yule-Walker equations constructed from the spectral autocorrelation. We use the relationship between the spectral autocorrelation and temporal centroid to obtain the spectral autocorrelation directly by time-domain windowing without explicitly computing the spectrum. The proposed method gives the same results as the standard FDLP method but with reduced computational load.
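A related DFT identity gives a feel for why the spectral autocorrelation can be reached from the time domain. The sketch below is not the thesis's windowing procedure, and practical FDLP is often formulated with the DCT rather than the DFT; it only checks the standard fact that the circular autocorrelation of the DFT coefficients equals N times the DFT of the squared magnitude of the time signal. Function names are hypothetical.

```python
import numpy as np

def spectral_autocorr_direct(x):
    """Circular autocorrelation of the DFT: R[l] = sum_k X[k] * conj(X[(k - l) mod N])."""
    X = np.fft.fft(x)
    n = len(x)
    # np.roll(X, l)[k] equals X[(k - l) mod N]
    return np.array([np.sum(X * np.conj(np.roll(X, l))) for l in range(n)])

def spectral_autocorr_from_time(x):
    """Same quantity from the time domain: N * DFT{|x[n]|^2}."""
    return len(x) * np.fft.fft(np.abs(x) ** 2)

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)

direct = spectral_autocorr_direct(signal)
via_time = spectral_autocorr_from_time(signal)
print(np.allclose(direct, via_time))
```

The direct route costs on the order of N squared operations for all lags, whereas the time-domain route needs one squaring pass and a single transform, or only a few inner products when just the low lags used by a linear predictor are required.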
We then develop an SZC-based spectral-envelope and group-delay (SEGD) model, which finds applications in the modelling of non-stationary signals such as castanets. Taking into account the modulation structure and spectral continuity, local polynomial regression is performed to estimate the GD from the real spectral zeros. The SE is estimated based on the phase function computed from the estimated GD. Since the GD estimate is parametric, the degree of smoothness can be controlled directly. Simulation results based on synthetic transient signals are presented to analyze the noise-robustness of the SEGD model. Applications to castanet modeling, transient compression, and estimation of the glottal closure instants in speech are shown.
|