1

Analogue VLSI implementation of a 2-D sound localisation system

Grech, Ivan January 2002 (has links)
The position of a sound source can be accurately determined in both azimuth and elevation through the use of localisation cues extracted from the incident audio signals. Compared to lateral localisation, 2-D hardware localisation is novel and requires the extraction of spectral cues in addition to time delay cues. The objective of this work is to develop an analogue VLSI system which extracts these cues from audio signals arriving at the left and right channels of the system and then maps them to the source position. The use of analogue hardware, which is broadly adapted from the biological auditory system, enables fast and low power computation. To obtain accurate 2-D localisation from the hardware-extracted cues, a novel algorithm for the mapping process has been developed. The performance of this algorithm is evaluated via simulation under different environmental conditions. The effects of hardware non-idealities on the localisation accuracy, including mismatches and noise, are also assessed. The analogue hardware implementation is divided into three main sections: a front-end for splitting the input signal into different frequency bands and extraction of spectral cues, an onset detector for distinguishing between the incident portion and the echo portion of the acoustic signal, and a correlator for determination of time delay cues. Novel building blocks have been designed using standard CMOS in order to enable low-voltage, low-power operation of the differential architecture essential for the accuracy of the extracted cues. A novel feedback technique enables accurately controlled Class AB operation of a low-voltage switched-current memory cell. A novel cross-coupling technique ensures correct Class AB operation of a log-domain bandpass filter. The five chips developed here operate from a ±0.9 V supply. The system has been tested by applying audio signals convolved with a position-dependent transfer function at the input, and then processing the resulting hardware-generated cues. Measurement results show that 2-D localisation within 5° accuracy is achievable using hardware-extracted cues. Key words: sound localisation, analogue VLSI, silicon cochlea, log domain, switched capacitor, switched current, current mode, analogue processing.
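As an aside for readers unfamiliar with time delay cues: the sketch below is a purely digital illustration of the idea (estimating how much one channel lags the other by cross-correlation), not the thesis's analogue correlator; the sample rate, test signal and delay are made-up values.

```python
import numpy as np

def estimate_itd(left, right, fs, max_delay_s=1e-3):
    """Estimate how much the right channel lags the left (in seconds)
    by locating the peak of their cross-correlation."""
    n = len(left)
    corr = np.correlate(right, left, mode="full")      # lags from -(n-1) to n-1
    lags = np.arange(-(n - 1), n)
    keep = np.abs(lags) <= int(max_delay_s * fs)       # only physically plausible delays
    best_lag = lags[keep][np.argmax(corr[keep])]
    return best_lag / fs                               # positive: right channel lags

# Toy usage: broadband noise reaching the right channel 0.25 ms (12 samples) later.
fs = 48_000
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
d = 12
left, right = noise[d:3000 + d], noise[:3000]          # right is a delayed copy of left
print(estimate_itd(left, right, fs))                   # about 0.00025 s
```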
2

20-Bit digitisation and computer modelling of capsule array microphone responses

Lynch-Aird, N. J. January 1988 (has links)
No description available.
3

Computational tonality estimation : signal processing and hidden Markov models

Noland, Katy C. January 2009 (has links)
This thesis investigates computational musical tonality estimation from an audio signal. We present a hidden Markov model (HMM) in which relationships between chords and keys are expressed as probabilities of emitting observable chords from a hidden key sequence. The model is tested first using symbolic chord annotations as observations, and gives excellent global key recognition rates on a set of Beatles songs. The initial model is extended for audio input by using an existing chord recognition algorithm, which allows it to be tested on a much larger database. We show that a simple model of the upper partials in the signal improves percentage scores. We also present a variant of the HMM which has a continuous observation probability density, but show that the discrete version gives better performance. Then follows a detailed analysis of the effects on key estimation and computation time of changing the low level signal processing parameters. We find that much of the high frequency information can be omitted without loss of accuracy, and significant computational savings can be made by applying a threshold to the transform kernels. Results show that there is no single ideal set of parameters for all music, but that tuning the parameters can make a difference to accuracy. We discuss methods of evaluating more complex tonal changes than a single global key, and compare a metric that measures similarity to a ground truth to metrics that are rooted in music retrieval. We show that the two measures give different results, and so recommend that the choice of evaluation metric is determined by the intended application. Finally we draw together our conclusions and use them to suggest areas for continuation of this research, in the areas of tonality model development, feature extraction, evaluation methodology, and applications of computational tonality estimation.
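To make the HMM formulation concrete, here is a minimal, hypothetical sketch of Viterbi decoding of a hidden key sequence from observed chord symbols. The two-key state space and the transition and emission probabilities below are invented for illustration; they are not the thesis's trained model.

```python
import numpy as np

def viterbi(obs, log_init, log_trans, log_emit):
    """Most likely hidden state sequence for a discrete-observation HMM."""
    n_states = log_init.shape[0]
    T = len(obs)
    delta = np.full((T, n_states), -np.inf)
    psi = np.zeros((T, n_states), dtype=int)
    delta[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans           # (previous state, current state)
        psi[t] = np.argmax(scores, axis=0)                   # best predecessor per state
        delta[t] = scores[psi[t], np.arange(n_states)] + log_emit[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):                            # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Hypothetical toy: hidden keys {C major, G major}, observed chords {C, F, G, D}.
keys = ["C major", "G major"]
chords = {"C": 0, "F": 1, "G": 2, "D": 3}
init  = np.log(np.array([0.5, 0.5]))
trans = np.log(np.array([[0.9, 0.1],                         # keys change rarely
                         [0.1, 0.9]]))
emit  = np.log(np.array([[0.50, 0.30, 0.18, 0.02],           # chord probabilities in C major
                         [0.20, 0.02, 0.48, 0.30]]))         # chord probabilities in G major
obs = [chords[c] for c in ["C", "F", "G", "C", "G", "D", "G", "D"]]
print([keys[k] for k in viterbi(obs, init, trans, emit)])    # one key estimate per chord
```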
4

How does asymmetric latency in a closed network affect audio signals and strategies for dealing with asymmetric latency

Lundberg, Fredrik January 2018 (has links)
This study investigates Audio over IP (AoIP). A stress test was used to see what impact asymmetric latency had on the audio signal in a closed network. The study consisted of two parts. The first part was the stress test, in which two AoIP solutions were exposed to two forms of asymmetric latency: first a fixed delay was used, and then a custom script simulated changing amounts of asymmetric latency. The second part consisted of interviews conducted with representatives from the audio industry who work with audio over IP on a day-to-day basis. The goal of these interviews was to establish what knowledge the audio industry had about asymmetric latency, whether the industry had experienced problems related to latency, and what general knowledge the industry has about networks. The interviews indicated that the limiting factor in AoIP is not the technology itself but rather a lack of knowledge among the people using the systems.
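As background on why delay asymmetry is awkward for networked audio, where endpoints typically synchronise their clocks from round-trip measurements, here is a tiny illustrative calculation. It is unrelated to the specific AoIP products or scripts used in the study: an NTP/PTP-style offset estimate silently absorbs half of any asymmetry.

```python
# Minimal sketch (not from the thesis): with a round-trip measurement, the
# estimated clock offset between two nodes absorbs half of any delay asymmetry.
def estimated_offset(true_offset_ms, delay_a_to_b_ms, delay_b_to_a_ms):
    t1 = 0.0                                      # A sends request (A's clock)
    t2 = t1 + delay_a_to_b_ms + true_offset_ms    # B receives (B's clock)
    t3 = t2                                       # B replies immediately (B's clock)
    t4 = t3 - true_offset_ms + delay_b_to_a_ms    # A receives reply (A's clock)
    return ((t2 - t1) + (t3 - t4)) / 2            # NTP-style offset estimate

print(estimated_offset(0.0, 5.0, 5.0))   # symmetric 5 ms each way -> 0.0 (correct)
print(estimated_offset(0.0, 8.0, 2.0))   # 6 ms asymmetry -> 3.0 ms of offset error
```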
5

Caring More About EQ Than IQ : Automatic Equalizing of Audio Signals

Axelson-Fisk, Magnus January 2018 (has links)
In this bachelor thesis, the possibility of correcting for room acoustics based on frequency analysis is studied. Software to calculate transfer functions online was constructed and tested. This was done using a version of the Maximum Length Sequence method, which requires long sequences for rooms with long reverberation. During the project, it was noted that zero padding the sequences improved the accuracy greatly, and that the length of the zero pad affected the results. The software was tested both in computer simulations and in practice. While testing in practice, it was noted that the system has limitations on which rooms it will work in. All test signals were recorded and afterwards compared to the original recording. The constructed software showed that it is possible, to some extent, to correct for unknown transfer functions using only frequency analysis. Further, it does correct for the room's transfer function, but it is difficult to say whether this holds for all rooms and transfer functions.
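For readers unfamiliar with the Maximum Length Sequence method, the sketch below shows the basic measurement idea in simulation: probing a toy "room" (a short FIR filter) with a bipolar MLS and recovering its impulse response by circular cross-correlation. It is an assumed, simplified setup, not the software described in the thesis, and the sequence length and noise level are arbitrary.

```python
import numpy as np
from scipy.signal import max_len_seq

def mls_impulse_response(nbits, room_ir, rng=np.random.default_rng(1)):
    """Sketch of MLS-based transfer-function measurement (assumed setup).

    The MLS probe is passed through a simulated room (an FIR filter plus noise);
    circular cross-correlation with the probe recovers the impulse response,
    because a bipolar MLS has a nearly ideal periodic autocorrelation."""
    probe = 2.0 * max_len_seq(nbits)[0] - 1.0          # bipolar +/-1 sequence, length 2**nbits - 1
    n = probe.size
    h = np.zeros(n)                                    # room IR padded to one MLS period
    h[:len(room_ir)] = room_ir
    recorded = np.real(np.fft.ifft(np.fft.fft(probe) * np.fft.fft(h)))   # periodic (steady-state) response
    recorded += 0.01 * rng.standard_normal(n)          # measurement noise
    # Circular cross-correlation of the recording with the probe ~ (n + 1) * h
    est = np.real(np.fft.ifft(np.fft.fft(recorded) * np.conj(np.fft.fft(probe)))) / (n + 1)
    return est

true_ir = np.array([1.0, 0.6, 0.3, 0.1])               # toy room response
print(np.round(mls_impulse_response(12, true_ir)[:4], 2))   # close to [1.0, 0.6, 0.3, 0.1]
```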
6

Signals representation robust to noise - Application to the detection and identification of alarm signals

El jili, Fatimetou 17 December 2018 (has links)
This work targets the detection and identification of audio signals, and in particular alarm signals from priority cars. First, we propose a method for detecting alarm signals in a noisy environment, based on time-frequency signal analysis. This method makes it possible to detect and identify alarm signals embedded in noise, even with negative signal-to-noise ratios. Then we propose a signal quantization robust against transmission noise. This involves replacing each bit level of a vector of time or frequency samples with a binary word of the same length provided by an error-correcting encoder. In a first approach, each bit level is quantized independently of the others according to the Hamming distance minimization criterion. In a second approach, to reduce the quantization error at equal robustness, the different bit levels are quantized successively by a matching pursuit algorithm. This quantization gives the signals a specific shape that allows them to be easily recognized among other signals. Finally, we propose two methods for detecting and identifying signals based on robust quantization, operating in the time domain or in the frequency domain, by minimizing the distance between the received signals restricted to their high-weight bits and the reference signals. These methods make it possible to detect and identify signals in environments with very low signal-to-noise ratios, thanks to quantization. In addition, the first method, based on the time-frequency signature, is more efficient with quantized signals.
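As a toy illustration of the first (independent, Hamming-distance) quantization approach, the sketch below replaces each bit level of a block of samples with the nearest codeword of a (7,4) Hamming code. The block length, code and sample values are hypothetical choices, not those of the thesis.

```python
import numpy as np
from itertools import product

# Hypothetical sketch: replace each bit level of a block of 7 samples by the
# nearest codeword (in Hamming distance) of a (7,4) Hamming code.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])                   # generator matrix of a (7,4) code
codebook = np.array([np.dot(m, G) % 2 for m in product([0, 1], repeat=4)])

def quantize_bit_levels(samples, n_bits=8):
    """Quantize each bit plane of 7 unsigned n-bit samples to the nearest codeword."""
    bits = (samples[:, None] >> np.arange(n_bits)[::-1]) & 1    # shape (7, n_bits), MSB first
    out_bits = np.empty_like(bits)
    for level in range(n_bits):
        plane = bits[:, level]
        dists = np.sum(codebook != plane, axis=1)                # Hamming distance to each codeword
        out_bits[:, level] = codebook[np.argmin(dists)]
        # (the matching-pursuit variant described above would instead choose the
        #  levels jointly, to minimise the resulting amplitude error)
    weights = 1 << np.arange(n_bits)[::-1]
    return out_bits @ weights                                    # reassemble the quantized samples

block = np.array([200, 198, 197, 190, 180, 170, 160])            # toy 8-bit samples
print(quantize_bit_levels(block))
```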
7

Compressed Domain Processing of MPEG Audio

Anantharaman, B 03 1900 (has links)
MPEG audio compression techniques significantly reduce the storage and transmission requirements for high-quality digital audio. However, compression complicates the processing of audio in many applications. If a compressed audio signal is to be processed, a direct method would be to decode the compressed signal, process the decoded signal and re-encode it. This is computationally expensive due to the complexity of the MPEG filter bank. This thesis deals with processing of MPEG compressed audio. The main contributions of this thesis are: a) extracting wavelet coefficients in the MPEG compressed domain; b) wavelet-based pitch extraction in the MPEG compressed domain; c) time scale modification of MPEG audio; d) watermarking of MPEG audio. The research contributions start with a technique for calculating several levels of wavelet coefficients from the output of the MPEG analysis filter bank. The technique exploits the Toeplitz structure which arises when the MPEG and wavelet filter banks are represented in matrix form. The computational complexities of extracting several levels of wavelet coefficients after decoding the compressed signal and directly from the output of the MPEG analysis filter bank are compared. The proposed technique is found to be computationally efficient for extracting higher levels of wavelet coefficients. Extracting pitch in the compressed domain becomes essential when large multimedia databases need to be indexed. For example, one may be interested in listening to a particular speaker or to male/female audio segments in a multimedia document. For this application, pitch information is one of the most basic and important features required. Pitch is essentially the time interval between two successive glottal closures. Glottal closures are accompanied by sharp transients in the speech signal, which in turn give rise to local maxima in the wavelet coefficients. Pitch can be calculated by finding the time interval between two successive maxima in the wavelet coefficients. It is shown that the computational complexity of extracting pitch in the compressed domain is less than 7% of that of uncompressed domain processing. An algorithm for extracting pitch in the compressed domain is proposed, and its results for synthetic signals and for words uttered by male/female speakers are reported. In a number of important applications, one needs to modify an audio signal to render it more useful than its original. Typical applications include changing the time evolution of an audio signal (increasing or decreasing the rate of articulation of a speaker), or adapting a given audio sequence to a given video sequence. In this thesis, time scale modifications are obtained in the subband domain such that when the modified subband signals are given to the MPEG synthesis filter bank, the desired time scale modification of the decoded signal is achieved. This is done by making use of sinusoidal modeling [1]. Here, each of the subband signals is modeled in terms of parameters such as amplitude, phase and frequency, and is subsequently synthesised from these parameters with Ls = k·La, where Ls is the length of the synthesis window, k is the time scale factor and La is the length of the analysis window. As the PCM version of the time-scaled signal is not available, psychoacoustic-model-based bit allocation cannot be used. Hence a new bit allocation is done using a subband coding algorithm.
This method has been satisfactorily tested for time scale expansion and compression of speech and music signals. The recent growth of multimedia systems has increased the need for protecting digital media. Digital watermarking has been proposed as a method for protecting digital documents. The watermark needs to be added to the signal in such a way that it does not cause audible distortions. However, the idea behind lossy MPEG encoders is to remove or render insignificant those portions of the signal which do not affect human hearing. This renders the watermark insignificant, and hence proving ownership of the signal becomes difficult when an audio signal is compressed. The existing compressed domain methods merely change the bits or the scale factors according to a key. Though simple, these methods are not robust to attacks. Further, these methods require the original signal to be available in the verification process. In this thesis we propose a watermarking method based on the spread spectrum technique which does not require the original signal during the verification process. It is also shown to be more robust than the existing methods. In our method the watermark is spread across many subband samples. Here two factors need to be considered: a) the watermark is to be embedded only in those subbands where the added noise remains inaudible; b) the watermark should be added to those subbands which have sufficient bit allocation, so that the watermark does not become insignificant due to lack of bit allocation. Embedding the watermark in the lower subbands would cause distortion, and embedding it in the higher subbands would prove futile as the bit allocation in those subbands is practically zero. Considering all these factors, one can introduce noise to samples across many frames corresponding to subbands 4 to 8. In the verification process, it is sufficient to have the key/code and the possibly attacked signal. This method has been satisfactorily tested for robustness to scale factor and LSB changes and to MPEG decoding and re-encoding.
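The pitch-from-transients idea can be illustrated with a simple time-domain sketch: find sharp transients (standing in for the maxima that glottal closures produce in the wavelet coefficients) and take the interval between successive transients as the pitch period. This is only an illustration in the uncompressed domain, not the thesis's compressed-domain algorithm, and the test signal and thresholds are invented.

```python
import numpy as np
from scipy.signal import find_peaks

def pitch_from_transients(x, fs):
    """Illustrative sketch: estimate pitch as the median interval between
    successive sharp transients. A simple first difference stands in for the
    wavelet detail coefficients used in the thesis."""
    detail = np.abs(np.diff(x))                        # crude stand-in for detail coefficients
    peaks, _ = find_peaks(detail, height=0.5 * detail.max(),
                          distance=int(0.002 * fs))    # transients at least 2 ms apart
    if len(peaks) < 2:
        return None
    return fs / np.median(np.diff(peaks))              # pitch estimate in Hz

# Toy glottal-pulse-like signal at 120 Hz: sharp attacks with slow decays.
fs = 16_000
t = np.arange(0, 0.5, 1 / fs)
pulses = (np.mod(t, 1 / 120) < 1 / fs).astype(float)   # one impulse per period
pulse_shape = np.exp(-np.arange(200) / 40.0)           # sharp attack, exponential decay
x = np.convolve(pulses, pulse_shape, mode="full")[:len(t)]
print(pitch_from_transients(x, fs))                    # close to 120 Hz
```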
8

Blind Detection Techniques For Spread Spectrum Audio Watermarking

Krishna Kumar, S 10 1900 (has links)
In spread spectrum (SS) watermarking of audio signals, since the watermark acts as an additive noise to the host audio signal, the most important challenge is to maintain perceptual transparency. Human perception is a very sensitive apparatus, yet it can be exploited to hide some information reliably. SS watermark embedding has been proposed, in which psycho-acoustically shaped pseudo-random sequences are embedded directly into the time domain audio signal. However, these watermarking schemes use informed detection, in which the original signal is assumed available to the watermark detector. Blind detection of psycho-acoustically shaped SS watermarking is not well addressed in the literature. The problem is still interesting, because blind detection is more practical for audio signals and psycho-acoustically shaped watermark embedding offers the maximum possible watermark energy under requirements of perceptual transparency. In this thesis we study the blind detection of psycho-acoustically shaped SS watermarks in time domain audio signals. We focus on a class of watermark sequences known as random phase watermarks, where the watermark magnitude spectrum is defined by the perceptual criteria and the randomness of the sequence lies in its phase spectrum. Blind watermark detectors, which do not have access to the original host signal, may seem handicapped, because an approximate watermark has to be re-derived from the watermarked signal. Since the comparison of blind detection with fully informed detection is unfair, a hypothetical detection scheme, denoted as semi-blind detection, is used as a reference benchmark. In semi-blind detection, the host signal as such is not available for detection, but it is assumed that sufficient information is available for deriving the exact watermark which could be embedded in the given signal. Some reduction in performance is anticipated in blind detection compared with semi-blind detection. Our experiments revealed that the statistical performance of the blind detector is better than that of the semi-blind detector. We analyze the watermark-to-host correlation (WHC) of random phase watermarks, and the results indicate that WHC is higher when a legitimate watermark is present in the audio signal, which leads to better detection performance. Based on these findings, we attempt to harness this increased correlation in order to further improve the performance. The analysis shows that a uniformly distributed phase difference (between the host signal and the watermark) provides the maximum advantage. This property is verified through experimentation over a variety of audio signals. In the second part, the correlated nature of audio signals is identified as a potential threat to reliable blind watermark detection, and audio pre-whitening methods are suggested as a possible remedy. A direct deterministic whitening (DDW) scheme is derived from the frequency domain analysis of the time domain correlation process. Our experimental studies reveal that the Savitzky-Golay Whitening (SGW), which is otherwise inferior to the DDW technique, performs better when the audio signal is predominantly low pass. The novelty of this work lies in exploiting the complementary nature of the two whitening techniques and combining them to obtain a hybrid whitening (HbW) scheme. In the hybrid scheme the DDW and SGW techniques are selectively applied, based on short time spectral characteristics of the audio signal.
The hybrid scheme extends the reliability of watermark detection to a wider range of audio signals. We also discuss enhancements to the HbW technique for robustness to temporal offsets and filtering. Robustness of SS watermark blind detection, with hybrid whitening, is determined through a set of experiments and the results are presented. It is seen that the watermarking scheme is robust to common signal processing operations such as additive noise, filtering, lossy compression, etc.
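A minimal sketch of the detection idea with Savitzky-Golay whitening is given below: the received signal is whitened by subtracting its Savitzky-Golay-smoothed version, and the residual is then correlated with the candidate pseudo-random sequence. The host model, watermark strength and filter settings are invented for illustration, and the watermark here is not psycho-acoustically shaped as in the thesis.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(7)
n = 20_000
pn = rng.choice([-1.0, 1.0], size=n)                  # pseudo-random watermark sequence

# Toy "host audio": strongly low-pass (correlated) noise, i.e. a random walk.
host = np.cumsum(rng.standard_normal(n)) * 0.01
watermarked = host + 0.02 * pn                        # small (here un-shaped) watermark

def blind_detect(signal, pn, window=31, poly=3):
    """Sketch of blind correlation detection with Savitzky-Golay whitening:
    the smoothed signal approximates the correlated host, so the residual is
    roughly whitened before correlating with the candidate watermark."""
    residual = signal - savgol_filter(signal, window, poly)
    return residual @ pn / np.sqrt(len(pn))           # normalised correlation score

print(blind_detect(watermarked, pn))                  # large score: watermark present
print(blind_detect(host, pn))                         # near zero: watermark absent
```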
9

Music And Speech Analysis Using The 'Bach' Scale Filter-Bank

Ananthakrishnan, G 04 1900 (has links)
The aim of this thesis is to define a perceptual scale for the ‘Time-Frequency’ analysis of music signals. The equal tempered ‘Bach’ scale is a suitable scale, since it covers most of the genres of music and the error is equally distributed for each semi-tone. However, it may be necessary to allow a tolerance of around 50 cents, or half the interval of the Bach scale, so that the interval can accommodate other common intonation schemes. The thesis covers the formulation of the Bach scale filter-bank as a time-varying model. It makes a comparative study with other commonly used perceptual scales. Two applications for the Bach scale filter-bank are also proposed, namely automated segmentation of speech signals and transcription of singing voice for query-by-humming applications. Even though this filter-bank is suggested with a motivation from music, it could also be applied to speech. A method for automatically segmenting continuous speech into phonetic units is proposed. The results obtained from the proposed method show around 82% accuracy for the English and 85% accuracy for the Hindi databases. This is an improvement of around 2-3% when the performance is compared with other popular methods in the literature. Interestingly, the Bach scale filters perform better than the filters designed for other common perceptual scales, such as the Mel and Bark scales. ‘Musical transcription’ refers to the process of converting a musical rendering or performance into a set of symbols or notations. A query in a ‘query-by-humming system’ can be made in several ways, some of which are singing with words, or with arbitrary syllables, or whistling. Two algorithms are suggested to annotate a query. The algorithms are designed to be fairly robust for these various forms of queries. The first algorithm is a frequency selection based method. It works on the basis of selecting the most likely frequency components at any given time instant. The second algorithm works on the basis of finding time-connected contours of high energy in the ‘Time-Frequency’ plane of the input signal. The time domain algorithm works better in terms of instantaneous pitch estimates. It results in an error of around 10-15%, while the frequency domain method results in an error of around 12-20%. A song rendered by two different people will have quite a few different properties. Their absolute pitches, rates of rendering, timbres based on voice quality and inaccuracies may be different. The thesis discusses a method to quantify the distance between two different renderings of musical pieces. The distance function has been evaluated by attempting a search for a particular song from a database of 315 items, made up of songs sung by both male and female singers and whistled queries. Around 90% of the time, the correct song is found among the top five best choices picked. Thus, the Bach scale has been proposed as a suitable scale for representing the perception of music. It has been explored in two applications, namely automated segmentation of speech and transcription of singing voices. Using the transcription obtained, a measure of the distance between renderings of musical pieces has also been suggested.
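For concreteness, here is a small sketch of what semitone-spaced (‘Bach scale’) analysis bands might look like, with the 50-cent tolerance mentioned above used as band edges. The reference frequency and number of bands are illustrative assumptions, not the thesis's filter-bank design.

```python
import numpy as np

def bach_scale_bands(f_ref=27.5, n_bands=88):
    """Hypothetical sketch: semitone-spaced (equal-tempered) centre frequencies
    with band edges a quarter-tone (50 cents) either side. f_ref and n_bands
    are illustrative choices."""
    k = np.arange(n_bands)
    centres = f_ref * 2.0 ** (k / 12.0)              # one band per equal-tempered semitone
    lower = centres * 2.0 ** (-50 / 1200.0)          # -50 cents
    upper = centres * 2.0 ** (+50 / 1200.0)          # +50 cents
    return centres, lower, upper

centres, lower, upper = bach_scale_bands()
print(np.round(centres[:5], 2))   # 27.5, 29.14, 30.87, 32.7, 34.65 Hz
```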
10

Objective assessment and reduction of noise in a musical signal

Rášo, Ondřej January 2013 (has links)
This dissertation focuses on the objective assessment and reduction of disturbing background noise in a musical signal. A new algorithm for assessing background noise audibility is proposed. Listening tests show that this new algorithm predicts background noise audibility better than the existing algorithms do. An advantage of the new algorithm is that it can be applied not only to musical signals but also to general audio, i.e. whenever the audibility of one sound against the background of another is assessed, a case in which the existing algorithms often fail. The next part of the dissertation deals with adaptive segmentation of long musical signals into short segments of different lengths. A new adaptive segmentation scheme is introduced, and it is shown that noise reduction systems using this scheme produce a significantly better subjectively perceived quality of the musical signal than those using the other segmentation schemes tested.
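The abstract does not spell out the segmentation rule, so the sketch below is only a generic illustration of adaptive (variable-length) segmentation: boundaries are placed where the normalised spectral flux exceeds a threshold, subject to a minimum segment length. All parameter values are invented and this is not the thesis's scheme.

```python
import numpy as np

def adaptive_segments(x, fs, frame=1024, hop=512, flux_thresh=0.3, min_len_s=0.1):
    """Illustrative sketch of adaptive segmentation: a boundary is placed where
    the normalised spectral flux between consecutive frames exceeds a threshold,
    yielding segments of different lengths."""
    n_frames = 1 + (len(x) - frame) // hop
    win = np.hanning(frame)
    prev = None
    boundaries = [0]
    for i in range(n_frames):
        spec = np.abs(np.fft.rfft(win * x[i * hop:i * hop + frame]))
        spec /= spec.sum() + 1e-12                        # normalise the magnitude spectrum
        if prev is not None:
            flux = np.sum(np.maximum(spec - prev, 0.0))   # positive spectral change
            if flux > flux_thresh and (i * hop - boundaries[-1]) >= min_len_s * fs:
                boundaries.append(i * hop)
        prev = spec
    boundaries.append(len(x))
    return boundaries

# Toy usage: a 440 Hz tone that switches to 880 Hz halfway through.
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))
print(adaptive_segments(x, fs))   # one interior boundary near the tone change at 0.5 s
```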
