  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Audio fingerprinting for speech reconstruction and recognition in noisy environments

Liu, Feng, 13 April 2017
Audio fingerprinting is a highly specific content-based audio retrieval technique. Given a short audio fragment as a query, an audio fingerprinting system can identify the particular file that contains the fragment in a large library of potentially millions of audio files. In this thesis, we investigate whether it is possible and feasible to apply audio fingerprinting to speech recognition in noisy environments through speech reconstruction. To reconstruct noisy speech, the speech is first divided into small segments of equal length. Audio fingerprinting is then used to find the most similar segment in a large dataset of clean speech files. If the similarity is above a threshold, the noisy segment is replaced with the clean segment. Lastly, all the segments, after this conditional replacement, are concatenated to form the reconstructed speech, which is passed to a traditional speech recognition system. A critical step in this procedure is using audio fingerprinting to find the clean speech segment in the dataset. To test its performance, we build a landmark-based audio fingerprinting system. Experimental results show that this baseline system performs well in traditional applications, but its accuracy in this new application falls short of our expectations. Next, we propose three strategies to improve the system, resulting in better accuracy than the baseline system. Finally, we integrate the improved audio fingerprinting system into a traditional speech recognition system and evaluate the performance of the whole system. / Graduate
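The reconstruction procedure in this abstract (segment, find the best clean match, conditionally replace, concatenate) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names are hypothetical, a plain cosine similarity on raw samples stands in for the landmark-based fingerprint match the thesis actually uses, and the sketch assumes every clean segment has the same length as the noisy segments and that the clean set is non-empty.

```python
import math
from typing import Callable, List, Sequence

Segment = List[float]

def split_into_segments(signal: Sequence[float], seg_len: int) -> List[Segment]:
    """Divide the signal into consecutive segments of seg_len samples.
    The final segment may be shorter if the length does not divide evenly."""
    return [list(signal[i:i + seg_len]) for i in range(0, len(signal), seg_len)]

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in similarity on raw samples (NOT an audio fingerprint match)."""
    n = min(len(a), len(b))
    dot = sum(a[i] * b[i] for i in range(n))
    norm_a = math.sqrt(sum(a[i] * a[i] for i in range(n)))
    norm_b = math.sqrt(sum(b[i] * b[i] for i in range(n)))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reconstruct(
    noisy: Sequence[float],
    clean_segments: Sequence[Segment],
    seg_len: int,
    threshold: float,
    similarity: Callable[[Sequence[float], Sequence[float]], float] = cosine_similarity,
) -> List[float]:
    """Conditionally replace each noisy segment with its best clean match,
    then concatenate the results into the reconstructed signal."""
    out: List[float] = []
    for seg in split_into_segments(noisy, seg_len):
        # Score every clean segment and keep the most similar one.
        scores = [similarity(seg, c) for c in clean_segments]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > threshold:
            out.extend(clean_segments[best][:len(seg)])  # replace with clean audio
        else:
            out.extend(seg)  # similarity too low: keep the noisy segment
    return out

clean = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
noisy = [0.9, 0.1, 1.1, 0.0, 1.0, -1.0, -1.0, 1.0]
print(reconstruct(noisy, clean, seg_len=4, threshold=0.9))
# [1.0, 0.0, 1.0, 0.0, 1.0, -1.0, -1.0, 1.0]
```

Passing the similarity as a parameter keeps the replacement logic separate from the matcher, so a real fingerprint lookup could be dropped in without changing `reconstruct`.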

Comparison of two audio fingerprinting algorithms for advertisement identification / van Nieuwenhuizen H.A.

Van Nieuwenhuizen, Heinrich Abrie, January 2011
Although identifying people by their fingerprints is a well-known technique in practice, identifying an audio sample by means of audio fingerprinting is still under development. Audio fingerprinting can be used to identify different types of audio samples, of which music and advertisements are the two most frequently encountered. Different audio fingerprinting techniques appear seldom in the literature, and direct comparisons between them are not always available. In this dissertation, the two audio fingerprinting techniques of Avery Wang and of Haitsma and Kalker are compared in terms of accuracy, speed, versatility and scalability, with the goal of modifying the algorithms for optimal advertisement identification. To start, the background of audio fingerprinting is summarised and different audio fingerprinting algorithms are reviewed. Problems, issues to be addressed and the research methodology are discussed. The research question is formulated as follows: “Can audio fingerprinting be applied successfully to advertisement monitoring, and if so, which existing audio fingerprinting algorithm is most suitable as a basis for a generic algorithm and how should the original algorithm be changed for this purpose?” The research question is followed by a review of the literature on the background of audio fingerprinting and on different audio fingerprinting algorithms. Next, the importance of audio fingerprinting in engineering is motivated through the technical aspects involved. These aspects are not always necessary or part of the algorithm, but in most cases the audio is pre-processed, filtered and downsampled. Other aspects include identifying and storing unique features, which is where the algorithms' techniques differ. Haitsma and Kalker's, Avery Wang's and Microsoft's RARE algorithms are then presented in more detail.
Next, the desired graphical user interface (GUI) for advertisement identification is presented, and different solution architectures for advertisement identification are discussed. A design that focuses on advertisement identification and supports the validation of the algorithm is then presented and implemented. The implementation is followed by the experimental setup and tests. Finally, the dissertation ends with results and comparisons, which verify and validate the algorithm and thus affirm the first part of the research question. A short summary of the contribution made in the dissertation is given, followed by conclusions and recommendations for future work. / Thesis (M.Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2012.
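For readers comparing the two techniques named above, the core of the Haitsma and Kalker method derives one bit per pair of adjacent frequency bands in each frame from the sign of an energy difference taken across both frequency and time: F(n, m) = 1 if E(n, m) - E(n, m+1) - (E(n-1, m) - E(n-1, m+1)) > 0, else 0, where E(n, m) is the energy of band m in frame n (in the original paper, 33 bands give a 32-bit sub-fingerprint per frame). Two fingerprint blocks are compared by their bit error rate. The sketch below assumes the band energies have already been computed (framing, FFT and band splitting are omitted), uses hypothetical function names, and does not reproduce the dissertation's modified algorithm.

```python
from typing import List, Sequence

Bits = List[List[int]]

def sub_fingerprints(energies: Sequence[Sequence[float]]) -> Bits:
    """energies[n][m]: energy of frame n in frequency band m (assumed precomputed).
    Returns one list of len(bands) - 1 bits for every frame n >= 1."""
    bits: Bits = []
    for n in range(1, len(energies)):
        cur, prev = energies[n], energies[n - 1]
        frame_bits = []
        for m in range(len(cur) - 1):
            # Difference between adjacent bands, then differenced again over time.
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            frame_bits.append(1 if d > 0 else 0)
        bits.append(frame_bits)
    return bits

def bit_error_rate(a: Bits, b: Bits) -> float:
    """Fraction of differing bits between two equally sized fingerprint blocks."""
    flat_a = [bit for frame in a for bit in frame]
    flat_b = [bit for frame in b for bit in frame]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("fingerprint blocks must be non-empty and the same size")
    return sum(x != y for x, y in zip(flat_a, flat_b)) / len(flat_a)

# Toy example: 3 bands and 2 frames give one 2-bit sub-fingerprint each.
query = sub_fingerprints([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])   # [[1, 1]]
stored = sub_fingerprints([[1.0, 2.0, 3.0], [3.0, 5.0, 1.0]])  # [[0, 1]]
print(bit_error_rate(query, stored))  # 0.5
```

A match is typically declared when the bit error rate between a query block and a stored block falls below a chosen threshold; the threshold is left to the caller here.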

Hokua – A Wavelet Method for Audio Fingerprinting

Lutz, Steven S., 20 November 2009
In recent years, multimedia identification has become important as the volume of digital media has increased dramatically. For music files, one method of identification is audio fingerprinting. Most algorithms are built on the Fourier transform; however, because the Fourier transform lacks temporal resolution, these algorithms rely on the short-time Fourier transform. We propose an audio fingerprinting algorithm that uses a wavelet transform, which has good temporal resolution. In this thesis, we review the background topics needed to understand audio fingerprinting techniques and briefly survey the history of work in this field. We introduce a new algorithm, called the Hokua algorithm, which we developed to take advantage of certain properties of the wavelet transform. The algorithm uses the coefficient peaks of wavelet transforms to identify a sample query. The various algorithms are then compared.
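The abstract does not specify which wavelet or peak-selection rule Hokua uses, so the following is only a generic illustration of extracting "coefficient peaks of wavelet transforms": a multi-level orthonormal Haar transform (an assumed choice of wavelet) followed by picking the largest-magnitude detail coefficients at each level. The function names and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)

def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_dwt(x: Sequence[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """Multi-level Haar DWT. Returns the detail coefficients of each level
    (finest first) and the final approximation."""
    approx: List[float] = list(x)
    details: List[List[float]] = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx

def coefficient_peaks(x: Sequence[float], levels: int, k: int) -> List[Tuple[int, int]]:
    """(level, index) pairs of the k largest-magnitude detail coefficients
    at each level; a toy stand-in for a wavelet 'peak' fingerprint."""
    details, _ = haar_dwt(x, levels)
    peaks: List[Tuple[int, int]] = []
    for level, d in enumerate(details, start=1):
        top = sorted(range(len(d)), key=lambda i: abs(d[i]), reverse=True)[:k]
        peaks.extend((level, i) for i in sorted(top))
    return peaks

# A single impulse shows up as one peak per level at the matching position.
print(coefficient_peaks([0, 0, 0, 0, 4, 0, 0, 0], levels=2, k=1))  # [(1, 2), (2, 1)]
```

Because each Haar detail coefficient covers a short, localised span of samples, the peak positions keep track of *when* an event occurs, which is the temporal-resolution advantage the abstract attributes to wavelets.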

Content-based audio search: from fingerprinting to semantic audio retrieval

Cano Vila, Pedro, 27 April 2007
This dissertation is about content-based audio search. Specifically, it is about developing technologies for bridging the semantic gap that currently prevents the wide deployment of content-based audio search engines. Audio search engines rely on metadata, mostly human-generated, to manage collections of audio assets. Even though it is time-consuming and error-prone, manual labeling is the most common practice. Content-based audio methods, that is, algorithms that automatically extract descriptions from audio files, are generally not mature enough to provide a user-friendly representation for interacting with audio content. Content-based methods mostly rely on low-level descriptions, while high-level or semantic descriptions are beyond current capabilities. In this thesis, we explore technologies that can help close the semantic gap.

A smart sound fingerprinting system for monitoring elderly people living alone

El Hassan, Salem, January 2021
There is a sharp increase in the number of older people living alone throughout the world. More often than not, such people require continuous and immediate care and attention in their everyday lives, hence the need for round-the-clock monitoring, albeit in a respectful, dignified and non-intrusive way. For example, continuous care is required when they become frail and less active, and immediate attention is required when they fall or remain in the same position for a long time. Various monitoring technologies have been developed to this end, yet major improvements are still to be realised. Current technologies include indoor positioning systems (IPSs) and health monitoring systems. IPSs rely on defined configurations of sensors to capture a person's position within a given space in real time. The sensors differ in the technology used to acquire data: WiFi, radio frequency identification (RFID), ultra-wideband (UWB), dead reckoning (DR), indoor infrared (IR), Bluetooth Low Energy (BLE), acoustic signals, visible light detection, and sound signal monitoring. The systems use various algorithms based on proximity, location detection, time of arrival, time difference of arrival, angle of arrival, and received signal strength data. Health monitoring technologies capture important health data using accelerometers and gyroscope sensors. In some studies, audio fingerprinting has been used to detect variations in indoor environmental sound; these studies have largely focused on recognising TV sound and songs. This has been achieved using a staged pipeline that includes pre-processing, framing, windowing, time/frequency-domain feature extraction, and post-processing. Time/frequency-domain feature extraction tools include Fourier transforms (FTs), the Modified Discrete Cosine Transform (MDCT), Principal Component Analysis (PCA), Mel-Frequency Cepstral Coefficients (MFCCs), the Constant-Q Transform (CQT), Local Energy Centroid (LEC), and wavelet transforms.
Artificial intelligence (AI) and probabilistic algorithms have also been used in IPSs to classify and predict different activities, with interesting applications in healthcare monitoring. Several tools have been applied in IPSs and audio fingerprinting, including Radial Basis Function (RBF) kernels, Support Vector Machines (SVMs), Decision Trees (DTs), Hidden Markov Models (HMMs), Naïve Bayes (NB), Gaussian Mixture Models (GMMs), clustering algorithms, Artificial Neural Networks (ANNs), and Deep Learning (DL). Despite all these attempts, there is still a major gap for a completely non-intrusive system capable of monitoring what an elderly person living alone is doing, where, and for how long, and of providing a quick, traffic-light-style risk score that prompts immediate action when needed. In this thesis, a cost-effective and completely non-intrusive indoor positioning and activity-monitoring system for elderly people living alone has been developed, tested and validated in a typical residential living space. The proposed system works in five phases: (1) a set-up phase that defines the typical activities of daily living (TADLs); (2) a configuration phase that optimises the deployment of the required sensors in exemplar Flat No. 1; (3) a learning phase in which sound and position data for the TADLs are collected and stored in a fingerprint reference data set; (4) a listening phase in which real-time data are collected and compared against the reference data set to determine what a person is doing, when, and for how long; and (5) an alert phase in which a health frailty score ranging from 0 (unwell) to 10 (healthy) is generated in real time. Two typical but different residential flats (referred to here as Flats No. 1 and No. 2) are used in the study. The system is implemented in the bathroom, living room, and bedroom of Flat No. 1, which includes various floor types (carpet, tiles, laminate) so that the sounds of walking on each type can be distinguished.
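The learning and listening phases can be illustrated with a toy reference set. This sketch assumes each fingerprint is a fixed-length numeric feature vector labelled with a room and an activity, and it uses a 1-nearest-neighbour lookup as a simple stand-in for the machine learning classifiers the thesis uses. Consecutive frames with the same label are merged so the output reports what the person was doing, where, when, and for how long. The `Fingerprint` type, the function names and the example labels are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class Fingerprint:
    features: Tuple[float, ...]  # hypothetical layout, e.g. sound features + distance
    room: str
    activity: str

Interval = Tuple[str, str, float, float]  # (room, activity, start_s, duration_s)

def nearest(reference: Sequence[Fingerprint], features: Sequence[float]) -> Fingerprint:
    """Learning-phase data lookup: 1-nearest neighbour by Euclidean distance."""
    return min(reference, key=lambda fp: math.dist(fp.features, features))

def listen(reference: Sequence[Fingerprint],
           stream: Sequence[Sequence[float]],
           frame_seconds: float) -> List[Interval]:
    """Listening phase: label each incoming feature vector, then merge runs
    of identical (room, activity) labels into timed intervals."""
    runs: List[list] = []  # [room, activity, first_frame, n_frames]
    for i, feats in enumerate(stream):
        fp = nearest(reference, feats)
        if runs and runs[-1][0] == fp.room and runs[-1][1] == fp.activity:
            runs[-1][3] += 1
        else:
            runs.append([fp.room, fp.activity, i, 1])
    # Convert frame counts to seconds only at the end to avoid float drift.
    return [(room, act, start * frame_seconds, n * frame_seconds)
            for room, act, start, n in runs]

reference = [
    Fingerprint((0.0, 0.0), "bathroom", "showering"),
    Fingerprint((10.0, 10.0), "bedroom", "sleeping"),
]
stream = [(0.1, 0.2), (0.3, 0.1), (9.5, 10.2)]
print(listen(reference, stream, frame_seconds=0.5))
# [('bathroom', 'showering', 0.0, 1.0), ('bedroom', 'sleeping', 1.0, 0.5)]
```

An alert phase could then watch these intervals, for example flagging an unusually long run of one label, but how the thesis maps intervals to its 0 to 10 frailty score is not described in the abstract.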
The data captured during the learning phase form the reference data set, which includes position and sound fingerprints. The sound fingerprints are generated by recording each specific TADL and extracting time- and frequency-based features: frequency peak magnitude (FPM), zero-crossing rate (ZCR), and root mean square error (RMSE). The position fingerprints are generated from distance measurements. The recorded sound is sampled at 44.1 kHz. A Fast Fourier Transform (FFT) is applied to 0.1-second intervals of the recorded sound, using a Hamming window to minimise spectral leakage. Frequency peaks are detected in the spectrogram matrices to find the best FPM match between the reference and sample data. The monitored person's position is detected in real time by comparing the distances captured in the learning and listening phases. A typical furnished one-bedroom flat (Flat No. 2) is used to validate the system; the layouts and floorings of Flats No. 1 and No. 2 differ. Validation is based on "happy" and "unusual" but typical behaviours. Happy behaviours are the typical TADLs of a healthy elderly person living alone, with a risk metric higher than 8. Unusual behaviours mimic acute or chronic activities (or a lack thereof), such as falling and remaining on the floor or staying in bed for long periods, i.e., scenarios in which an elderly person may be in a compromised situation, detected in real time by a sudden drop in the risk metric (below 4). Machine learning classification algorithms are used to identify the location, activity, and time interval in real time, with a promising early performance of 94% in detecting the right activity in the right room at the right time.
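The per-frame feature extraction described above (44.1 kHz audio, FFT over 0.1-second intervals with a Hamming window) might look like the following NumPy sketch. The abstract does not give the exact FPM matching procedure, so this only computes each frame's spectral peak (frequency and magnitude), together with the zero-crossing rate and, as an assumed reading of the abstract's RMSE feature, the frame's root-mean-square level. Function names are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44_100           # Hz, as stated in the abstract
FRAME_LEN = SAMPLE_RATE // 10  # 0.1 s intervals -> 4410 samples

def frame_features(frame, sample_rate: int = SAMPLE_RATE):
    """Return (peak_freq_hz, peak_magnitude, zcr, rms) for one frame."""
    x = np.asarray(frame, dtype=float)
    # Hamming window to reduce spectral leakage before the FFT.
    spectrum = np.abs(np.fft.rfft(x * np.hamming(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    k = int(np.argmax(spectrum))  # strongest frequency bin
    # Zero-crossing rate: fraction of adjacent samples whose sign differs.
    signs = np.signbit(x)
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    rms = float(np.sqrt(np.mean(x ** 2)))  # root-mean-square level of the frame
    return float(freqs[k]), float(spectrum[k]), zcr, rms

def features_over_signal(signal):
    """Split the signal into non-overlapping 0.1 s frames (dropping any
    remainder) and compute the features of each frame."""
    n_frames = len(signal) // FRAME_LEN
    return [frame_features(signal[i * FRAME_LEN:(i + 1) * FRAME_LEN])
            for i in range(n_frames)]

# A 1 kHz test tone lands exactly on an FFT bin (bins are 10 Hz apart here).
t = np.arange(FRAME_LEN) / SAMPLE_RATE
peak_hz, peak_mag, zcr, rms = frame_features(np.sin(2 * np.pi * 1000 * t))
print(round(peak_hz), round(rms, 4))  # 1000 0.7071
```

With 4410-sample frames the FFT bin spacing is 44100 / 4410 = 10 Hz, which bounds how finely frequency peaks from reference and live recordings can be distinguished.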
