  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Zpracování obrazů raných smrkových kultur snímaných MR technikou / Processing of images of early spruce needles scanned by MR technology

Raichl, Jaroslav January 2009
This semester project deals with filtering of images acquired by nuclear magnetic resonance (NMR) measurement. The thesis covers the theory of nuclear magnetic resonance, digital filters, basic digital filter bank structures, the theory of the wavelet transform, and the calculation of the signal-to-noise ratio (SNR). The theoretical part summarizes the basic procedure for denoising MR signals and describes the denoising of images acquired by nuclear magnetic resonance. The experimental part describes image denoising methods implemented in Matlab. These methods are based on the wavelet transform and digital filter banks with suitable thresholding. The effectiveness of the designed filtering methods was verified on 2D NMR images. All of these 2D images were measured on an MR tomograph at the Institute of Scientific Instruments of the Academy of Sciences of the Czech Republic in Brno.
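The thresholding and SNR steps named in this abstract can be illustrated with a minimal sketch. It assumes a one-level orthonormal Haar transform on a 1-D signal and soft thresholding of the detail coefficients; the thesis itself works in Matlab on 2D NMR images, and its actual wavelets, thresholds, and filter banks are not given here, so every function name below is hypothetical.

```python
import numpy as np

def haar_forward(x):
    """One-level orthonormal Haar analysis of an even-length 1-D signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward (perfect reconstruction)."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t; values with |c| <= t become 0."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def denoise(x, t):
    """Threshold only the detail band, then reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))

def snr_db(clean, estimate):
    """SNR in dB of an estimate relative to a known clean reference."""
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

A multi-level or 2D version would apply the same analysis step recursively to the approximation band, or separably along rows and columns.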
32

Filtrace signálů EKG pomocí vlnkové transformace / Wavelet Filtering of ECG Signals

Mrázek, Jiří January 2012
This thesis deals with the removal of myopotential noise from ECG signals using the wavelet transform. Wavelet denoising was applied first, followed by Wiener wavelet filtering. In both cases, the coefficients giving the best denoising were sought. This mainly means choosing suitable filtering parameters: the threshold value, the number of decomposition levels, the thresholding method, and the type of filter. These parameters are tested on real signals. Denoising is implemented in Matlab version R2009b.
33

Impacts des non-linéarités dans les systèmes multi-porteuses de type FBMC-OQAM / OFDM-FBMC performance in presence of non-linear high power amplifier

Bouhadda, Hanen 22 March 2016
This thesis presents a study of the bit error rate (BER) performance of OFDM and FBMC/OQAM systems in the presence of a memoryless power amplifier. We then proposed a power amplifier linearization technique based on adaptive neural predistortion. We also proposed two techniques for correcting nonlinearities at the receiver. / In our work, we have studied the impact of in-band nonlinear distortion (NLD) caused by the power amplifier (PA) on both OFDM and FBMC/OQAM systems. A theoretical approach was proposed to evaluate the BER performance of the two systems. This approach is based on modeling the in-band nonlinear distortion as a complex gain plus an uncorrelated additive white Gaussian noise, as given by the Bussgang theorem. We then proposed different techniques to compensate for this NLD at either the transmitter or the receiver.
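The Bussgang model mentioned above, a complex gain plus an uncorrelated distortion term, can be checked numerically. The sketch below assumes a soft-envelope limiter as the memoryless amplifier and a complex Gaussian input as a stand-in for a multicarrier signal; neither choice comes from the thesis, and the function names are hypothetical.

```python
import numpy as np

def soft_limiter(x, a_sat):
    """Assumed PA model: clip the envelope at a_sat and keep the phase."""
    mag = np.abs(x)
    scale = np.minimum(1.0, a_sat / np.maximum(mag, 1e-12))
    return x * scale

def bussgang_decompose(x, y):
    """Split y into k*x + d, where d is uncorrelated with x.

    k is the sample estimate of E[y x*] / E[|x|^2].
    """
    k = np.vdot(x, y) / np.vdot(x, x)
    d = y - k * x
    return k, d

rng = np.random.default_rng(1)
n = 100_000
# Unit-variance complex Gaussian input (approximates a multicarrier signal).
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
y = soft_limiter(x, a_sat=1.0)
k, d = bussgang_decompose(x, y)

# Signal-to-distortion ratio implied by the decomposition, in dB.
sdr_db = 10.0 * np.log10(abs(k) ** 2 * np.mean(np.abs(x) ** 2) / np.mean(np.abs(d) ** 2))
```

Because the limiter preserves phase, the estimated gain `k` is real and below one; a BER analysis would then treat `d` as additional noise at the receiver.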
34

High Data Rate Signal Processing Architectures and Compilation Strategies for Scalable, Multi-Gigabit Digital Systems

Nybo, Daniel Alexander 12 April 2024
In this study we present a high-performance computing architecture and hardware acceleration strategy for a heterogeneous multi-gigabit computing system. The system architecture integrates a BeeGFS distributed file system, capable of achieving 80 Gbps of sustained write throughput across five nodes, essential for managing the high data volumes generated by a 25 high performance computer (HPC) compute cluster. To ensure operational efficiency and scalability, the tasks performed on the 30-node Linux compute cluster are automated using Ansible, which facilitates deployment, management, and updates. We present compilation strategies for a hardware-accelerated Polyphase Filter Bank (PFB) channelization routine optimized for Xilinx UltraScale+ FPGAs, capable of simultaneously processing 2048 channels per 12 input streams. This setup shows the efficiency of High-Level Synthesis of FPGA-based signal processing in handling demanding data analysis tasks. We also present the implementation and verification of a 1.6 Gsps Direct Memory Access (DMA) transfer from DDR4 memory to a modern Radio Frequency System on Chip (RFSoC) digital-to-analog converter. The combination of a high-throughput file system, streamlined automation, and advanced signal processing capabilities shows this system's ability to meet the needs of complex, real-time data analysis and processing applications.
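As a rough software illustration of what a polyphase filter bank channelizer computes, the sketch below implements a critically sampled PFB in NumPy. It is not the thesis's High-Level Synthesis design: the windowed-sinc prototype filter, the eight-channel example size, and all function names are assumptions made for the sketch.

```python
import numpy as np

def prototype_filter(num_channels, taps_per_branch):
    """Assumed prototype: Hamming-windowed sinc with cutoff at half a channel."""
    length = num_channels * taps_per_branch
    n = np.arange(length) - (length - 1) / 2.0
    return np.sinc(n / num_channels) * np.hamming(length)

def pfb_channelize(x, num_channels, taps_per_branch):
    """Critically sampled PFB: one output frame per num_channels input samples."""
    h = prototype_filter(num_channels, taps_per_branch)
    seg_len = num_channels * taps_per_branch
    num_frames = (len(x) - seg_len) // num_channels + 1
    out = np.empty((num_frames, num_channels), dtype=complex)
    for i in range(num_frames):
        start = i * num_channels
        seg = x[start:start + seg_len] * h
        # Fold the weighted segment into num_channels polyphase branch sums.
        branches = seg.reshape(taps_per_branch, num_channels).sum(axis=0)
        # The FFT across branches separates the channels.
        out[i] = np.fft.fft(branches)
    return out

# Usage: a complex tone centred on channel 3 of 8.
t = np.arange(1024)
tone = np.exp(2j * np.pi * 3 * t / 8)
channels = pfb_channelize(tone, num_channels=8, taps_per_branch=4)
channel_power = np.mean(np.abs(channels) ** 2, axis=0)
```

A hardware version would typically keep the branch filters as streaming FIR stages feeding a pipelined FFT; the arithmetic per output frame is the same.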
35

Audio editing in the time-frequency domain using the Gabor Wavelet Transform

Hammarqvist, Ulf January 2011
Visualization, processing and editing of audio directly on a time-frequency surface is the scope of this thesis. More precisely, the scalogram produced by a Gabor wavelet transform is used, which is a powerful alternative to traditional techniques where the waveform is the main visual aid and editing is performed by parametric filters. Reconstruction properties, scalogram design and enhancements, as well as audio manipulation algorithms, are investigated for this audio representation. The scalogram is designed to allow a flexible choice of time-frequency ratio while maintaining high-quality reconstruction. To this end, the Loglet is used, which is observed to be the most suitable filter choice. Reassignment methods are tested, and a novel weighting function using partial derivatives of phase is proposed. An audio interpolation procedure is developed and shown to perform well in listening tests. The feasibility of using the transform coefficients directly for various purposes is investigated. It is concluded that pitch shifts are hard to describe in the framework, while noise thresholding works well. A downsampling scheme is suggested that saves on operations and memory consumption and significantly speeds up real-world implementations. Finally, a scalogram 'compression' procedure is developed, allowing the caching of an approximate scalogram.
36

Subband Adaptive Filtering Algorithms And Applications

Sridharan, M K 06 1900
In a system identification scenario, the linear approximation of the system, modelled by its impulse response, is estimated in real time by gradient-type Least Mean Square (LMS) or Recursive Least Squares (RLS) algorithms. In recent applications such as acoustic echo cancellation, the order of the impulse response to be estimated is very high, so these traditional approaches are inefficient and real-time implementation becomes difficult. Alternatively, the system is modelled by a set of shorter adaptive filters operating in parallel on subsampled signals. This approach, referred to as subband adaptive filtering, is expected not only to reduce the computational complexity but also to improve the convergence rate of the adaptive algorithm. In practice, however, different subband adaptive algorithms have to be used to enhance the performance with respect to complexity, convergence rate and processing delay. A single subband adaptive filtering algorithm that outperforms the full-band scheme in all applications is yet to be realized. This thesis studies subband adaptive filtering techniques and explores the possibilities of better algorithms for performance improvement. Three different subband adaptive algorithms have been proposed and their performance has been verified through simulations. These algorithms have been applied to acoustic echo cancellation and EEG artefact minimization problems. Details of the work: To start with, the fast FIR filtering scheme introduced by Mou and Duhamel has been generalized. The Perfect Reconstruction Filter Bank (PRFB) is used to model the linear FIR system. The structure offers efficient implementation with reduced arithmetic complexity. By using a PRFB in which non-adjacent filters do not overlap, many channel filters can be eliminated from the structure. This reduces the complexity of the structure further, but introduces approximation in the model.
The modelling error depends on the stopband attenuation of the filters of the PRFB. The error introduced by the approximation is tolerable for applications such as acoustic echo cancellation. The filtered output of the modified generalized fast filtering structure is given by (formula), where P_k(z) is the main channel output, P_{k,k+1}(z) is the output of the auxiliary channel filters at the reduced rate, G_k(z) is the kth synthesis filter and M is the number of channels in the PRFB. An adaptation scheme is developed for adapting the main channel filters. Auxiliary channel filters are derived from the main channel filters. Secondly, the aliasing problem of the classical structure is reduced without using cross filters. Aliasing components in the estimated signal result in very poor steady-state performance in the classical structure. Attempts to eliminate the aliasing have reduced the computational gain margin and the convergence rate. Any attempt to estimate the subband reference signals from the aliased subband input signals results in aliasing. An analysis filter H_k(z) having the following antialiasing property (formula) can avoid aliasing in the input subband signal. The asymmetry of the frequency response prevents the use of real analysis filters. In the investigation presented in this thesis, complex analysis filters and real synthesis filters are used in the classical structure to reduce the aliasing errors and to achieve a superior convergence rate. The PRFB is traditionally used in implementing the Interpolated FIR (IFIR) structure. These filters may not be ideal for processing an input signal for an adaptive algorithm. As the third contribution, the IFIR structure is modified using discrete finite frames. The model of an FIR filter s is given by Fc, with c = Hs. The columns of the matrix F form a frame, with the rows of H as its dual frame. The matrix elements can be arbitrary, except that the transformation should be implementable as a filter bank.
This freedom is used to optimize the filter bank, with knowledge of the input statistics, to enhance the initial convergence rate. Next, the proposed subband adaptive algorithms are applied to the acoustic echo cancellation problem with realistic parameters. Speech input and a sufficiently long Room Impulse Response (RIR) are used in the simulations. The Echo Return Loss Enhancement (ERLE) and the steady-state error spectrum are used as performance measures to compare these algorithms with the full-band scheme and other representative subband implementations. Finally, a subband adaptive algorithm is used to minimize EOG (electrooculogram) artefacts in the measured EEG (electroencephalogram) signal. An IIR filter bank providing sufficient isolation between the frequency bands is used in the modified IFIR structure, and this structure has been employed in the artefact minimization scheme. The estimation error in the high-frequency range has been reduced and the output signal-to-noise ratio has been increased by a couple of dB over that of the full-band scheme. Conclusions: Efforts to find elegant subband adaptive filtering algorithms will continue in the future. In this thesis, however, the generalized filtering algorithm offered a gain in filtering complexity of the order of M/2 and reduced misadjustment. The complex classical scheme offered an improved convergence rate, reduced misadjustment and computational gains of the order of M/4. The modifications of the IFIR structure using discrete finite frames made it possible to eliminate the processing delay and enhance the convergence rate. In a realistic scenario with speech input (8-channel case), the complex classical scheme typically offers an ERLE of more than 45 dB. The subband approach to EOG artefact minimization in the EEG signal was found to be superior to its full-band counterpart. (Refer to the PDF file for formulas.)
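For orientation, the full-band baseline that subband schemes are compared against can be sketched as a normalized LMS (NLMS) system identifier together with an ERLE measurement. This is a generic textbook NLMS, not any of the three subband algorithms proposed in the thesis; the step size, regularization constant, and function names are assumptions.

```python
import numpy as np

def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Full-band NLMS: adapt w so that w convolved with x tracks d.

    Returns the final weights and the a-priori error signal.
    """
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)          # buf[k] holds x[n - k]
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - w @ buf
        # Normalized gradient step; eps guards against division by zero.
        w = w + mu * e[n] * buf / (buf @ buf + eps)
    return w, e

def erle_db(d, e):
    """Echo Return Loss Enhancement: echo power over residual power, in dB."""
    return 10.0 * np.log10(np.sum(d ** 2) / np.sum(e ** 2))
```

With a long room impulse response, `num_taps` grows to thousands and each update costs on the order of `num_taps` operations per sample, which is the cost that splitting the problem into shorter subband filters aims to reduce.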
37

Music And Speech Analysis Using The 'Bach' Scale Filter-Bank

Ananthakrishnan, G 04 1900
The aim of this thesis is to define a perceptual scale for the ‘Time-Frequency’ analysis of music signals. The equal-tempered ‘Bach’ scale is a suitable scale, since it covers most genres of music and the error is equally distributed across the semitones. However, it may be necessary to allow a tolerance of around 50 cents, or half the interval of the Bach scale, so that the interval can accommodate other common intonation schemes. The thesis covers the formulation of the Bach scale filter-bank as a time-varying model. It makes a comparative study with other commonly used perceptual scales. Two applications of the Bach scale filter-bank are also proposed, namely automated segmentation of speech signals and transcription of singing voice for query-by-humming applications. Even though this filter-bank is motivated by music, it can also be applied to speech. A method for automatically segmenting continuous speech into phonetic units is proposed. The results obtained with the proposed method show around 82% accuracy for the English and 85% accuracy for the Hindi databases. This is an improvement of around 2-3% over other popular methods in the literature. Interestingly, the Bach scale filters perform better than filters designed for other common perceptual scales, such as the Mel and Bark scales. ‘Musical transcription’ refers to the process of converting a musical rendering or performance into a set of symbols or notations. A query in a ‘query-by-humming system’ can be made in several ways, including singing with words, singing with arbitrary syllables, or whistling. Two algorithms are suggested to annotate a query. The algorithms are designed to be fairly robust to these various forms of queries. The first algorithm is a frequency-selection-based method. It works by selecting the most likely frequency components at any given time instant.
The second algorithm works by finding time-connected contours of high energy in the ‘Time-Frequency’ plane of the input signal. The time-domain algorithm works better in terms of instantaneous pitch estimates. It results in an error of around 10-15%, while the frequency-domain method results in an error of around 12-20%. A song rendered by two different people will differ in several properties: absolute pitch, rate of rendering, timbre based on voice quality, and inaccuracies. The thesis discusses a method to quantify the distance between two different renderings of musical pieces. The distance function has been evaluated by searching for a particular song in a database of 315 entries, made up of songs sung by both male and female singers and whistled queries. Around 90% of the time, the correct song is found among the top five choices. Thus, the Bach scale has been proposed as a suitable scale for representing the perception of music. It has been explored in two applications, namely automated segmentation of speech and transcription of singing voices. Using the transcription obtained, a measure of the distance between renderings of musical pieces has also been suggested.
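The equal-tempered scale and the 50-cent tolerance discussed above follow from standard definitions: adjacent semitones differ by a factor of 2^(1/12), and one semitone is 100 cents. A small sketch, assuming a 440 Hz reference pitch (the abstract does not state the reference, and the function names are hypothetical):

```python
import math

def semitone_frequency(n, ref_hz=440.0):
    """Frequency of the note n semitones above (or below) the reference."""
    return ref_hz * 2.0 ** (n / 12.0)

def nearest_semitone(freq_hz, ref_hz=440.0):
    """Return (n, cents): nearest semitone index and the deviation in cents.

    The deviation always lies within +/- 50 cents, i.e. half a semitone.
    """
    semitones = 12.0 * math.log2(freq_hz / ref_hz)
    n = round(semitones)
    cents = 100.0 * (semitones - n)
    return n, cents
```

A filter bank on this scale would place one band at each `semitone_frequency(n)`, with band edges 50 cents to either side.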
38

Array Signal Processing for Beamforming and Blind Source Separation

Moazzen, Iman 30 April 2013
A new broadband beamformer composed of nested arrays (NAs), multi-dimensional (MD) filters, and multirate techniques is proposed for both linear and planar arrays. It is shown that this combination results in a frequency-invariant response. For a given number of sensors, the advantage of using NAs is that the effective aperture for low temporal frequencies is larger than with uniform arrays. This leads to high spatial selectivity at low frequencies. For a given aperture size, the proposed beamformer can be implemented with significantly fewer sensors and less computation than uniform arrays, with a slight deterioration in performance. The proposed method can be implemented efficiently by taking advantage of the Noble identity and polyphase structures. Simulation results demonstrate the good performance of the proposed beamformer in terms of frequency-invariant response and computational requirements. The broadband beamformer requires a filter bank with a non-compatible set of sampling rates, which is challenging to design. To address this issue, a filter bank design approach is presented. The approach formulates the design problem as an optimization problem with a performance index consisting of a term that depends on perfect reconstruction (PR) and a term that depends on the magnitude specifications of the analysis filters. The design objectives are to achieve almost perfect reconstruction and to have the analysis filters satisfy prescribed frequency specifications. Several design examples are considered to show the satisfactory performance of the proposed method. A new blind multi-stage space-time equalizer (STE) is proposed which can separate narrowband sources from a mixed signal. Neither the direction of arrival (DOA) nor a training sequence is assumed to be available to the receiver.
The beamformer and equalizer are jointly updated to combat both co-channel interference (CCI) and inter-symbol interference (ISI) effectively. Using subarray beamformers, the DOA of the captured signal, which may be time-varying, is estimated and tracked. The estimated DOA is used by the beamformer to provide strong CCI cancellation. To alleviate inter-stage error propagation, a mean-square-error sorting algorithm is used which assigns detected sources to different stages according to the reconstruction error at each stage. Further, to speed up convergence, a simple yet efficient DOA estimation algorithm is proposed which can provide good initial DOAs for the multi-stage STE. Simulation results illustrate the good performance of the proposed STE and show that it can effectively deal with changing DOAs and time-variant channels.
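The Noble identity cited in this abstract states that decimating by M and then filtering with H(z) gives the same output as filtering with H(z^M) and then decimating by M. The sketch below verifies this numerically for FIR filters; it illustrates the identity only, not the thesis's beamformer, and the function names are hypothetical.

```python
import numpy as np

def expand_filter(h, m):
    """Coefficients of H(z^m): insert m-1 zeros between the taps of h."""
    h = np.asarray(h, dtype=float)
    out = np.zeros((len(h) - 1) * m + 1)
    out[::m] = h
    return out

def filter_then_decimate(x, h, m):
    """Filter at the high rate with H(z^m), then keep every m-th sample."""
    return np.convolve(x, expand_filter(h, m))[::m]

def decimate_then_filter(x, h, m):
    """Keep every m-th sample, then filter at the low rate with H(z)."""
    return np.convolve(x[::m], h)
```

The second form does the same work with roughly 1/M of the multiplications, which is why multirate structures move filters to the low-rate side wherever the identity applies.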
39

Estudo da técnica FBMC aplicada em Power line communication

Franzin, Renato Pivesso 27 October 2017
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This dissertation presents a comparative analysis of the OFDM and FBMC modulation techniques applied to Power Line Communication (PLC) technology, considering realistic channel models. With the growing demand for access to broadband data networks, there is a need to integrate the various data communication technologies. In this scenario, PLC networks can offer a viable alternative as a provider of network access, since they use the infrastructure of the power transmission lines. However, the electrical network is a hostile medium for data transmission, exhibiting impedance mismatches, noise interference and multipath signal propagation, which characterize the PLC channel model. With the objective of increasing the data transmission rate and making better use of the available bandwidth, this work proposes replacing the OFDM technique with FBMC in PLC networks. To this end, the channel model was studied to obtain the parameters needed for computational simulations in Matlab. The OFDM and FBMC techniques were implemented according to the technical specifications of the IEEE 1901 standard. The simulation results verified that FBMC is more robust to channel interference, with gains of up to 8 dB in the bit error rate and an increase in data transmission rate and spectral efficiency of up to 25% relative to OFDM. Therefore, the FBMC technique can be implemented at the physical layer of the IEEE 1901 standard, replacing OFDM.
40

Applications of perceptual sparse representation (Spikegram) for copyright protection of audio signals / Applications de la représentation parcimonieuse perceptuelle par graphe de décharges (Spikegramme) pour la protection du droit d’auteur des signaux sonores

Erfani, Yousof January 2016
Every year, global music piracy costs several billion dollars in economic losses, job losses and lost worker earnings, as well as millions of dollars in lost tax revenue. Most music piracy is due to the rapid growth and ease of use of current technologies for copying, sharing, manipulating and distributing musical data [Domingo, 2015], [Siwek, 2007]. Audio watermarking has been proposed to protect copyright and to allow localization of the instants at which an audio signal has been tampered with. In this thesis, we propose to use the bio-inspired sparse representation known as the spikegram to design a new method for localizing tampering in audio signals, a new copyright protection method, and finally a new perceptual attack, based on the spikegram, against audio watermarking systems. We first propose a technique for localizing tampering in audio signals. To this end, we combine a modified spread spectrum (MSS) method with a sparse representation. We use a perceptual matching pursuit technique (PMP [Hossein Najaf-Zadeh, 2008]) to generate a sparse representation (spikegram) of the input audio signal that is invariant to time shifts [E. C. Smith, 2006] and that takes into account masking phenomena as observed in hearing. An authentication code is inserted into the coefficients of the spikegram representation. These are then combined with the masking thresholds. The watermarked signal is resynthesized from the modified coefficients, and the resulting signal is transmitted to the decoder.
At the decoder, to identify a tampered segment of the audio signal, the authentication codes of all intact segments are analyzed. If the codes cannot be detected correctly, the segment is known to have been tampered with. We propose to watermark according to the spread spectrum principle (called MSS) in order to obtain a high capacity in terms of the number of embedded watermark bits. In situations where the encoder and decoder are desynchronized, our method can still detect tampered pieces. Compared with the state of the art, our approach has the lowest error rate in detecting tampered pieces. We used the mean opinion score (MOS) test to measure the quality of the watermarked signals. We evaluate the semi-fragile watermarking method by the bit error rate (number of erroneous bits divided by the total number of submitted bits) after several attacks. The results confirm the superiority of our approach for localizing tampered pieces in audio signals while preserving signal quality. Next, we propose a new technique for protecting audio signals. This technique is based on the spikegram representation of audio signals and uses two dictionaries (TDA, for Two-Dictionary Approach). The spikegram is used to encode the host signal using a dictionary of gammatone filters. For watermarking, we use two different dictionaries that are selected according to the input bit to be embedded and the content of the signal. Our approach finds the appropriate gammatones (called watermark kernels) based on the value of the bit to be embedded, and embeds the watermark bits in the phase of the watermark gammatones. Moreover, the TDA is shown to be error-free when no attack is present.
It is shown that decorrelating the watermark kernels enables the design of a very robust audio watermarking method. Compared with several recent techniques, the proposed method showed the best robustness in experiments where the watermarked signal is corrupted by MP3 compression at 32 kbps with a payload of 56.5 bps. We also studied the robustness of the watermark when the new USAC (Unified Speech and Audio Coding) codec at 24 kbps is used. The payload is then between 5 and 15 bps. Finally, we use spikegrams to propose three new attack methods. We compare them with recent attack methods such as 32 kbps MP3 and 24 kbps USAC. These attacks comprise the PMP attack, the inaudible noise attack and the sparse replacement attack. In the PMP attack, the watermarked signal is represented and resynthesized with a spikegram. In the inaudible noise attack, inaudible noise is generated and added to the spikegram coefficients. In the sparse replacement attack, in each segment of the signal, the spectro-temporal features of the signal (the time spikes) are found using the spikegram, and similar time spikes are replaced with one another. To compare the effectiveness of the proposed attacks, we test them against a spread spectrum watermark decoder. It is shown that the sparse replacement attack reduces the normalized correlation of the spread spectrum decoder by a larger factor than when the decoder is attacked by the MP3 (32 kbps) and 24 kbps USAC transforms. / Abstract: Every year, global music piracy causes billions of dollars in economic, job and worker earnings losses, as well as millions of dollars in lost tax revenue.
Most music piracy is due to the rapid growth and ease of use of current technologies for copying, sharing, manipulating and distributing musical data [Domingo, 2015], [Siwek, 2007]. Audio watermarking has been proposed as one approach to copyright protection and tamper localization of audio signals to prevent music piracy. In this thesis, we use the spikegram, which is a bio-inspired sparse representation, to design an audio tamper localization method, an audio copyright protection method, and a new perceptual attack against any audio watermarking system. First, we propose a tamper localization method for audio signals, based on a Modified Spread Spectrum (MSS) approach. Perceptual Matching Pursuit (PMP) is used to compute the spikegram (a sparse and time-shift-invariant representation of audio signals) as well as 2-D masking thresholds. Then, an authentication code (which includes an identity number, ID) is inserted into the sparse coefficients. For high-quality watermarking, the watermark data are multiplied by the masking thresholds. The time-domain watermarked signal is resynthesized from the modified coefficients and sent to the decoder. To localize a tampered segment of the audio signal at the decoder, the IDs associated with intact segments are detected correctly, while the ID associated with a tampered segment is misdetected or not detected. To achieve high capacity, we propose a modified version of the improved spread spectrum watermarking, called MSS (Modified Spread Spectrum). We performed a mean opinion test to measure the quality of the proposed watermarking system. The bit error rates of the presented tamper localization method are also computed under several attacks. In comparison with conventional methods, the proposed tamper localization method has the smallest number of misdetected tampered frames when only one frame is tampered with.
In addition, the mean opinion test experiments confirm that the proposed method preserves the high quality of the input audio signals. Moreover, we introduce a new audio watermarking technique based on a kernel-based representation of audio signals. A perceptive sparse representation (spikegram) is combined with a dictionary of gammatone kernels to construct a robust representation of sounds. In contrast to traditional phase embedding methods, where the phase of the signal's Fourier coefficients is modified, in this method the watermark bit stream is inserted by modifying the phase of gammatone kernels. Moreover, the watermark is automatically embedded only into kernels with high amplitudes, where all masked (non-meaningful) gammatones have already been removed. Two embedding methods are proposed: one embeds the watermark into the sign of the gammatones (one-dictionary method) and the other embeds it into both the sign and the phase of the gammatone kernels (two-dictionary method). The robustness of the proposed method is shown against 32 kbps MP3 with an embedding rate of 56.5 bps, while the state-of-the-art payload for watermarking robust to 32 kbps MP3 is lower than 50.3 bps. We also showed that the proposed method is robust against the unified speech and audio codec (24 kbps USAC, linear predictive and Fourier domain modes) with an average payload of 5-15 bps. Moreover, it is shown that the proposed method is robust against a variety of signal processing transforms while preserving quality. Finally, three perceptual attacks are proposed in the perceptual sparse domain using the spikegram. These are called the PMP, inaudible noise adding and sparse replacement attacks. In the PMP attack, the host signals are represented and resynthesized with the spikegram. In the inaudible noise attack, inaudible noise is generated and added to the spikegram coefficients.
In the sparse replacement attack, each specific frame of the spikegram representation is replaced, when possible, with a combination of similar frames located in other parts of the spikegram. It is shown that the PMP and inaudible noise attacks have roughly the same efficiency as the 32 kbps MP3 attack, while the replacement attack reduces the normalized correlation of the spread spectrum decoder by a greater factor than attacking with 32 kbps MP3 or 24 kbps unified speech and audio coding (USAC).
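The normalized correlation that these attacks aim to reduce is the detection statistic of a spread spectrum watermark decoder. A minimal sketch of plain additive spread spectrum follows, assuming time-domain embedding with a ±1 pseudo-random sequence; it is neither the MSS scheme nor the spikegram-domain embedding of the thesis, and all names and parameters are hypothetical.

```python
import numpy as np

def embed(host, bits, pn, alpha):
    """Add +alpha*pn for a 1 bit and -alpha*pn for a 0 bit, frame by frame."""
    frame = pn.size
    out = np.asarray(host, dtype=float).copy()
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        out[i * frame:(i + 1) * frame] += alpha * sign * pn
    return out

def normalized_correlation(segment, pn):
    """Correlation between a frame and the PN sequence, scaled to [-1, 1]."""
    return float(segment @ pn / (np.linalg.norm(segment) * np.linalg.norm(pn)))

def decode(signal, num_bits, pn):
    """Decide each bit from the sign of the normalized correlation."""
    frame = pn.size
    return [
        int(normalized_correlation(signal[i * frame:(i + 1) * frame], pn) > 0.0)
        for i in range(num_bits)
    ]
```

An attack succeeds against this decoder when it pushes the per-frame normalized correlation toward zero, so that the sign decision becomes unreliable.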
