41 |
Applications of perceptual sparse representation (Spikegram) for copyright protection of audio signals / Applications de la représentation parcimonieuse perceptuelle par graphe de décharges (Spikegramme) pour la protection du droit d’auteur des signaux sonores Erfani, Yousof January 2016 (has links)
Every year, music piracy worldwide costs several billion dollars in economic losses, job
losses and lost workers' earnings, as well as millions of dollars in lost tax revenues. Most
music piracy is due to the rapid growth and ease of current technologies for copying,
sharing, manipulating and distributing musical data [Domingo, 2015], [Siwek, 2007].
Audio watermarking has been proposed to protect authors' rights and to localize the
instants at which an audio signal has been tampered with. In this thesis, we propose to
use the bio-inspired sparse spike-based representation (the spikegram) to design a new
method for localizing tampering in audio signals, a new copyright protection method and,
finally, a new perceptual attack, built on the spikegram, against audio watermarking
systems.
We first propose a technique for localizing tampering in audio signals. To do so, we
combine a modified spread spectrum (MSS) method with a sparse representation. We use
an adapted perceptual pursuit technique (Perceptual Matching Pursuit, PMP [Hossein
Najaf-Zadeh, 2008]) to generate a sparse representation (spikegram) of the input audio
signal that is invariant to time shifts [E. C. Smith, 2006] and that accounts for masking
phenomena as observed in hearing. An authentication code is inserted into the coefficients
of the spikegram representation, and these are then combined with the masking thresholds.
The watermarked signal is re-synthesized from the modified coefficients, and the resulting
signal is transmitted to the decoder. At the decoder, to identify a tampered segment of
the audio signal, the authentication codes of all intact segments are analyzed: if a code
cannot be detected correctly, the corresponding segment is known to have been tampered
with. We watermark according to the spread spectrum principle (MSS) in order to obtain
a high capacity in terms of the number of embedded watermark bits. In situations where
the encoder and decoder are desynchronized, our method can still detect tampered pieces.
Compared with the state of the art, our approach has the lowest error rate for detecting
tampered pieces. We used the mean opinion score (MOS) test to measure the quality of
the watermarked signals, and we evaluate the semi-fragile watermarking method by the
bit error rate (number of erroneous bits divided by the total number of embedded bits)
under several attacks. The results confirm the superiority of our approach for localizing
tampered pieces in audio signals while preserving signal quality.
We then propose a new technique for the copyright protection of audio signals. This
technique is based on the spikegram representation of audio signals and uses two
dictionaries (TDA, for Two-Dictionary Approach). The spikegram is used to encode the
host signal with a dictionary of gammatone filters. For the watermarking, we use two
different dictionaries that are selected according to the input bit to be embedded and to
the signal content. Our approach finds the appropriate gammatones (called watermark
kernels) based on the value of the bit to be embedded, and embeds the watermark bits
in the phase of the watermark gammatones. Moreover, it is shown that the TDA is
error-free in the absence of any attack, and that the decorrelation of the watermark
kernels enables the design of a very robust audio watermarking method.
Experiments showed that the proposed method is the most robust, compared with several
recent techniques, when the watermarked signal is corrupted by MP3 compression at
32 kbps with a payload of 56.5 bps. We also studied the robustness of the watermark
when the new USAC codec (Unified Speech and Audio Coding) at 24 kbps is used; the
payload is then between 5 and 15 bps.
Finally, we use spikegrams to propose three new attack methods, which we compare with
recent attacks such as 32 kbps MP3 and 24 kbps USAC. These attacks comprise the PMP
attack, the inaudible-noise attack and the sparse replacement attack. In the PMP attack,
the watermarked signal is represented and re-synthesized with a spikegram. In the
inaudible-noise attack, inaudible noise is generated and added to the spikegram
coefficients. In the sparse replacement attack, in each segment of the signal, the
spectro-temporal features of the signal (the time spikes) are found using the spikegram,
and similar time spikes are replaced with one another.
To compare the efficiency of the proposed attacks, we apply them to a spread spectrum
watermark decoder. It is shown that the sparse replacement attack reduces the normalized
correlation of the spread spectrum decoder by a greater factor than attacking the decoder
with 32 kbps MP3 or 24 kbps USAC. / Abstract : Every year, global music piracy causes
billions of dollars in economic losses, job losses and lost workers' earnings, as well as
millions of dollars in lost tax revenues. Most music piracy is due to the rapid growth and
ease of current technologies for copying, sharing, manipulating and distributing musical
data [Domingo, 2015], [Siwek, 2007]. Audio watermarking has been
proposed as one approach for copyright protection and tamper localization of audio signals
to prevent music piracy. In this thesis, we use the spikegram, a bio-inspired sparse
representation, to design an audio tamper localization method, an audio copyright
protection method, and a new perceptual attack against audio watermarking systems.
First, we propose a tampering localization method for audio signals, based on a Modified
Spread Spectrum (MSS) approach. Perceptual Matching Pursuit (PMP) is used to compute
the spikegram (which is a sparse and time-shift invariant representation of audio signals) as
well as 2-D masking thresholds. Then, an authentication code (which includes an Identity
Number, ID) is inserted inside the sparse coefficients. For high quality watermarking, the
watermark data are multiplied with masking thresholds. The time domain watermarked
signal is re-synthesized from the modified coefficients and the signal is sent to the decoder.
To localize a tampered segment of the audio signal, at the decoder, the IDs associated with
intact segments are detected correctly, while the ID associated with a tampered segment is
mis-detected or not detected. To achieve high capacity, we propose a modified version of
improved spread spectrum watermarking, called MSS (Modified Spread Spectrum). We
performed a mean opinion test to measure the quality of the proposed watermarking system.
Also, the bit error rates for the presented tamper localization method are computed under
several attacks. In comparison to conventional methods, the proposed tamper localization
method has the smallest number of mis-detected tampered frames, when only one frame
is tampered. In addition, the mean opinion test experiments confirm that the proposed
method preserves the high quality of input audio signals.
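The embed-and-correlate principle behind this localization scheme can be sketched in a few lines. The sketch below is a deliberately simplified additive spread-spectrum example in a generic coefficient domain; it is not the thesis's exact MSS/PMP pipeline, and the segment length, gain `alpha` and PN sequence are illustrative assumptions:

```python
# Hedged sketch: additive spread-spectrum embedding of authentication bits
# in a generic coefficient domain (not the thesis's MSS/PMP pipeline).

def embed(coeffs, bits, pn, alpha=0.05):
    """Add one spread bit per segment of len(pn) coefficients."""
    seg = len(pn)
    out = list(coeffs)
    for i, b in enumerate(bits):
        s = 1.0 if b else -1.0
        for k in range(seg):
            out[i * seg + k] += alpha * s * pn[k]
    return out

def detect(coeffs, nbits, pn):
    """Correlate each segment with the PN sequence; the sign gives the bit."""
    seg = len(pn)
    return [sum(coeffs[i * seg + k] * pn[k] for k in range(seg)) > 0
            for i in range(nbits)]

seg, nbits = 64, 4
pn = [1.0 if (k * 7) % 3 else -1.0 for k in range(seg)]    # toy +/-1 PN code
host = [0.01 * ((-1) ** k) for k in range(seg * nbits)]    # toy host coefficients

marked = embed(host, [True] * nbits, pn)
assert detect(marked, nbits, pn) == [True, True, True, True]

# Tampering with segment 2 (here, zeroing it) destroys only that segment's
# authentication bit, which is what lets the decoder localize the tampering.
tampered = list(marked)
tampered[2 * seg:3 * seg] = [0.0] * seg
print(detect(tampered, nbits, pn))  # -> [True, True, False, True]
```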
Moreover, we introduce a new audio watermarking technique based on a kernel-based
representation of audio signals. A perceptive sparse representation (spikegram) is combined
with a dictionary of gammatone kernels to construct a robust representation of sounds.
Compared to traditional phase embedding methods, where the phases of the signal's Fourier
coefficients are modified, in this method the watermark bit stream is inserted by modifying
the phase of gammatone kernels. Moreover, the watermark is automatically embedded only
into kernels with high amplitudes where all masked (non-meaningful) gammatones have
been already removed. Two embedding methods are proposed, one based on the watermark
embedding into the sign of gammatones (one dictionary method) and another one based
on watermark embedding into both sign and phase of gammatone kernels (two-dictionary
method). The robustness of the proposed method is shown against 32 kbps MP3 with
an embedding rate of 56.5 bps while the state of the art payload for 32 kbps MP3 robust
watermarking is lower than 50.3 bps. Also, we showed that the proposed method is robust
against unified speech and audio codec (24 kbps USAC, Linear predictive and Fourier
domain modes) with an average payload of 5 to 15 bps. Moreover, it is shown that the
proposed method is robust against a variety of signal processing transforms while preserving
quality.
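The phase/sign embedding idea can be illustrated with a minimal sketch, assuming a textbook gammatone formula and illustrative parameters. This shows only the sign-flip (one-dictionary) variant; the two-dictionary method additionally switches between kernel dictionaries per bit:

```python
import math

def gammatone(f, phase, fs=16000, dur=0.02, order=4, b=125.0):
    """Textbook gammatone kernel: t^(n-1) e^(-2*pi*b*t) cos(2*pi*f*t + phase)."""
    return [(n / fs) ** (order - 1) * math.exp(-2 * math.pi * b * n / fs)
            * math.cos(2 * math.pi * f * n / fs + phase)
            for n in range(int(fs * dur))]

def embed_bit(bit, f):
    """The bit chooses the kernel phase: 0 for '1', pi for '0' (a sign flip)."""
    return gammatone(f, 0.0 if bit else math.pi)

def detect_bit(segment, f):
    """Correlate against the reference (phase-0) kernel; the sign decodes the bit."""
    ref = gammatone(f, 0.0)
    return sum(s * r for s, r in zip(segment, ref)) > 0

assert detect_bit(embed_bit(True, 1000.0), 1000.0) == True
assert detect_bit(embed_bit(False, 1000.0), 1000.0) == False
```

Because masked low-amplitude kernels are removed before embedding, the correlation in a real system is dominated by the high-amplitude watermark kernels rather than the residual host.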
Finally, three perceptual attacks are proposed in the perceptual sparse domain using
spikegram. These attacks are called the PMP, inaudible-noise and sparse-replacement
attacks. In the PMP attack, the host signal is represented and re-synthesized with a
spikegram. In the inaudible-noise attack, inaudible noise is generated and added to the
spikegram coefficients. In the sparse-replacement attack, each frame of the spikegram
representation is, when possible, replaced with a combination of similar frames located
in other parts of the spikegram. It is shown that the PMP and inaudible-noise attacks
have roughly the same efficiency as the 32 kbps MP3 attack, while the replacement attack
reduces the normalized correlation of the spread spectrum decoder with a greater factor
than when attacking with 32 kbps MP3 or 24 kbps unified speech and audio coding (USAC).
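The normalized correlation statistic by which the attacks are compared is simple to compute. In the sketch below, zeroing samples stands in for replacement with "similar frames", which is an illustrative simplification:

```python
import math, random

def normalized_correlation(x, w):
    """NC = <x, w> / (||x|| * ||w||), the spread-spectrum decoder's statistic."""
    num = sum(a * b for a, b in zip(x, w))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in w))
    return num / den if den else 0.0

random.seed(1)
w = [random.choice((-1.0, 1.0)) for _ in range(1024)]   # watermark PN sequence

assert abs(normalized_correlation(w, w) - 1.0) < 1e-12

# A replacement-style attack that substitutes half of the samples (here with
# zeros) reduces NC to about sqrt(0.5); a stronger reduction means a more
# effective attack against the spread spectrum decoder.
attacked = [0.0 if i % 2 else v for i, v in enumerate(w)]
print(round(normalized_correlation(attacked, w), 3))  # -> 0.707
```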
|
42 |
On Adaptive Filtering Using Delayless IFIR Structure : Analysis, Experiments And Application To Active Noise Control And Acoustic Echo Cancellation Venkataraman, S 09 1900 (has links) (PDF)
No description available.
|
43 |
Kvadraturní zrcadlové banky filtrů se sigma-delta modulátory / Quadrature Mirror Digital Filter Banks with Sigma-Delta Modulators Vrána, Jaroslav January 2008 (has links)
This dissertation focuses on the digital signal processing of real signals by quadrature mirror filter banks. The first part briefly describes a two-channel quadrature mirror digital filter bank, concentrating on the transfer functions of the distorted subband signals. The next part describes the generalized sigma-delta modulator and its linear model. The generalized sigma-delta modulator is then used in the decomposition (analysis) part of the quadrature mirror filter bank, and the resulting structure is analyzed. On the basis of the analysis results, two methods for designing the transfer functions of this structure are proposed. The first method is suitable for manual design through an intuitive placement of zeros and poles; the second, an iterative method based on correlation, is better suited to computer-aided design. Design examples of transfer functions for the quadrature mirror digital filter bank with sigma-delta modulators are also part of the thesis. Applying the proposed structure can lead to a higher compression ratio in lossy data compression.
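A two-channel QMF bank can be illustrated with the Haar pair, the simplest filter pair with perfect reconstruction. The banks studied in the thesis use longer filters and insert sigma-delta modulators into the decomposition; this sketch shows only the analysis/synthesis skeleton:

```python
import math

def qmf_analysis(x):
    """Split x into lowpass (sum) and highpass (difference) subbands,
    each decimated by 2 (the Haar pair, the simplest QMF bank)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[2 * k] + x[2 * k + 1]) * s for k in range(len(x) // 2)]
    high = [(x[2 * k] - x[2 * k + 1]) * s for k in range(len(x) // 2)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the subbands; the Haar pair gives perfect reconstruction."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x.append((l + h) * s)
        x.append((l - h) * s)
    return x

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
low, high = qmf_analysis(x)
y = qmf_synthesis(low, high)
assert max(abs(a - b) for a, b in zip(x, y)) < 1e-12
```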
|
44 |
Décodage neuronal dans le système auditif central à l'aide d'un modèle bilinéaire généralisé et de représentations spectro-temporelles bio-inspirées / Neural decoding in the central auditory system using bio-inspired spectro-temporal representations and a generalized bilinear model Siahpoush, Shadi January 2015 (has links)
Résumé : In this project, Bayesian neural decoding is performed on the inferior colliculus of the guinea pig. First, the evoked potentials are read from the electrodes, and the action potentials (spikes) are then extracted using a spike-sorting technique.
Next, a generalized linear model (GLM) is trained by pairing the acoustic stimulus with the recorded neural responses.
Finally, we decode the neural activity using maximum a posteriori (MAP) statistical estimation in order to reconstruct the spectro-temporal representation of the acoustic signal corresponding to the stimulus.
In this project, we study the impact of different neural encoding models and of different spectro-temporal representations (assumed to represent the acoustic stimulus) on the accuracy of the Bayesian decoding of neural activity recorded in the central auditory system. The model associates an equivalent spectro-temporal representation of the acoustic stimulus with the measurements made in the brain. Two encoding models are compared: a GLM and a generalized bilinear model (GBM), each with three different spectro-temporal representations of the input stimuli: a spectrogram and two bio-inspired representations, namely a gammatone filter bank and a spikegram. The parameters of the GLM and GBM, namely the spectro-temporal receptive field, the post-spike filter and the input nonlinearity (GBM only), are fitted using a maximum likelihood (ML) optimization algorithm. The signal-to-noise ratio between the reconstructed and original representations is used to evaluate the decoding, that is, the reconstruction accuracy. We show experimentally that the reconstruction accuracy is better with a spikegram representation than with a spectrogram representation and, moreover, that using a GBM instead of a GLM increases the reconstruction accuracy. Indeed, our results show that the signal-to-noise ratio of a spikegram reconstruction with the GBM model is 3.3 dB higher than that of a spectrogram reconstruction with the GLM model. / Abstract : In this project, Bayesian neural decoding is performed on the neural activity recorded from the inferior colliculus of the guinea pig following the presentation of a vocalization.
In particular, we study the impact of different encoding models on the accuracy of reconstruction of different spectro-temporal representations of the input stimulus. First voltages recorded from the inferior colliculus of the guinea pig are read and the spike trains are obtained. Then, we fit an encoding model to the stimulus and associated spike trains. Finally, we do neural decoding on the pairs of stimuli and neural activities using the maximum a posteriori optimization method to obtain the reconstructed spectro-temporal representation of the signal. Two encoding models, a generalized linear model (GLM) and a generalized bilinear model (GBM), are compared along with three different spectro-temporal representations of the input stimuli: a spectrogram and two bio-inspired representations, i.e. a gammatone filter bank (GFB) and a spikegram. The parameters of the GLM and GBM including spectro-temporal receptive field, post spike filter and input non linearity (only for the GBM) are fitted using the maximum likelihood optimization (ML) algorithm. Signal to noise ratios between the reconstructed and original representations are used to evaluate the decoding, or reconstruction accuracy. We experimentally show that the reconstruction accuracy is better with the spikegram representation than with the spectrogram and GFB representation. Furthermore, using a GBM instead of a GLM significantly increases the reconstruction accuracy. In fact, our results show that the spikegram reconstruction accuracy with a GBM fitting yields an SNR that is 3.3 dB better than when using the standard decoding approach of reconstructing a spectrogram with GLM fitting.
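The evaluation metric, the signal-to-noise ratio between the original and reconstructed representations, is simple to state. A minimal sketch with illustrative values:

```python
import math

def snr_db(orig, recon):
    """SNR in dB between an original and a reconstructed representation:
    10*log10(signal power / error power), the decoding-accuracy metric."""
    sig = sum(o * o for o in orig)
    err = sum((o - r) ** 2 for o, r in zip(orig, recon))
    return 10.0 * math.log10(sig / err)

orig = [1.0, 1.0, 1.0, 1.0]
recon = [1.1, 0.9, 1.1, 0.9]   # each bin off by 0.1 -> error power 1% of signal
assert abs(snr_db(orig, recon) - 20.0) < 1e-9
```

The reported result then reads as: SNR(spikegram, GBM) is 3.3 dB higher than SNR(spectrogram, GLM) under this metric.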
|
45 |
A DSP embedded optical navigation system Gunnam, Kiran Kumar 30 September 2004 (has links)
Spacecraft missions such as spacecraft docking and formation flying require high precision relative position and attitude data. Although Global Positioning Systems can provide this capability near the Earth, deep space missions require the use of alternative technologies. One such technology is the vision-based navigation (VISNAV) sensor system developed at Texas A&M University. VISNAV comprises an electro-optical sensor combined with light sources or beacons. This patented sensor has an analog detector in the focal plane with a rise time of a few microseconds. Accuracies better than one part in 2000 of the field of view have been obtained. This research presents a new approach involving simultaneous activation of beacons with frequency division multiplexing as part of the VISNAV sensor system. In addition, it discusses the synchronous demodulation process using digital heterodyning and decimating filter banks on a low-power fixed-point DSP, which improves the accuracy of the sensor measurements and the reliability of the system. This research also presents an optimal and computationally efficient six-degree-of-freedom estimation algorithm using a new measurement model based on the attitude representation of Modified Rodrigues Parameters.
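The synchronous demodulation step, digital heterodyning followed by a decimating filter, can be sketched as follows. A boxcar average stands in for the real decimating filter bank, and the sampling rate, beacon frequency and decimation factor are illustrative assumptions, not the VISNAV design values:

```python
import cmath, math

def heterodyne_decimate(x, f0, fs, dec):
    """Mix the beacon tone at f0 down to DC, then average-and-decimate
    (a crude decimating lowpass standing in for a real filter bank)."""
    bb = [x[n] * cmath.exp(-2j * math.pi * f0 * n / fs) for n in range(len(x))]
    return [sum(bb[k:k + dec]) / dec for k in range(0, len(bb) - dec + 1, dec)]

fs, f0, dec = 1000.0, 100.0, 10
x = [math.cos(2 * math.pi * f0 * n / fs) for n in range(200)]
y = heterodyne_decimate(x, f0, fs, dec)

# A unit-amplitude real tone demodulates to a complex baseband value of
# magnitude 1/2; the image at -2*f0 is averaged out by the boxcar filter.
assert all(abs(abs(v) - 0.5) < 1e-9 for v in y)
```

With several beacons at distinct modulation frequencies (FDM), running this demodulator once per beacon frequency separates their contributions from a single detector signal.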
|
46 |
Receiver Channelizer For FBWA System Conforming To WiMAX Standard Hoda, Nazmul 02 1900 (has links)
Fixed Broadband Wireless Access (FBWA) is a technology aimed at providing high-speed wireless Internet access, over a wide area, from devices such as personal computers and laptops. FBWA channels are defined in the range of 1-20 MHz, which makes the RF front end (RFE) design extremely challenging. In its pursuit to standardize Broadband Wireless Access (BWA) technologies, the IEEE 802.16 working group on Broadband Wireless Access released the fixed BWA standard IEEE 802.16-2004 in 2004. This standard is further backed by a consortium of leading wireless vendors, chip manufacturers and service providers, officially known as WiMAX (Worldwide Interoperability for Microwave Access).
In general, any wireless base station (BS) supporting a number of contiguous Frequency Division Multiplexed (FDM) channels has to incorporate an RF front end (RFE) for each RF channel. The precise job of the RFE is to filter the desired channel from a group of RF channels, digitize it and present it to the subsequent baseband system at the proper sampling rate. The system essentially has a bandpass filter (BPF) tuned to the channel of interest, followed by a multiplier which brings the channel to a suitable intermediate frequency (IF). The IF output is digitized by an ADC and then brought to baseband by an appropriate digital multiplier. The baseband samples thus generated are at the ADC sampling rate, which is significantly higher than the target sampling rate defined by the wireless protocol in use. As a result, a sampling rate conversion (SRC) is performed on these baseband samples to bring the channel back to the target sampling rate. Since the input sampling rate need not be an integer multiple of the target sampling rate, Fractional SRC (FSRC) is required in most cases. Instead of using a separate ADC and IF section for each individual channel, most systems use a common IF section, followed by a wideband ADC, which operates over a wide frequency band containing a group of contiguous FDM channels. In this case a channelizer is employed to digitally extract the individual channels from the digital IF samples. We formally call this system a receiver channelizer. Such an implementation presents a considerable challenge in terms of computational requirements and, of course, the cost of the BS. The computational complexity further goes up for FBWA systems, where the channel bandwidth is of the order of several MHz. Though such a system has been analyzed for narrowband wireless systems like GSM, to the best of our knowledge no analysis seems to have been carried out for a wideband system such as WiMAX.
In this work, we focus on the design of a receiver channelizer for a WiMAX BS, which can simultaneously extract a group of contiguous FDM RF channels supported by the BS. The main goal is to obtain a simple, low-cost channelizer architecture that can be implemented in an FPGA. A number of techniques are available in the literature, from Direct Digital Conversion to Polyphase FFT Filter Banks (PFFB), which can do the job of channelization. But each of them operates under certain constraints and, as a result, best suits a particular application. Further, all of these techniques are generic in nature, in the sense that their structure is independent of any particular standard. In terms of computational requirements, PFFB is the best with respect to the number of complex multiplications required for its implementation. But it needs two very stringent conditions to be satisfied, viz. that the number of channels to be extracted equals the decimation factor and that the sampling rate is a power of 2 times the baseband bandwidth. Clearly these conditions may not be satisfied by different wireless communication standards, and, in fact, they are not satisfied by the WiMAX standard.
This gives us the motivation to analyze the receiver channelizer for the WiMAX BS and to find an efficient and low-cost architecture for it. We demonstrate that even though the conditions required by PFFB are not satisfied by the WiMAX standard, we can modify the overall architecture to include the PFFB structure. This is achieved by dividing the receiver channelizer into two blocks. The first block uses the PFFB structure to separate the desired number of channels from the input samples. This process also achieves an integer SRC by a factor equal to the number of channels being extracted. This block generates baseband outputs whose sampling rates are related to their target sampling rate by a fractional multiplication factor. In order to bring the channels to their target sampling rate, each output from the PFFB block is fed to an FSRC block, whose job is to use an efficient FSRC algorithm to generate the samples at the target sampling rate. We show that the computational complexity, as compared to the direct implementation, is reduced by a factor approximately equal to the square of the number of channels.
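The equivalence that this architecture exploits, namely that a bank of M heterodyne-filter-decimate channels can be computed in one polyphase filtering pass plus an M-point DFT, can be sketched as follows. This uses a toy prototype filter and M = 4 channels, not the WiMAX design, and writes the DFT directly rather than as an FFT for clarity:

```python
import cmath, math

def channelize_reference(x, h, M, m):
    """Direct form for channel m: mix down by m*fs/M, FIR-filter with the
    prototype lowpass h, decimate by M."""
    v = [x[n] * cmath.exp(-2j * math.pi * m * n / M) for n in range(len(x))]
    ks = range((len(h) - 1 + M - 1) // M, len(x) // M)
    return [sum(h[j] * v[k * M - j] for j in range(len(h))) for k in ks]

def channelize_polyphase(x, h, M):
    """Polyphase DFT filter bank: one pass of M polyphase branches plus an
    M-point DFT per output frame yields all M channels at once."""
    L = len(h)
    k0 = (L - 1 + M - 1) // M                     # first fully valid frame
    out = [[] for _ in range(M)]
    for k in range(k0, len(x) // M):
        u = [sum(h[p + r * M] * x[k * M - p - r * M]
                 for r in range((L - p + M - 1) // M))
             for p in range(M)]                   # polyphase branch outputs
        for m in range(M):                        # M-point DFT across branches
            out[m].append(sum(u[p] * cmath.exp(2j * math.pi * m * p / M)
                              for p in range(M)))
    return out

M = 4
h = [1.0 / 8.0] * 8                               # toy prototype lowpass
x = [math.sin(0.1 * n) + 0.5 * math.cos(0.37 * n) for n in range(64)]

poly = channelize_polyphase(x, h, M)
for m in range(M):
    ref = channelize_reference(x, h, M, m)
    assert max(abs(a - b) for a, b in zip(poly[m], ref)) < 1e-9
```

The saving comes from filtering once at the low (decimated) rate and sharing the DFT across all channels, which is where the roughly channels-squared complexity reduction originates.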
After mathematically formulating the receiver channelizer for the WiMAX BS, we simulate the system using a software tool. There are two basic motives behind simulating a system for which we have a mathematical model. First, the software simulation gives an idea of whether the designed system is physically realizable. Second, it helps in designing the logic for the different blocks of the system. Once these individual blocks are simulated and tested, they can be smoothly ported onto an FPGA.
For simulation purposes, we parameterize the receiver channelizer in such a way that it can be reconfigured for different ADC sampling rates and IF frequencies by changing the input clock rate. The system is also reconfigurable in terms of the supported channel bandwidth. This is achieved by storing all the filter coefficients pertaining to each channel type and loading the required coefficients into the computational engine. Using this methodology, we simulate the system for three different IF frequencies (and the corresponding ADC sampling rates) and three different channel types, leading to nine different system configurations. The simulation results are in agreement with the mathematical model of the system.
Further, we discuss some important implementation issues for the reconfigurable receiver channelizer. We estimate the memory requirement for implementing the system in an FPGA, and the implementation delay is estimated in terms of the number of samples.
The thesis is organized into five chapters. Chapter 1 gives a brief introduction to the WiMAX system and the different existing channelization architectures, followed by an outline of the proposed receiver channelizer. In Chapter 2, we analyze the proposed receiver channelizer for the WiMAX BS and evaluate its computational requirements. Chapter 3 outlines the procedure for generating the WiMAX test signal and the specifications of all the filters used in the system. It also lists the simulation parameters and records the results of the simulation. Chapter 4 presents the details of a possible FPGA implementation. We present concluding remarks and future research directions in the final chapter.
|
47 |
Explicit Segmentation Of Speech For Indian Languages Ranjani, H G 03 1900 (links) (PDF)
Speech segmentation is the process of identifying the boundaries between words, syllables or phones in the recorded waveforms of spoken natural languages. The lowest level of speech segmentation is the breakup and classification of the sound signal into a string of phones. The difficulty of this problem is compounded by the phenomenon of co-articulation of speech sounds.
The classical solution to this problem is to manually label and segment spectrograms. In the first step of this two-step process, a trained person listens to a speech signal, recognizes the word and phone sequence, and roughly determines the position of each phonetic boundary. The second step involves examining several features of the speech signal to place a boundary mark at the point where these features best satisfy a certain set of conditions specific to that kind of phonetic boundary. Manual segmentation of speech into phones is a highly time-consuming and painstaking process. For applications such as acoustic analysis, or building speech synthesis databases for high-quality speech output systems, the time required to carry out this process for even relatively small speech databases can rapidly accumulate to prohibitive levels. This calls for automating the segmentation process.
The state-of-the-art segmentation techniques use Hidden Markov Models (HMM) for phone states. They give an average accuracy of over 95% within 20 ms of manually obtained boundaries. However, HMM-based methods require large training data for good performance. Another major disadvantage of such speech recognition based segmentation techniques is that they cannot handle very long utterances, which are necessary for prosody modeling in speech synthesis applications.
Development of Text to Speech (TTS) systems in Indian languages has been difficult till date owing to the non-availability of sizeable segmented speech databases of good quality. Further, no prosody models exist for most of the Indian languages. Therefore, long utterances (at the paragraph level and monologues) have been recorded, as part of this work, for creating the databases.
This thesis aims at automating segmentation of very long speech sentences recorded for the application of corpus-based TTS synthesis for multiple Indian languages. In this explicit segmentation problem, we need to force align boundaries in any utterance from its known phonetic transcription.
The major disadvantage of forcing boundary alignments on the entire speech waveform of a long utterance is the accumulation of boundary errors. To overcome this, we force boundaries between 2 known phones (here, 2 successive stop consonants are chosen) at a time. The approach used here is silence detection as a marker for stop consonants. This method gives around 89% accuracy (for the Hindi database), and is language independent and training-free. These stop consonants act as anchor points for the next stage.
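An energy-threshold silence detector of the kind described here can be sketched as follows; the frame length and threshold are illustrative assumptions, and the thesis's detector may differ in detail:

```python
import math

def silent_frames(x, frame, thresh):
    """Return frame start indices whose short-time energy falls below thresh,
    a simple marker for the silent closures of stop consonants."""
    return [k for k in range(0, len(x) - frame + 1, frame)
            if sum(v * v for v in x[k:k + frame]) / frame < thresh]

# Synthetic signal: voiced - silent closure - voiced
speech = [0.5 * math.sin(0.3 * n) for n in range(100)]
signal = speech + [0.0] * 100 + speech
assert silent_frames(signal, 50, 0.01) == [100, 150]
```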
Two methods for explicit segmentation have been proposed. Both the methods rely on the accuracy of the above stop consonant detection stage.
Another common stage is a recently proposed implicit method which uses a Bach-scale filter bank to obtain the feature vectors. The Euclidean Distance of the Mean of the Logarithm (EDML) of these feature vectors shows peaks at the points where the spectrum changes. The method performs with an accuracy of 87% within 20 ms of manually obtained boundaries and achieves low deletion and insertion rates of 3.2% and 21.4%, respectively, for 100 sentences of the Hindi database.
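The EDML function can be sketched directly from its name: the Euclidean distance between the means of the log feature vectors on either side of each candidate boundary. A synthetic two-band feature sequence stands in for Bach-scale features here:

```python
import math

def edml(frames, w):
    """Euclidean Distance of the Mean of the Logarithm: distance between the
    mean log-feature over the w frames left and right of each candidate."""
    logf = [[math.log(v) for v in f] for f in frames]
    nb = len(frames[0])
    d = [0.0] * len(frames)
    for t in range(w, len(frames) - w):
        left = [sum(f[i] for f in logf[t - w:t]) / w for i in range(nb)]
        right = [sum(f[i] for f in logf[t:t + w]) / w for i in range(nb)]
        d[t] = math.sqrt(sum((a - b) ** 2 for a, b in zip(left, right)))
    return d

# Synthetic 2-band features: energy moves from band 0 to band 1 at frame 50,
# so the EDML curve should peak exactly there.
frames = [[1.0, 0.1]] * 50 + [[0.1, 1.0]] * 50
d = edml(frames, 5)
assert max(range(len(d)), key=d.__getitem__) == 50
```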
The first method is a three-stage approach. The first is the stop consonant detection stage, followed by the next, which uses Quatieri's sinusoidal model to classify sounds as voiced/unvoiced between 2 successive stop consonants. The final stage uses the EDML function of the Bach-scale feature vectors to further obtain boundaries within the voiced and unvoiced regions. It gives a Frame Error Rate (FER) of 26.1% for the Hindi database.
The second method proposed uses duration statistics of the phones of the language. It again uses the EDML function of the Bach-scale filter bank to obtain the peaks at the phone transitions, and uses the duration statistics to assign to each peak a probability of being a boundary. In this method, the FER performance improves to 22.8% for the Hindi database.
Both methods are promising, as they give low frame error rates. Results show that the second method outperforms the first, because it incorporates knowledge of durations.
For the proposed approaches to be useful, manual interventions are required at the output of each stage. However, this intervention is less tedious and reduces the time taken to segment each sentence by around 60% compared with the time taken for manual segmentation. The approaches have been successfully tested on 3 different languages, 100 sentences each: Kannada, Tamil and English (we used the TIMIT database for validating the algorithms).
In conclusion, a practical solution to the segmentation problem is proposed. Moreover, the algorithm being training-free, language independent (ES-SABSF method) and speaker independent makes it useful for developing TTS systems in multiple languages while reducing the segmentation overhead. This method is currently being used in the lab for segmenting long Kannada utterances, spoken by reading a set of 1115 phonetically rich sentences.
|
49 |
Performance Analysis and PAPR Reduction Techniques for Filter-Bank based Multi-Carrier Systems with Non-Linear Power Amplifiers / Réduction du PAPR pour les systèmes utilisant la modulation FBMC/OQAM en présence d’amplificateur de puissance non linéaire Bulusu, Sri Satish Krishna Chaitanya 29 April 2016 (has links)
This thesis was carried out within the European FP7 EMPHATIC project (Enhanced Multicarrier Techniques for Professional Ad-Hoc and Cell-Based Communications). Several European universities and two industrial partners, THALES Communications Security and CASSIDIAN, took part in this project. Its objective is to develop, evaluate and demonstrate the benefits of advanced multi-carrier techniques that make better use of existing radio frequency bands by providing broadband data services in coexistence with legacy narrowband services. The project addresses Professional Mobile Radio (PMR) applications. Its main idea is to analyze the viability of broadband systems using filter banks (Filter Bank Multi Carrier: FBMC) together with offset quadrature amplitude modulation (OQAM) in the context of the 5th generation (5G) of mobile radio systems. FBMC-OQAM modulation is positioned as a potential candidate for future communication systems. This advanced modulation offers numerous advantages, such as the excellent frequency localization of its power spectral density (PSD) and robustness to phase noise, frequency offsets and asynchronism between users. These assets make it more attractive than OFDM for PMR, cognitive radio (CR) and 5G applications. However, like any other multi-carrier modulation technique, FBMC-OQAM suffers from a high peak-to-average power ratio (PAPR).
When the power amplifier (PA) at the transmitter is operated close to its non-linear (NL) region, as is the case in practice, the good frequency localization of the FBMC/OQAM PSD is severely compromised by spectral regrowth. The first objective of this thesis is to predict the extent of the spectral regrowth that PA non-linearity introduces in FBMC-OQAM systems. The second objective is to propose PAPR-reduction and PA-linearization techniques for FBMC-OQAM systems in order to mitigate the NL effects. Using cumulants, the spectral regrowth of FBMC-OQAM signals after NL amplification is predicted. In addition, several PAPR-reduction algorithms based on probabilistic approaches and adding-signal techniques are proposed. The ability of the FBMC-OQAM broadband system to coexist with narrowband PMR systems in the presence of a PA is analyzed, and it is shown that coexistence is possible provided there is a suitable combination of PA input back-off (IBO), PAPR reduction and PA linearization. Finally, a novel PA linearization technique is proposed for FBMC-OQAM. / This thesis is part of the European FP7 EMPHATIC project (Enhanced Multicarrier Techniques for Professional Ad-Hoc and Cell-Based Communications) including various European universities and two main industrial partners: THALES Communications Security and CASSIDIAN. The EMPHATIC objective is to develop, evaluate and demonstrate the capability of enhanced multi-carrier techniques to make better use of the existing radio frequency bands in providing broadband data services in coexistence with narrowband legacy services. The project addresses the Professional Mobile Radio (PMR) application.
The main idea is to analyze the viability of broadband systems based on filter-bank multi-carrier (FBMC) modulation combined with offset quadrature amplitude modulation (OQAM) in the context of the future 5th Generation (5G) radio access technology (RAT). FBMC-OQAM systems are increasingly gaining appeal in the search for advanced multi-carrier modulation (MCM) waveforms for future communication systems. This advanced modulation scheme offers numerous advantages such as excellent frequency localization in its power spectral density (PSD) and robustness to phase noise, frequency offsets and multi-user asynchronism, making it more appealing than OFDM for PMR, cognitive radio (CR) and 5G RAT. However, like any other MCM technique, FBMC-OQAM suffers from high PAPR. When the power amplifier (PA) non-linearity, which is a realistic radio-frequency impairment, is taken into account, the good frequency localization property is severely compromised due to spectral regrowth. The first objective of this PhD thesis is to predict the extent of the spectral regrowth in FBMC-OQAM systems due to the PA non-linearity. The second objective is to investigate techniques for FBMC-OQAM systems, such as PAPR reduction and PA linearization, in order to mitigate the NL effects of the PA. By cumulant analysis, spectral regrowth prediction has been done for FBMC-OQAM systems. Also, some algorithms for PAPR reduction, based on probabilistic approaches and adding-signal methods, have been proposed. The coexistence capability of the FBMC-OQAM based broadband system with narrowband PMR systems in the presence of a PA has been analyzed, and it has been found that coexistence is possible provided there is a symbiotic combination of PA input back-off (IBO), PAPR reduction and PA linearization. Finally, a novel PA linearization technique has been proposed for FBMC-OQAM.
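The high PAPR that motivates this thesis is easy to reproduce numerically. The sketch below measures the PAPR of plain OFDM symbols rather than of a full FBMC-OQAM synthesis filter bank (a simplifying assumption; the peak behaviour that stresses the PA is qualitatively similar for both multi-carrier waveforms):

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

# Illustrative multi-carrier signal: random QPSK on 64 subcarriers,
# one IFFT per symbol (an OFDM stand-in, not the thesis's FBMC-OQAM chain).
rng = np.random.default_rng(0)
n_sc, n_sym = 64, 200
qpsk = (rng.choice([-1, 1], (n_sym, n_sc))
        + 1j * rng.choice([-1, 1], (n_sym, n_sc))) / np.sqrt(2)
time_sig = np.fft.ifft(qpsk, axis=1)
papr = papr_db(time_sig.ravel())  # typically around 10 dB for 64 subcarriers
```

A PA would need roughly this many dB of input back-off to stay linear at the peaks, which is why PAPR reduction and PA linearization are attacked jointly.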
|
50 |
Praktické ukázky zpracování signálů / Practical examples of signal processing Hanzálek, Pavel January 2019 (has links)
The thesis focuses on signal processing. Using practical examples, it shows how individual signal processing operations are used in practice. For each selected operation, an application is created in MATLAB, including a graphical interface for easier use. Each chapter first analyzes the operation from a theoretical point of view and then demonstrates, through a practical example, how the operation is used in practice. The individual applications are described, mainly in terms of how they are operated and their possible results. The results of the practical part are presented in the appendix of the thesis.
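To give a flavour of the kind of demonstration described (the MATLAB applications themselves live in the thesis appendix and are not reproduced here), a comparable minimal example in Python is spectrum analysis of a two-tone signal:

```python
import numpy as np

# Two sinusoids (50 Hz and 120 Hz) sampled for one second at 1 kHz.
fs = 1000.0
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# One-sided amplitude spectrum; with an integer number of cycles per
# window, each tone lands exactly in one FFT bin (no leakage).
spectrum = np.abs(np.fft.rfft(x)) / len(x)
freqs = np.fft.rfftfreq(len(x), 1 / fs)
peak = freqs[np.argmax(spectrum)]  # dominant component: 50 Hz
```

A GUI such as the thesis describes would wrap exactly this kind of computation behind sliders for frequency, amplitude and window length.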
|