21 |
Υλοποίηση κωδικοποιητή πηγής τύπου ADPCM στον επεξεργαστή σήματος TMS320C6711 / ADPCM source coding implementation in the TMS320C6711 digital signal processor
Αλεξανδρόπουλος, Γεώργιος, 21 March 2011
Στα πλαίσια αυτής της διπλωματικής εργασίας υλοποιήθηκε η σύσταση G.721 της International Telecommunication Union (ITU-CCITT) η οποία περιγράφει την προσαρμοστική διαφορική παλμοκωδική διαμόρφωση (Adaptive Differential Pulse Code Modulation-ADPCM) για κανάλια 32 Kbps με συχνότητα δειγματοληψίας 8 KHz. Η διαμόρφωση αυτή χρησιμοποιείται για συμπίεση δεδομένων σε πραγματικό χρόνο, ιδίως φωνής, κατά τη μετάδοση σε ένα τηλεπικοινωνιακό κανάλι. Πρόκειται για μια από τις παλαιότερες τεχνικές κωδικοποίησης φωνής, η οποία εκμεταλλεύεται την υψηλή συσχέτιση των φωνητικών σημάτων παρέχοντας υψηλή απόδοση.
Η υλοποίηση πραγματοποιήθηκε στην αναπτυξιακή κάρτα C6211/C6711 DSK, πυρήνας της οποίας είναι ο ψηφιακός επεξεργαστής σήματος κινητής υποδιαστολής TMS320C6711 της Texas Instruments. Ένα από τα βασικά χαρακτηριστικά της οικογένειας TMS320 στην οποία αυτός ανήκει είναι η προχωρημένη Very Long Instruction Word (VLIW) αρχιτεκτονική, VelociTITM, η οποία παρέχει υψηλό παραλληλισμό πολλών βαθμίδων για την εκτέλεση πολλών εντολών στη διάρκεια ενός ωρολογιακού κύκλου. Η υψηλή απόδοση αυτού του επεξεργαστή, η ύπαρξη μετατροπέων A/D και D/A που εξασφαλίζουν την εύκολη είσοδο και έξοδο πραγματικών σημάτων, η ύπαρξη ενός πλήρους συνόλου αναπτυξιακών εργαλείων λογισμικού για εύκολο προγραμματισμό (Code Composer Studio v.1.23) κι η εύκολη διασύνδεση της αναπτυξιακής κάρτας με προσωπικό υπολογιστή, μέσω της παράλληλης θύρας επικοινωνιών, την καθιστούν ένα ισχυρό εργαλείο για την ανάπτυξη εφαρμογών της ψηφιακής επεξεργασίας σημάτων, της επεξεργασίας και συμπίεσης φωνής, τηλεπικοινωνιακών εφαρμογών κ.ά.
Η εργασία αυτή δομείται σε τρία κεφάλαια. Στο 1ο κεφάλαιο περιγράφονται τα βασικά χαρακτηριστικά του επεξεργαστή TMS320C6711, στο 2ο κεφάλαιο, η σύσταση G.721 και στο 3ο περιγράφεται η υλοποίηση του αλγορίθμου μαζί με τις πειραματικές μετρήσεις και τα συμπεράσματα. Τέλος, στο παράρτημα, παρατίθενται όλοι οι χρησιμοποιούμενοι κώδικες που αφορούν τον αλγόριθμο και την αναπαράσταση των αποτελεσμάτων. / This diploma thesis implements Recommendation G.721 of the International Telecommunication Union (ITU-CCITT), which specifies Adaptive Differential Pulse Code Modulation (ADPCM) for 32 kbit/s channels at a sampling frequency of 8 kHz. This coding scheme is used for real-time data compression, especially of voice, during transmission over a telecommunications channel. It is one of the oldest speech coding techniques; it exploits the high correlation inherent in speech signals and provides high performance.
The ADPCM voice coder was implemented on the C6211/C6711 DSP Starter Kit (DSK), whose core is the Texas Instruments TMS320C6711 floating-point Digital Signal Processor (DSP). This processor belongs to the TMS320 DSP family, one of whose main characteristics is the advanced Very Long Instruction Word (VLIW) architecture, VelociTI™, which provides high multistage parallelism for executing several instructions within a single clock cycle. Several features make the DSK a powerful tool for developing applications in digital signal processing, speech processing and compression, telecommunications and other areas: the high performance of the TMS320C6711, A/D and D/A converters that make input and output of real signals simple, a complete set of software development tools for easy programming (Code Composer Studio v.1.23), and easy connection to a personal computer through the parallel communications port.
This thesis is structured in three chapters. Chapter 1 presents the basic characteristics of the TMS320C6711 DSP, and Chapter 2 describes Recommendation G.721. Chapter 3 presents the implementation of the algorithm together with the experimental measurements and the conclusions. The appendix lists all source code used for the algorithm and for displaying the results.
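The idea behind ADPCM can be sketched in a few lines: code the difference between each sample and a prediction with 4 bits, and adapt the quantizer step to the size of recent codes. The sketch below is a deliberately simplified illustration and is not G.721-compliant; the first-order "previous reconstructed sample" predictor, the multiplier table and the step limits are all assumptions chosen for readability. It does show where the 32 kbit/s figure comes from: 8000 samples per second at 4 bits each.

```python
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 4  # 8000 samples/s * 4 bits = 32000 bit/s

# Hypothetical step-size multipliers, indexed by the 3-bit code magnitude.
MULTIPLIERS = [0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4]
MIN_STEP, MAX_STEP = 1.0, 2048.0
INITIAL_STEP = 16.0


def _reconstruct(code, predicted, step):
    """Rebuild one sample from a 4-bit code and adapt the step size.

    Encoder and decoder both call this, so their states stay in sync.
    """
    magnitude = code & 7
    sign = -1.0 if code & 8 else 1.0
    reconstructed = predicted + sign * (magnitude + 0.5) * step
    step = min(MAX_STEP, max(MIN_STEP, step * MULTIPLIERS[magnitude]))
    return reconstructed, step


def encode(samples):
    """Map each input sample to a 4-bit code (sign bit + 3-bit magnitude)."""
    predicted, step = 0.0, INITIAL_STEP
    codes = []
    for sample in samples:
        diff = sample - predicted
        magnitude = min(7, int(abs(diff) / step))
        code = magnitude | (8 if diff < 0 else 0)
        codes.append(code)
        # The previous reconstructed sample is the next prediction.
        predicted, step = _reconstruct(code, predicted, step)
    return codes


def decode(codes):
    """Invert encode() using the same prediction and step adaptation."""
    predicted, step = 0.0, INITIAL_STEP
    output = []
    for code in codes:
        predicted, step = _reconstruct(code, predicted, step)
        output.append(predicted)
    return output
```

The actual recommendation uses an adaptive pole-zero predictor and a logarithmic adaptive quantizer, which this sketch does not attempt to reproduce.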
|
22 |
Speech Coder using Line Spectral Frequencies of Cascaded Second Order Predictors
Namburu, Visala, 14 November 2001
A major objective in speech coding is to represent speech with as few bits as possible. Usual transmission parameters include auto regressive parameters, pitch parameters, excitation signals and excitation gains. The pitch predictor makes these coders sensitive to channel errors. Aiming for robustness to channel errors, we do not use pitch prediction and compensate for its lack with a better representation of the excitation signal. We propose a new speech coding approach, Vector Sum Excited Cascaded Linear Prediction (VSECLP), based on code excited linear prediction.
We implement forward linear prediction using five cascaded second order sections - parameterized in terms of line spectral frequency - in place of the conventional tenth order filter. The line spectral frequency parameters estimated by the Direct Line Spectral Frequency (DLSF) adaptation algorithm are closer to the true values than those estimated by the Cascaded Recursive Least Squares - Subsection algorithm. A simplified version of DLSF is proposed to further reduce computational complexity.
Split vector quantization is used to quantize the line spectral frequency parameters and vector sum codebooks to quantize the excitation signals. The effect on reconstructed speech quality and transmission rate, of an increased number of bits and differently split combinations, is analyzed by testing VSECLP on the TIMIT database. The quantization of the excitation vectors using the discrete cosine transform resulted in segmental signal to noise ratio of 4 dB at 20.95 kbps, whereas the same quality was obtained at 9.6 kbps using vector sum codebooks. / Master of Science
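The segmental signal-to-noise ratio quoted above is usually computed frame by frame and then averaged in dB. A minimal sketch follows, assuming the common textbook definition; the 160-sample frame length and the decision to skip frames with zero signal or zero error are assumptions, not details taken from the thesis.

```python
import math


def segmental_snr_db(reference, decoded, frame_len=160):
    """Average of per-frame SNR values (in dB) over non-overlapping frames."""
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - d) ** 2 for r, d in zip(ref, dec))
        # Assumption: frames with no signal or no error are left out,
        # because their dB value is undefined.
        if signal_energy == 0.0 or noise_energy == 0.0:
            continue
        frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frame with both signal and coding error")
    return sum(frame_snrs) / len(frame_snrs)
```

Because the average is taken over dB values rather than energies, quiet frames weigh as much as loud ones, which is why this measure is often preferred to a single global SNR for speech.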
|
23 |
Évaluation subjective de la qualité : proposition d'un système de référence pour les codecs en bande élargie / Subjective quality assessment: proposal of a reference system for wideband codecs
Zango, Tiraogo Abdoulaye Yves, 06 February 2013
L'évolution des systèmes de télécommunications conduit à la conception de codecs de la parole et du son de plus en plus sophistiqués, accroissant ainsi la concurrence de l'industrie de l'audio et accordant une importance grandissante à la qualité de service. Si l'évaluation de la qualité des codecs peut s'opérer suivant des mesures objectives ou subjectives, les secondes restent les plus fiables dans la mesure où la qualité perçue par les utilisateurs est intrinsèquement subjective. Toutefois, les tests subjectifs requièrent des signaux d'ancrage, i.e. des signaux artificiels visant la reproduction des défauts perceptifs des codecs de sorte que les dégradations provoquées soient aisément contrôlables. Le système de référence actuellement normalisé par l'Union Internationale des Télécommunications est le MNRU (Modulated Noise Reference Unit) qui simule le bruit de quantification introduit par les premiers codecs en forme d'onde. L'évolution de la technologie rend aujourd'hui ce système obsolète, et il s'agit donc de concevoir un nouveau système d'ancrage plus adapté aux codecs actuels. En considérant la qualité audio comme un objet multidimensionnel, nous avons mis en évidence un espace perceptif à quatre dimensions, et ce à partir de deux approches de réduction de dimensionnalité, l'AFM (Analyse Factorielle Multiple) et la MDS 3–voies (MultiDimensional Scaling). A partir des quatre dimensions identifiées – « Réduction de la largeur de bande », « Bruit de fond », « Écho/Réverbération » et « Distorsion de la parole » –, nous avons modélisé puis validé les signaux d'ancrage des trois premières dimensions et proposé deux modèles de signaux d'ancrage pour la quatrième. / The evolution of technology led to the design of very sophisticated speech and audio codecs. Accordingly, the competition in audio devices manufacturing has increased and today the quality of service becomes crucial for telecommunications operators. 
The quality of codecs is assessed through objective and subjective measures, the latter being the most reliable since the quality perceived by users is inherently subjective. Nevertheless, subjective tests require anchor signals, i.e. artificial signals that reproduce the perceptual impairments of codecs in such a way that the amount of degradation can be easily controlled. The reference system currently standardized by the International Telecommunication Union is the Modulated Noise Reference Unit (MNRU), which simulates the quantization noise of the first generation of waveform codecs. With the evolution of codecs the MNRU system has become obsolete, so a new reference system of anchor signals, better suited to current codecs, needs to be designed. Assuming that speech and audio quality is multidimensional, we first identified four perceptual dimensions using two dimensionality reduction techniques: MFA (Multiple Factor Analysis) and 3-way MDS (MultiDimensional Scaling). From the identified dimensions, namely “Bandwidth limitation”, “Background noise”, “Echo/Reverberation” and “Speech distortion”, we modeled and validated anchor signals for the first three and proposed two models of anchor signals for the fourth.
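The core of the MNRU can be sketched as signal-dependent multiplicative noise. The sketch below assumes the commonly cited relation from ITU-T Recommendation P.810, y(i) = x(i)[1 + 10^(-Q/20) N(i)], where N(i) is unit-variance random noise and Q is the target signal-to-noise ratio in dB. The band-limiting filters of the standardized unit are omitted, and the function names are illustrative only.

```python
import math
import random


def mnru(samples, q_db, seed=0):
    """Add noise whose amplitude follows the signal, at roughly Q dB SNR."""
    rng = random.Random(seed)
    scale = 10.0 ** (-q_db / 20.0)
    return [x * (1.0 + scale * rng.gauss(0.0, 1.0)) for x in samples]


def snr_db(reference, degraded):
    """Global signal-to-noise ratio in dB between two equal-length signals."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    return 10.0 * math.log10(signal / noise)
```

Lowering Q gives a stronger, easily controlled degradation, which is what makes such a unit usable as an anchor in listening tests.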
|
24 |
Codificador preditivo de voz por análise mediante síntese. / Analysis-by-synthesis linear predictive speech coder.
Ramirez, Miguel Arjona, 18 December 1992
Os codificadores preditivos de voz por analise-mediante-síntese vem sendo amplamente aplicados em telefonia móvel celular e em telecomunicações sigilosas. A predição linear do sinal de voz e as técnicas de análise-mediante-síntese são apresentadas de forma a relacionar algumas características perceptivas da audição humana as técnicas e parâmetros usados no processamento de sinais. Esta classe de codificadores e descrita no contexto do codificador preditivo excitado por códigos. Estruturas especiais do codificador tais como livros de códigos adaptativos, esparsos e definidos por base vetorial são abordadas bem como melhoramentos de processamento tais quais as buscas com ortogonalidade. Propõe-se um novo codificador, o codificador preditivo linear com excitação decomposta em vetores singulares, que complementa uma representação recentemente anunciada da excitação da voz com buscas em livros de códigos adaptativos. Os resultados de um estudo de codificadores principais desta classe são apresentados. A analise comparativa baseia-se em medidas objetivas temporais e espectrais. Um estudo suplementar de seleção espectral das características da excitação e de quantização do conjunto completo de parâmetros do codificador proposto revelou resultados interessantes sobre a representação espectral adaptativa e sobre a sensibilidade a quantização das características da excitação. / Analysis-by-synthesis linear predictive speech coders are widely applied in mobile and secure telecommunications. Linear prediction of speech signals and analysis-by-synthesis techniques are presented so that some perceptual features of human hearing may be related to signal processing techniques and parameters. The basic operation of this class of coders is described in the framework of the code-excited predictive coder. Special coder structures such as adaptive, sparse and vector-basis codebooks are introduced as well as processing enhancements such as orthogonal searches. 
A recently introduced representation of voice excitation is complemented by adaptive codebook searches to give rise to the new proposed coder, the singular-vector-decomposed excitation linear predictive coder. The results of a study of some important coders in this class are presented. The coders are compared on the basis of waveform and spectral objective distortion measures. A further study of spectral selection of excitation features and of quantization of the whole set of parameters is performed on the proposed coder. Some interesting results are described concerning the adaptive spectral representation and the sensitivity of the excitation features to quantization.
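The analysis-by-synthesis principle behind code-excited coders can be illustrated with a bare codebook search: each candidate excitation is passed through the synthesis filter, scaled by its optimal gain, and compared with the target. This is a minimal sketch under stated assumptions, not the coder proposed in the thesis: perceptual weighting and the adaptive codebook are omitted, the filter starts from zero state, and the convention A(z) = 1 + sum of a_k z^-k is assumed for the predictor coefficients.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z): y[n] = e[n] - sum_k a_k * y[n-k]."""
    output = []
    for n, e in enumerate(excitation):
        y = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * output[n - k]
        output.append(y)
    return output


def search_codebook(target, codebook, lpc):
    """Return (index, gain) of the codevector that best matches the target.

    For a synthesized vector y the optimal gain is <t, y> / <y, y>, and the
    remaining squared error is ||t||^2 - <t, y>^2 / <y, y>, so the search
    maximizes <t, y>^2 / <y, y>.
    """
    best = None
    for index, codevector in enumerate(codebook):
        synthesized = synthesize(codevector, lpc)
        energy = sum(v * v for v in synthesized)
        if energy == 0.0:
            continue
        correlation = sum(t * v for t, v in zip(target, synthesized))
        score = correlation * correlation / energy
        if best is None or score > best[0]:
            best = (score, index, correlation / energy)
    if best is None:
        raise ValueError("codebook contains no usable codevector")
    _, index, gain = best
    return index, gain
```

Only the index and the gain need to be transmitted, since the decoder holds the same codebook and filter.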
|
26 |
Estudo de algoritmos de quantização vetorial aplicados a sinais de fala / Study of vector quantization algorithms applied to speech signals
Violato, Ricardo Paranhos Velloso, 07 August 2010
Advisor: Fernando José Von Zuben / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Previous issue date: 2010 / Resumo: Este trabalho apresenta um estudo comparativo de três algoritmos de quantização vetorial, aplicados para a compressão de sinais de fala: k-médias, NG (do inglês Neural-Gas) e ARIA. Na técnica de compressão utilizada, os sinais são primeiramente parametrizados e quantizados, para serem armazenados e/ou transmitidos. Para recompor o sinal, os vetores quantizados são mapeados em quadros de fala, que são, por sua vez, concatenados, através de uma técnica de síntese concatenativa. Esse sistema pressupõe a existência de um dicionário (codebook) de vetores-padrão (codevectors), os quais são utilizados na etapa de codificação, e de um dicionário de quadros, que é utilizado na etapa de decodificação. Tais dicionários são gerados aplicando-se um algoritmo de quantização vetorial juntoa uma base de treinamento. Em particular, deseja-se avaliar o algoritmo imuno-inspirado denominado ARIA e sua capacidade de preservação da densidade da distribuição dos dados. São testados também diferentes conjuntos de parâmetros para identificar aquele que produz os melhores resultados. Por fim, são propostas modificações no algoritmo ARIA visando ganho de desempenho tanto na preservação de densidade quanto na qualidade do sinal sintetizado / Abstract: This work presents a comparative study of three algorithms for vector quantization, applied for the compression of speech signals: k-means, NG (Neural-Gas) and ARIA. In the compression technique used, the signals are first parameterized and quantized to be stored and/or transmitted. To reconstruct the signal, the quantized vectors are mapped into speech frames, which are concatenated through a concatenative synthesis technique. This system assumes the existence of a dictionary (codebook) of reference vectors (codevectors), which is used in the coding step, and a dictionary of frames, which is used in the decoding step. 
These dictionaries are generated by applying a vector quantization algorithm to a training database. In particular, we want to evaluate the immune-inspired algorithm called ARIA and its ability to preserve the density of the data distribution. Different sets of parameters are also tested in order to identify the one that produces the best results. Finally, modifications to the ARIA algorithm are proposed, aiming to improve both density preservation and the quality of the synthesized signal. / Master's degree / Computer Engineering / Master in Electrical Engineering
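Of the three algorithms compared, k-means is the simplest to illustrate. The sketch below trains a codebook with plain Lloyd iterations and encodes a vector as the index of its nearest codevector. The random initialization from training vectors, the fixed iteration count and the function names are assumptions made for the example; NG and ARIA are not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, codebook):
    """Encoding step: index of the codevector closest to the input vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, size, iterations=20, seed=0):
    """Train a codebook of the given size with k-means (Lloyd) iterations."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        # Assign every training vector to its nearest codevector.
        cells = [[] for _ in codebook]
        for vector in vectors:
            cells[nearest(vector, codebook)].append(vector)
        # Move each codevector to the centroid of its cell; empty cells
        # keep their previous codevector.
        for i, cell in enumerate(cells):
            if cell:
                codebook[i] = [sum(dim) / len(cell) for dim in zip(*cell)]
    return codebook
```

In a coder of the kind described, only the index returned by `nearest` would be stored or transmitted, and the decoder would look up the corresponding entry in its own dictionary.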
|
27 |
Voice Activity Detection in the Tiger Platform
Thorell, Hampus, January 2006
Sectra Communications AB has developed a terminal for encrypted communication called the Tiger platform. During voice communication, delays have sometimes been experienced, resulting in conversational complications.
A solution to this problem, proposed by Sectra, is to introduce voice activity detection, i.e. a separation of the speech and non-speech parts of the input signal, on the Tiger platform. By transferring only the speech parts to the receiver, the required bandwidth should decrease dramatically, and lower bandwidth demand implies that the delays should gradually disappear. The problem is then to find a method that can distinguish the speech parts of the input signal. Fortunately, a lot of research has been done on the subject and numerous voice activity detection methods exist today.
In this thesis the theory of voice activity detection has been studied. A review of the voice activity detectors available on the market today, followed by an evaluation of some of them, was performed in order to select a suitable candidate for the Tiger platform. This evaluation became the foundation for the selection of a voice activity detector for implementation.
Finally, the chosen voice activity detector, including a comfort noise generator, was implemented on the platform. This implementation was based on the special requirements of the platform. Tests of the implementation in office environments show that possible delays are steadily reduced during periods of speech inactivity, while the quality of active speech is preserved.
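The abstract does not say which detector was selected, so the following is only a generic illustration of the principle: a frame-energy detector with a hangover period, so that the tail of a word is not cut off. The frame length, the threshold and the hangover length are hypothetical values, and practical detectors typically use more features than energy alone.

```python
def frame_energies(samples, frame_len):
    """Mean energy of each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_speech(samples, frame_len=160, threshold=1e-3, hangover=2):
    """Return one True/False speech decision per frame.

    A frame is marked as speech when its energy exceeds the threshold, and
    the decision is held for `hangover` further frames after the energy
    drops below it.
    """
    decisions = []
    remaining = 0
    for energy in frame_energies(samples, frame_len):
        if energy > threshold:
            remaining = hangover
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

Frames marked False would not be transmitted; a comfort noise generator at the receiver fills those gaps so that the line does not sound dead.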
|
29 |
Neural representations of natural speech in a chinchilla model of noise-induced hearing loss
Satyabrata Parida (9759374), 14 December 2020
Hearing loss hinders the communication ability of many individuals despite state-of-the-art interventions. Animal models of different hearing-loss etiologies can help improve the clinical outcomes of these interventions; however, several gaps exist. First, translational aspects of animal models are currently limited because anatomically and physiologically specific data obtained from animals are analyzed differently compared to noninvasive evoked responses that can be recorded from humans. Second, we lack a comprehensive understanding of the neural representation of everyday sounds (e.g., naturally spoken speech) in real-life settings (e.g., in background noise). This is even true at the level of the auditory nerve, which is the first bottleneck of auditory information flow to the brain and the first neural site to exhibit crucial effects of hearing-loss.
To address these gaps, we developed a unifying framework that allows direct comparison of invasive spike-train data and noninvasive far-field data in response to stationary and nonstationary sounds. We applied this framework to recordings from single auditory-nerve fibers and frequency-following responses from the scalp of anesthetized chinchillas with either normal hearing or noise-induced mild-moderate hearing loss in response to a speech sentence in noise. Key results for speech coding following hearing loss include: (1) coding deficits for voiced speech manifest as tonotopic distortions without a significant change in driven rate or spike-time precision, (2) linear amplification aimed at countering audiometric threshold shift is insufficient to restore neural activity for low-intensity consonants, (3) susceptibility to background noise increases as a direct result of distorted tonotopic mapping following acoustic trauma, and (4) temporal-place representation of pitch is also degraded. Finally, we developed a noninvasive metric to potentially diagnose distorted tonotopy in humans. These findings help explain the neural origins of common perceptual difficulties that listeners with hearing impairment experience, offer several insights to make hearing-aids more individualized, and highlight the importance of better clinical diagnostics and noise-reduction algorithms.
|
30 |
Nouvelles méthodes multi-échelles pour l'analyse non-linéaire de la parole / Novel multiscale methods for nonlinear speech analysis
Khanagha, Vahid, 16 January 2013
Cette thèse présente une recherche exploratoire sur l'application du Formalisme Microcanonique Multiéchelles (FMM) à l'analyse de la parole. Dérivé de principes issus en physique statistique, le FMM permet une analyse géométrique précise de la dynamique non linéaire des signaux complexes. Il est fondé sur l'estimation des paramètres géométriques locaux (les exposants de singularité) qui quantifient le degré de prédictibilité à chaque point du signal. Si correctement définis et estimés, ils fournissent des informations précieuses sur la dynamique locale de signaux complexes. Nous démontrons le potentiel du FMM dans l'analyse de la parole en développant: un algorithme performant pour la segmentation phonétique, un nouveau codeur, un algorithme robuste pour la détection précise des instants de fermeture glottale, un algorithme rapide pour l’analyse par prédiction linéaire parcimonieuse et une solution efficace pour l’approximation multipulse du signal source d'excitation. / This thesis presents exploratory research on the application of a nonlinear multiscale formalism, the Microcanonical Multiscale Formalism (MMF), to the analysis of speech signals. Derived from principles of statistical physics, the MMF allows accurate analysis of the nonlinear dynamics of complex signals. It relies on the estimation of local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents provide valuable information about the local dynamics of complex signals and have been used successfully in many applications ranging from signal representation to inference and prediction. We show the relevance of the MMF to speech analysis and develop several applications that demonstrate the strength and potential of the formalism.
Using the MMF, this thesis introduces a novel and accurate text-independent phonetic segmentation algorithm, a novel waveform coder, a robust and accurate algorithm for detecting Glottal Closure Instants, a closed-form solution to the problem of sparse linear prediction analysis and, finally, an efficient algorithm for estimating the excitation source signal.
|