51 |
Rate-distortion based video coding with adaptive mean-removed vector quantization Hamzaoui, Raouf, Saupe, Dietmar, Wagner, Marcel 01 February 2019 (has links)
In this paper we improve the rate-distortion performance of a previously proposed video coder based on frame replenishment and adaptive mean-removed vector quantization. This is realized by determining, for each block of a given frame, the optimal encoding mode in the rate-distortion sense. The algorithm is a new contribution to very low bit rate video coding with adaptive vector quantization suitable for videophone applications. Experimental results comparing the two coders for several test sequences at different bit rates are provided.
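For illustration only, the following sketch shows the kind of per-block rate-distortion decision the abstract refers to: each block is assigned the encoding mode minimizing the Lagrangian cost J = D + λR. The mode names, costs and multiplier are hypothetical and not taken from the paper.

    import numpy as np

    def choose_mode(distortions, rates, lam):
        """Pick the encoding mode minimizing J = D + lambda * R for one block."""
        costs = np.asarray(distortions, dtype=float) + lam * np.asarray(rates, dtype=float)
        return int(np.argmin(costs)), float(costs.min())

    # Hypothetical costs for one block: mode 0 = skip, 1 = replenish, 2 = mean-removed VQ.
    mode, cost = choose_mode(distortions=[120.0, 35.0, 18.0], rates=[0.0, 24.0, 60.0], lam=0.85)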
|
52 |
Self-Organizing Neural Networks for Sequence Processing Strickert, Marc 27 January 2005 (has links)
This work investigates the self-organizing representation of temporal data in prototype-based neural networks. Extensions of the supervised learning vector quantization (LVQ) and the unsupervised self-organizing map (SOM) are considered in detail. The principle of Hebbian learning through prototypes yields compact data models that can be easily interpreted by similarity reasoning. In order to obtain robust prototype dynamics, LVQ is extended by neighborhood cooperation between neurons to prevent a strong dependence on the initial prototype locations. Additionally, implementations of more general, adaptive metrics are studied, with a particular focus on the built-in detection of the data attributes relevant to a given classification task. For unsupervised sequence processing, two modifications of SOM are pursued: the SOM for structured data (SOMSD), realizing an efficient back-reference to the previous best matching neuron in a triangular low-dimensional neural lattice, and the merge SOM (MSOM), expressing the temporal context as a fractal combination of the previously most active neuron and its context. The first extension, SOMSD, tackles data dimension reduction and planar visualization; the second, MSOM, is designed to obtain higher quantization accuracy. The supplied experiments underline the data modeling quality of the presented methods.
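As background for the LVQ extensions mentioned above, here is a minimal sketch of the standard LVQ1 update rule they build on; the neighborhood-cooperative and metric-adaptive variants studied in the thesis are not reproduced here.

    import numpy as np

    def lvq1_step(prototypes, proto_labels, x, y, lr=0.05):
        """One LVQ1 update: attract the winning prototype if its class matches
        the sample's label, repel it otherwise."""
        dists = np.linalg.norm(prototypes - x, axis=1)
        w = int(np.argmin(dists))                  # best matching prototype
        sign = 1.0 if proto_labels[w] == y else -1.0
        prototypes[w] += sign * lr * (x - prototypes[w])
        return prototypes

    # Toy usage: two prototypes per class in 2-D.
    rng = np.random.default_rng(0)
    protos = lvq1_step(rng.normal(size=(4, 2)), np.array([0, 0, 1, 1]),
                       x=np.array([0.5, -0.2]), y=1)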
|
53 |
Rate-Distortion Performance And Complexity Optimized Structured Vector Quantization Chatterjee, Saikat 07 1900 (has links)
Although vector quantization (VQ) is an established topic in communication, its practical utility has been limited due to (i) prohibitive complexity for higher quality and bit rate, (ii) structured VQ methods that are not analyzed for optimum performance, and (iii) the difficulty of mapping the theoretical mean square error (MSE) performance to perceptual measures. However, an ever-increasing demand for source signal compression points to VQ as the inevitable choice for high efficiency. This thesis addresses all three of the above issues, utilizing the power of parametric stochastic modeling of the signal source, viz., the Gaussian mixture model (GMM), and proposes new solutions. Addressing some of the new requirements of source coding in network applications, the thesis also presents solutions for scalable bit rate, rate-independent complexity and decoder scalability.
While structured VQ is a necessity to reduce the complexity, we have developed, analyzed and compared three different schemes of compensation for the loss due to structured VQ. Focusing on the widely used methods of split VQ (SVQ) and KLT based transform domain scalar quantization (TrSQ), we develop expressions for their optimum performance using high rate quantization theory. We propose the use of conditional PDF based SVQ (CSVQ) to compensate for the split loss in SVQ and analytically show that it achieves coding gain over SVQ. Using the analytical expressions of complexity, an algorithm to choose the optimum splits is proposed. We analyze these techniques for their complexity as well as perceptual distortion measure, considering the specific case of quantizing the wide band speech line spectrum frequency (LSF) parameters. Using natural speech data, it is shown that the new conditional PDF based methods provide better perceptual distortion performance than the traditional methods.
Exploring the use of GMMs for the source, we take the approach of separately estimating the GMM parameters and then use the high rate quantization theory in a simplified manner to derive closed form expressions for optimum MSE performance. This has led to the development of non-linear prediction for compensating the split loss (in contrast to the linear prediction using a Gaussian model). We show that the GMM approach can improve the recently proposed adaptive VQ scheme of switched SVQ (SSVQ). We derive the optimum performance expressions for SSVQ, in both variable bit rate and fixed bit rate formats, using the simplified approach of GMM in high rate theory.
As a third scheme for recovering the split loss in SVQ and reducing the complexity, we propose a two-stage SVQ (TsSVQ), which is analyzed for minimum complexity as well as perceptual distortion. Utilizing the low complexity of transform domain SVQ (TrSVQ) as well as the two-stage approach in a universal coding framework, it is shown that we can achieve lower complexity as well as better performance than SSVQ. Further, the combination of GMM and universal coding leads to a highly scalable coder that provides bit-rate scalability, decoder scalability and rate-independent low complexity, while its perceptual distortion performance remains comparable to that of SSVQ.
Since GMM is a generic source model, we develop a new method of predicting the performance bound for perceptual distortion using VQ. Applying this method to LSF quantization, the minimum bit rates for quantizing telephone band LSF (TB-LSF) and wideband LSF (WB-LSF) are derived.
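To make the "split loss" discussed above concrete, the sketch below shows a plain split VQ encoder/decoder in which each sub-vector is quantized independently against its own codebook; the codebooks and dimensions are made up for illustration, and none of the thesis' compensation schemes (CSVQ, SSVQ, TsSVQ) are implemented.

    import numpy as np

    def split_vq_encode(x, codebooks, split_points):
        """Quantize each sub-vector of x independently (squared-error nearest codeword)."""
        parts = np.split(x, split_points)
        return [int(np.argmin(np.sum((cb - p) ** 2, axis=1)))
                for p, cb in zip(parts, codebooks)]

    def split_vq_decode(indices, codebooks):
        return np.concatenate([cb[i] for i, cb in zip(indices, codebooks)])

    # Toy 10-dimensional vector split into two 5-dimensional halves, 8 codewords each.
    rng = np.random.default_rng(1)
    cbs = [rng.normal(size=(8, 5)), rng.normal(size=(8, 5))]
    x = rng.normal(size=10)
    x_hat = split_vq_decode(split_vq_encode(x, cbs, split_points=[5]), cbs)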
|
54 |
Integration of Auxiliary Data Knowledge in Prototype Based Vector Quantization and Classification Models Kaden, Marika 14 July 2016 (links) (PDF)
This thesis deals with the integration of auxiliary data knowledge into machine learning methods, especially prototype-based classification models. Classification problems are diverse, and evaluating the result by accuracy alone is not adequate in many applications. Therefore, the classification tasks are analyzed more deeply. Possibilities to extend prototype-based methods to integrate extra knowledge about the data or the classification goal are presented in order to obtain problem-adequate models. One of the proposed extensions is a Generalized Learning Vector Quantization variant for the direct optimization of statistical measures besides the classification accuracy. Modifying the metric adaptation of Generalized Learning Vector Quantization for functional data, i.e. data with lateral dependencies in the features, is also considered.
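For reference, a minimal sketch of the cost function of standard Generalized Learning Vector Quantization is given below; the extensions proposed in the thesis replace or augment this accuracy-oriented cost with other statistical measures, which are not shown here.

    import numpy as np

    def glvq_cost(X, y, prototypes, proto_labels):
        """Average GLVQ cost mu = (d_plus - d_minus) / (d_plus + d_minus), where
        d_plus / d_minus are squared distances to the closest correct / incorrect prototype.
        X, y, prototypes and proto_labels are NumPy arrays."""
        total = 0.0
        for x, label in zip(X, y):
            d = np.sum((prototypes - x) ** 2, axis=1)
            d_plus = d[proto_labels == label].min()
            d_minus = d[proto_labels != label].min()
            total += (d_plus - d_minus) / (d_plus + d_minus)
        return total / len(X)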
|
55 |
ON THE CONVERGENCE AND APPLICATIONS OF MEAN SHIFT TYPE ALGORITHMS Aliyari Ghassabeh, Youness 01 October 2013 (has links)
Mean shift (MS) and subspace constrained mean shift (SCMS) algorithms are non-parametric, iterative methods for finding a representation of a high-dimensional data set on a principal curve or surface embedded in a high-dimensional space. The representation of high-dimensional data on a principal curve or surface, the class of mean shift type algorithms and their properties, and applications of these algorithms are the main focus of this dissertation. Although MS and SCMS algorithms have been used in many applications, a rigorous study of their convergence is still missing. This dissertation aims to fill some of the gaps between theory and practice by investigating some convergence properties of these algorithms. In particular, we propose a sufficient condition for a kernel density estimate with a Gaussian kernel to have isolated stationary points, guaranteeing the convergence of the MS algorithm. We also show that the SCMS algorithm inherits some of the important convergence properties of the MS algorithm; in particular, the monotonicity and convergence of the density estimate values along the sequence of output values of the algorithm are shown. We also show that the distance between consecutive points of the output sequence converges to zero, as does the projection of the gradient vector onto the subspace spanned by the D-d eigenvectors corresponding to the D-d largest eigenvalues of the local inverse covariance matrix. Furthermore, three new variations of the SCMS algorithm are proposed, and the running times and performance of the resulting algorithms are compared with the original SCMS algorithm. We also propose an adaptive version of the SCMS algorithm that accounts for the effect of new incoming samples without rerunning the algorithm on the whole data set. Finally, we develop some new potential applications of the MS and SCMS algorithms. These applications involve finding straight lines in digital images; pre-processing data before applying locally linear embedding (LLE) and ISOMAP for dimensionality reduction; noisy source vector quantization, where the clean data need to be estimated before the quantization step; improving the performance of kernel regression in certain situations; and skeletonization of digitally stored handwritten characters. / Thesis (Ph.D, Mathematics & Statistics) -- Queen's University, 2013-09-30 18:01:12.959
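A minimal sketch of the mean shift update analyzed in the dissertation is shown below, using a Gaussian kernel and a fixed bandwidth; the SCMS variants and the convergence conditions themselves are not reproduced.

    import numpy as np

    def mean_shift(x, data, bandwidth=1.0, tol=1e-6, max_iter=500):
        """Move x to the kernel-weighted mean of the data until the step falls below tol."""
        for _ in range(max_iter):
            w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * bandwidth ** 2))
            x_new = (w[:, None] * data).sum(axis=0) / w.sum()
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x_new

    # Toy usage: the iterate drifts toward the denser of two 2-D clusters.
    rng = np.random.default_rng(2)
    pts = np.vstack([rng.normal(0.0, 0.3, size=(200, 2)), rng.normal(3.0, 0.3, size=(50, 2))])
    mode = mean_shift(np.array([1.0, 1.0]), pts, bandwidth=0.8)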
|
56 |
Reconhecimento automático de locutor em modo independente de texto por Self-Organizing Maps. / Text independent automatic speaker recognition using Self-Organizing Maps. Mafra, Alexandre Teixeira 18 December 2002 (has links)
The design of machines that can identify people is a problem whose solution has a wide range of applications. Software systems based on measurements of personal physical attributes (biometrics) are beginning to reach commercial-scale production. Automatic Speaker Recognition systems fall into this category, using the voice as the identifying attribute. At present, the most popular methods are based on the extraction of mel-frequency cepstral coefficients (MFCCs), followed by speaker identification with Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs) or vector quantization; this preference is motivated by the quality of the results obtained by these methods. Making such systems robust, so that they remain effective in noisy environments, is now a major concern, as are the performance degradation observed in applications involving a large number of speakers and the possibility of fraud based on recorded voices. Another important goal is to embed these systems as sub-systems of existing devices, enabling them to operate according to their operator. This work presents the concepts and algorithms involved in implementing a text-independent Automatic Speaker Recognition software system. First, voice signal processing and the extraction of the signal features essential for recognition are treated. Then, the way each speaker's voice is modeled by a Self-Organizing Map (SOM) neural network is described, together with the method for comparing the models' responses when an utterance from an unknown speaker is presented. Finally, the construction of the speech corpus used for training and testing the models, the network architectures tested, and the experimental results obtained in a speaker identification task are presented.
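As a rough illustration of the vector-quantization side of such a system, the sketch below assigns an utterance to the speaker whose codebook (for instance, the prototype vectors of a trained SOM) yields the lowest average quantization error over the utterance's feature frames. Feature extraction (e.g. MFCCs) and codebook training are assumed to have been done elsewhere, and the data here are random placeholders.

    import numpy as np

    def identify_speaker(features, codebooks):
        """Return the speaker whose codebook gives the smallest mean frame-to-nearest-prototype distance."""
        scores = {}
        for speaker, cb in codebooks.items():
            d = np.linalg.norm(features[:, None, :] - cb[None, :, :], axis=2)
            scores[speaker] = d.min(axis=1).mean()
        return min(scores, key=scores.get), scores

    # Placeholder data: 120 random "MFCC" frames and two 16-prototype codebooks.
    rng = np.random.default_rng(3)
    frames = rng.normal(size=(120, 13))
    books = {"spk_A": rng.normal(size=(16, 13)),
             "spk_B": frames[:16] + 0.05 * rng.normal(size=(16, 13))}
    winner, all_scores = identify_speaker(frames, books)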
|
58 |
Avaliação de redes neurais competitivas em tarefas de quantização vetorial: um estudo comparativo / Evaluation of competitive neural networks in tasks of vector quantization (VQ): a comparative study Magnus Alencar da Cruz 06 October 2007 (has links)
The main goal of this master's thesis is to carry out a comparative study of the performance of unsupervised competitive neural network algorithms in vector quantization (VQ) problems and related applications, such as cluster analysis and image compression. The study is motivated by the relative scarcity of systematic comparisons between neural and non-neural algorithms for VQ in the specialized literature. A total of seven algorithms are evaluated, namely: K-means and the WTA, FSCL, SOM, Neural-Gas, FuzzyCL and RPCL networks. Of particular interest is the problem of selecting an adequate number of neurons for a given vector quantization problem. Since no single method works satisfactorily for all applications, the remaining alternative is to evaluate the influence that each type of evaluation metric has on a specific algorithm. The aforementioned vector quantization algorithms are widely used in clustering tasks; for this type of application, cluster validation is based on indices that quantify the compactness and separability of the clusters found, such as the Dunn index and the Davies-Bouldin (DB) index. In image compression tasks, however, a given vector quantization algorithm is evaluated in terms of the quality of the reconstructed information, so the most used metrics are the mean squared quantization error (MSQE) and the peak signal-to-noise ratio (PSNR). It is verified empirically that, while the DB index favors architectures with few prototypes and the Dunn index favors architectures with many, the MSE and PSNR metrics always favor even larger numbers. None of these metrics takes the number of model parameters into account. This thesis therefore evaluates the use of Akaike's information criterion (AIC) and Rissanen's minimum description length (MDL) criterion to select the optimal number of prototypes. This type of metric proves useful in the search for a number of prototypes that simultaneously satisfies conflicting criteria, i.e. those seeking the lowest reconstruction error at any cost (MSE and PSNR) and those seeking more compact and cohesive clusters (Dunn and DB indices). As a consequence, the number of prototypes obtained with the AIC and MDL metrics is generally an intermediate value, i.e. neither as low as that suggested by the Dunn and DB indices nor as high as that suggested by the MSE and PSNR metrics. Another important conclusion is that the algorithms that are most sophisticated from a modeling point of view, such as the SOM and Neural-Gas networks, do not necessarily show the best performance in clustering and vector quantization tasks. The FSCL and FuzzyCL algorithms give the best results in vector quantization tasks, with FSCL presenting the better cost-benefit ratio due to its lower computational cost. Finally, it is worth emphasizing that, whichever algorithm is chosen, if its parameters are suitably tuned and its performance fairly evaluated, the differences in performance relative to the other prototype-based algorithms are negligible, with computational cost serving as the tie-breaker.
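The sketch below illustrates one common penalized-distortion form of the codebook-size selection discussed above: the mean squared quantization error is traded off against the number of prototype parameters. The exact AIC and MDL definitions adopted in the thesis may differ, and scikit-learn's KMeans stands in for the competitive networks that were actually compared.

    import numpy as np
    from sklearn.cluster import KMeans

    def select_codebook_size(X, k_values):
        """Return the codebook sizes minimizing AIC-like and MDL-like penalized distortions."""
        n, d = X.shape
        crit = {}
        for k in k_values:
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            mse = km.inertia_ / n                              # mean squared quantization error
            aic = n * np.log(mse) + 2 * k * d                  # AIC-style penalty
            mdl = n * np.log(mse) + 0.5 * k * d * np.log(n)    # MDL-style penalty
            crit[k] = (aic, mdl)
        best_aic = min(crit, key=lambda k: crit[k][0])
        best_mdl = min(crit, key=lambda k: crit[k][1])
        return best_aic, best_mdl, crit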
|
59 |
Proposição e avaliação de algoritmos de filtragem adaptativa baseados na rede de Kohonen / Proposition and evaluation of adaptive filtering algorithms based on the Kohonen network Luis Gustavo Mota Souza 02 June 2007 (has links)
Because it employs an unsupervised learning algorithm, the Kohonen Self-Organizing Map (SOM) has traditionally been applied in signal processing to vector quantization tasks, whereas MLP (Multi-Layer Perceptron) and RBF (Radial Basis Function) networks dominate applications that require the approximation of input-output mappings. This type of application is commonly found in adaptive filtering tasks, which can be cast as direct and inverse system modeling problems, such as the identification and equalization of communication channels. In this dissertation, the range of applications of the SOM is extended by proposing SOM-based neural adaptive filters and showing that they are viable alternatives to the nonlinear filters based on MLP and RBF networks. This becomes possible thanks to a recently proposed technique called Vector-Quantized Temporal Associative Memory (VQTAM), which essentially uses the SOM training philosophy to perform the simultaneous vector quantization of the input and output spaces of the filtering problem under analysis. Based on the VQTAM technique, three SOM-based adaptive filter architectures are proposed, and their performance is evaluated in the identification and equalization of nonlinear channels. The channel used in the simulations was modeled as a first-order Gauss-Markov autoregressive process, contaminated with white Gaussian noise and subjected to a saturation-type (sigmoidal) nonlinearity. The results show that SOM-based adaptive filters perform as well as or better than traditional linear transversal filters and MLP-based nonlinear filters.
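A minimal sketch of the VQTAM idea described above follows: a one-dimensional SOM is trained on the concatenated input-output vectors, but the winner is always chosen from the input part only, so the output block of the best-matching prototype serves as the filter output. This is a simplified illustration under arbitrary parameter choices, not one of the three architectures proposed in the dissertation.

    import numpy as np

    def train_vqtam(X_in, X_out, n_neurons=20, epochs=50, lr0=0.5, sigma0=3.0):
        """Fit a 1-D SOM to joint [input; output] vectors, selecting winners from the input part only."""
        rng = np.random.default_rng(0)
        d_in = X_in.shape[1]
        W = rng.normal(size=(n_neurons, d_in + X_out.shape[1]))
        grid = np.arange(n_neurons)
        n_steps, t = epochs * len(X_in), 0
        for _ in range(epochs):
            for x, y in zip(X_in, X_out):
                lr = lr0 * (0.01 / lr0) ** (t / n_steps)           # exponentially decaying rate
                sigma = sigma0 * (0.5 / sigma0) ** (t / n_steps)   # shrinking neighborhood
                winner = np.argmin(np.linalg.norm(W[:, :d_in] - x, axis=1))
                h = np.exp(-(grid - winner) ** 2 / (2 * sigma ** 2))
                W += lr * h[:, None] * (np.concatenate([x, y]) - W)
                t += 1
        return W

    def vqtam_filter(W, x, d_in):
        """Output block of the best-matching unit, found from the input part of x."""
        winner = np.argmin(np.linalg.norm(W[:, :d_in] - x, axis=1))
        return W[winner, d_in:]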
|
60 |
Representation Of Covariance Matrices In Track Fusion Problems Gunay, Melih 01 November 2007 (links) (PDF)
The covariance matrix plays a critical role in target tracking algorithms and in multi-sensor track fusion systems: it reveals the uncertainty of the state estimates obtained from different sensors, and many subproblems of track fusion rely on it to obtain more accurate results. For this reason the matrix must be exchanged between the nodes of the multi-sensor tracking system. This thesis mainly deals with the analysis of approximations of the covariance matrix that can best represent it, so that it can be transmitted effectively to the requesting site. The Kullback-Leibler (KL) distance is exploited to derive some of the representations for the Gaussian case. Comparing these representations is another objective of this work; the comparison is based on the fusion performance of the representations, measured for a two-radar track fusion system.
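For the Gaussian case mentioned above, the quantity in question is the Kullback-Leibler divergence between two Gaussian densities, sketched below for scoring how well an approximate covariance represents the exact one; treating this as the thesis' exact criterion is an assumption.

    import numpy as np

    def kl_gaussian(m0, S0, m1, S1):
        """KL divergence D( N(m0,S0) || N(m1,S1) ) between two multivariate Gaussians."""
        d = len(m0)
        S1_inv = np.linalg.inv(S1)
        diff = m1 - m0
        return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                      + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

    # Toy usage: score a diagonal approximation of a correlated 2x2 track covariance.
    S_full = np.array([[4.0, 1.5], [1.5, 2.0]])
    S_diag = np.diag(np.diag(S_full))
    print(kl_gaussian(np.zeros(2), S_full, np.zeros(2), S_diag))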
|