Global ETD Search

71	Získávání znalostí z multimediálních databází / Knowledge Discovery in Multimedia Databases Jurčák, Petr January 2009 This master's thesis deals with knowledge discovery in multimedia databases, in particular the basic classification and prediction methods used in data mining. A further part describes the extraction of low-level features from video data and images and summarizes content-based search in multimedia content and the indexing of such data. The final part is dedicated to the implementation of a Gaussian mixture model for classification and compares its results with those of another method, SVM.
72	Machine learning multicriteria optimization in radiation therapy treatment planning / Flermålsoptimering med maskininlärning inom strålterapiplanering Zhang, Tianfang January 2019 In radiation therapy treatment planning, recent works have used machine learning based on historically delivered plans to automate the process of producing clinically acceptable plans. Compared to traditional approaches such as repeated weighted-sum optimization or multicriteria optimization (MCO), automated planning methods have, in general, the benefits of low computational times and minimal user interaction, but on the other hand lack the flexibility associated with general-purpose frameworks such as MCO. Machine learning approaches can be especially sensitive to deviations in their dose prediction due to certain properties of the optimization functions usually used for dose mimicking and, moreover, suffer from the fact that there exists no general causality between prediction accuracy and optimized plan quality. In this thesis, we present a means of unifying ideas from machine learning planning methods with the well-established MCO framework. More precisely, given prior knowledge in the form of either a previously optimized plan or a set of historically delivered clinical plans, we are able to automatically generate Pareto optimal plans spanning a dose region corresponding to plans which are achievable as well as clinically acceptable. 
For the former case, this is achieved by introducing dose--volume constraints; for the latter case, this is achieved by fitting a weighted-data Gaussian mixture model on pre-defined dose statistics using the expectation--maximization algorithm, modifying it with exponential tilting and using specially developed optimization functions to take into account prediction uncertainties. Numerical results for conceptual demonstration are obtained for a prostate cancer case with treatment delivered by a volumetric-modulated arc therapy technique, where it is shown that the methods developed in the thesis are successful in automatically generating Pareto optimal plans of satisfactory quality and diversity, while excluding clinically irrelevant dose regions. For the case of using historical plans as prior knowledge, the computational times are significantly shorter than those typical of conventional MCO. / Inom strålterapiplanering har den senaste forskningen använt maskininlärning baserat på historiskt levererade planer för att automatisera den process i vilken kliniskt acceptabla planer produceras. Jämfört med traditionella angreppssätt, såsom upprepad optimering av en viktad målfunktion eller flermålsoptimering (MCO), har automatiska planeringsmetoder generellt sett fördelarna av lägre beräkningstider och minimal användarinteraktion, men saknar däremot flexibiliteten hos allmänna ramverk som exempelvis MCO. Maskininlärningsmetoder kan vara speciellt känsliga för avvikelser i dosprediktionssteget på grund av särskilda egenskaper hos de optimeringsfunktioner som vanligtvis används för att återskapa dosfördelningar, och lider dessutom av problemet att det inte finns något allmängiltigt orsakssamband mellan prediktionsnoggrannhet och kvalitet hos optimerad plan. I detta arbete presenterar vi ett sätt att förena idéer från maskininlärningsbaserade planeringsmetoder med det väletablerade MCO-ramverket. 
Mer precist kan vi, givet förkunskaper i form av antingen en tidigare optimerad plan eller en uppsättning av historiskt levererade kliniska planer, automatiskt generera Paretooptimala planer som täcker en dosregion motsvarande uppnåeliga såväl som kliniskt acceptabla planer. I det förra fallet görs detta genom att introducera dos--volym-bivillkor; i det senare fallet görs detta genom att anpassa en gaussisk blandningsmodell med viktade data med förväntning--maximering-algoritmen, modifiera den med exponentiell lutning och sedan använda speciellt utvecklade optimeringsfunktioner för att ta hänsyn till prediktionsosäkerheter. Numeriska resultat för konceptuell demonstration erhålls för ett fall av prostatacancer varvid behandlingen levererades med volymetriskt modulerad bågterapi, där det visas att metoderna utvecklade i detta arbete är framgångsrika i att automatiskt generera Paretooptimala planer med tillfredsställande kvalitet och variation medan kliniskt irrelevanta dosregioner utesluts. I fallet då historiska planer används som förkunskap är beräkningstiderna markant kortare än för konventionell MCO. Radiation therapy automated planning machine learning multicriteria optimization dose mimicking dose--volume criteria Gaussian mixture model expectation--maximization exponential tilting Strålterapi automatisk planering maskininlärning flermålsoptimering dosåterskapande dos--volym-kriterier gaussisk blandningsmodell förväntning--maximering exponentiell lutning Mathematics Matematik
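The weighted-data Gaussian mixture fit by expectation-maximization named in the abstract of entry 72 can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes one-dimensional data, and the function name `weighted_em_gmm`, the unit starting variances, the iteration count and the toy data are hypothetical choices made only for illustration. Exponential tilting and the dose statistics are not modeled here.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_em_gmm(xs, ws, init_means, n_iter=100):
    """Fit a 1-D Gaussian mixture to samples xs with non-negative weights ws."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k  # assumed starting variances
    mix = [1.0 / k] * k    # start from equal mixing proportions
    total_w = sum(ws)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [mix[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: each sample contributes in proportion to its weight.
        for j in range(k):
            nj = sum(w * r[j] for w, r in zip(ws, resp))
            means[j] = sum(w * r[j] * x for w, r, x in zip(ws, resp, xs)) / nj
            var = sum(w * r[j] * (x - means[j]) ** 2 for w, r, x in zip(ws, resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
            mix[j] = nj / total_w
    return mix, means, variances


# Toy data (hypothetical): two clusters, the second one weighted twice as heavily.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(300)]
ws = [1.0] * 300 + [2.0] * 300
mix, means, variances = weighted_em_gmm(xs, ws, init_means=[1.0, 4.0])
```

Because the second cluster carries double weight, its mixing proportion should come out larger than that of the first cluster even though both contain the same number of samples.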
73	Adaptive Estimation using Gaussian Mixtures Pfeifer, Tim 25 October 2023 This thesis offers a probabilistic solution to robust estimation using a novel adaptive estimator. Reliable state estimation is a mandatory prerequisite for autonomous systems interacting with the real world. The presence of outliers challenges the Gaussian assumption of numerous estimation algorithms, resulting in a potentially skewed estimate that compromises reliability. Many approaches attempt to mitigate erroneous measurements by using a robust loss function – which often comes with a trade-off between robustness and numerical stability. The proposed approach is purely probabilistic and enables adaptive large-scale estimation with non-Gaussian error models. The introduced Adaptive Mixture algorithm combines a nonlinear least squares backend with Gaussian mixtures as the measurement error model. Factor graphs as graphical representations allow an efficient and flexible application to real-world problems, such as simultaneous localization and mapping or satellite navigation. The proposed algorithms are constructed using an approximate expectation-maximization approach, which justifies their design probabilistically. This expectation-maximization is further generalized to enable adaptive estimation with arbitrary probabilistic models. Evaluating the proposed Adaptive Mixture algorithm in simulated and real-world scenarios demonstrates its versatility and robustness. A synthetic range-based localization shows that it provides reliable estimation results, even under extreme outlier ratios. Real-world satellite navigation experiments prove its robustness in harsh urban environments. The evaluation on indoor simultaneous localization and mapping datasets extends these results to typical robotic use cases. The proposed adaptive estimator provides robust and reliable estimation under various instances of non-Gaussian measurement errors. 
ddc:629 ddc:519
74	Incorporating Metadata Into the Active Learning Cycle for 2D Object Detection / Inkorporera metadata i aktiv inlärning för 2D objektdetektering Stadler, Karsten January 2021 In recent years, Deep Convolutional Neural Networks have proven to be very useful for 2D Object Detection in many applications. These types of networks require large amounts of labeled data, which can be increasingly costly for companies deploying these detectors in practice if the data quality is lacking. Pool-based Active Learning is an iterative process of collecting subsets of data to be labeled by a human annotator and used for training, with the aim of optimizing performance per labeled image. The detectors used in Active Learning cycles are conventionally pre-trained with a small subset, approximately 2% of the available data, labeled uniformly at random. This is something I challenged in this thesis by using image metadata. Many Machine Learning models are a "jack of all trades, master of none": it is hard to train models that generalize to the whole data domain, so it can be interesting to develop a detector for a specific target metadata domain. A simple Monte Carlo method, Rejection Sampling, can be implemented to sample according to a metadata target domain. This requires a target and a proposal metadata distribution. The proposal metadata distribution is a parametric model in the form of a Gaussian Mixture Model learned from the training metadata. The parametric model for the target distribution can be learned in a similar manner, but from a target dataset. In this way, only the training images with metadata most similar to the target metadata distribution are sampled. This sampling approach was employed and tested with a 2D Object Detector: Faster R-CNN with a ResNet-50 backbone. The Rejection Sampling approach was tested against conventional random uniform sampling and a classical Active Learning baseline: Min Entropy Sampling. 
The performance was measured and compared on two different target metadata distributions that were inferred from a specific target dataset. With a labeling budget of 2% for each cycle, the max Mean Average Precision at 0.5 Intersection Over Union for the target set each cycle was calculated. My proposed approach has a 40 % relative performance advantage over random uniform sampling for the first cycle, and 10% after 9 cycles. Overall, my approach only required 37 % of the labeled data to beat the next best-tested sampler: the conventional uniform random sampling. / De senaste åren har Djupa Neurala Faltningsnätverk visat sig vara mycket användbara för 2D Objektdetektering i många applikationer. De här typen av nätverk behöver stora mängder av etiketterat data, något som kan innebära ökad kostnad för företag som distribuerar dem, om kvaliteten på etiketterna är bristfällig. Pool-baserad Aktiv Inlärning är en iterativ process som innebär insamling av delmängder data som ska etiketteras av en människa och användas för träning, för att optimera prestanda per etiketterat data. Detektorerna som används i Aktiv Inlärning är konventionellt sätt förtränade med en mindre delmängd data, ungefär 2% av all tillgänglig data, etiketterat enligt slumpen. Det här är något jag utmanade i det här arbetet genom att använda bild metadata. Med motiveringen att många Maskininlärningsmodeller presterar sämre på större datadomäner, eftersom det kan vara svårt att lära detektorer stora datadomäner, kan det vara intressant att utveckla en detektor för ett särskild metadata mål-domän. För att samla in data enligt en metadata måldomän, kan en enkel Monte Carlo metod, Rejection Sampling implementeras. Det skulle behövas en mål-metadata-distribution och en faktisk metadata distribution. den faktiska metadata distributionen skulle vara en parametrisk modell i formen av en Gaussisk blandningsmodell som är tränad på träningsdata. 
Den parametriska modellen för mål-metadata-distributionen skulle kunna vara tränad på liknande sätt, fast ifrån mål-datasetet. På detta sätt, skulle endast träningsbilder med metadata mest lik mål-datadistributionen kunna samlas in. Den här samplings-metoden utvecklades och testades med en 2D objektdetektor: Faster R-CNN med ResNet-50 bildegenskapextraktor. Rejection sampling metoden blev testad mot konventionell likformig slumpmässig sampling av data och en klassisk Aktiv Inlärnings metod: Minimum Entropi sampling. Prestandan mättes och jämfördes mellan två olika mål-metadatadistributioner som var framtagna från specifika mål-metadataset. Med en etiketteringsbudget på 2% för varje cykel, så beräknades medelvärdesprecisionen om 0.5 snitt över union för mål-datasetet. Min metod har 40% bättre prestanda än slumpmässig likformig insamling i första cykeln, och 10 % efter 9 cykler. Överlag behövde min metod endast 37 % av den etiketterade data för att slå den näst bästa samplingsmetoden: slumpmässig likformig insamling. Active learning Deep Learning Object detection Metadata Nuscenes Nuimages Gaussian mixture model Rejection sampling Monte-Carlo methods Aktiv Inlärning Djupinlärning Objektdetektering metadata Nuscenes Nuimages Gaussisk blandingsmodell Rejection sampling Monte-Carlo metoder Computer and Information Sciences Data- och informationsvetenskap
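The rejection-sampling step described in the abstract of entry 74 (accept a training item according to how its metadata scores under a target mixture relative to a proposal mixture) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code: the metadata is reduced to a single scalar, both mixtures are given by hand instead of being learned, the bound on the density ratio is taken empirically over the pool, and all names (`mixture_pdf`, `rejection_sample_pool`) are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a 1-D Gaussian mixture given (weight, mean, std) triples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def rejection_sample_pool(pool, target, proposal, rng):
    """Keep pool items whose metadata resembles the target distribution."""
    ratios = [mixture_pdf(x, target) / mixture_pdf(x, proposal) for x in pool]
    bound = max(ratios)  # empirical bound M on target/proposal over the pool
    # Accept each item with probability target(x) / (M * proposal(x)).
    return [x for x, r in zip(pool, ratios) if rng.random() < r / bound]


rng = random.Random(0)
proposal = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]  # assumed model of the training metadata
target = [(1.0, 4.0, 0.5)]                     # assumed model of the target metadata
pool = [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(4.0, 1.0)
        for _ in range(5000)]
selected = rejection_sample_pool(pool, target, proposal, rng)
```

In this toy setup the accepted items concentrate around the target mean of 4, while most of the pool items near 0 are rejected.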
75	Nuevas contribuciones a la teoría y aplicación del procesado de señal sobre grafos Belda Valls, Jordi 16 January 2023 [ES] El procesado de señal sobre grafos es un campo emergente de técnicas que combinan conceptos de dos áreas muy consolidadas: el procesado de señal y la teoría de grafos. Desde la perspectiva del procesado de señal puede obtenerse una definición de la señal mucho más general asignando cada valor de la misma a un vértice de un grafo. Las señales convencionales pueden considerarse casos particulares en los que los valores de cada muestra se asignan a una cuadrícula uniforme (temporal o espacial). Desde la perspectiva de la teoría de grafos, se pueden definir nuevas transformaciones del grafo de forma que se extiendan los conceptos clásicos del procesado de la señal como el filtrado, la predicción y el análisis espectral. Además, el procesado de señales sobre grafos está encontrando nuevas aplicaciones en las áreas de detección y clasificación debido a su flexibilidad para modelar dependencias generales entre variables. En esta tesis se realizan nuevas contribuciones al procesado de señales sobre grafos. En primer lugar, se plantea el problema de estimación de la matriz Laplaciana asociada a un grafo, que determina la relación entre nodos. Los métodos convencionales se basan en la matriz de precisión, donde se asume implícitamente Gaussianidad. En esta tesis se proponen nuevos métodos para estimar la matriz Laplaciana a partir de las correlaciones parciales asumiendo respectivamente dos modelos no Gaussianos diferentes en el espacio de las observaciones: mezclas gaussianas y análisis de componentes independientes. Los métodos propuestos han sido probados con datos simulados y con datos reales en algunas aplicaciones biomédicas seleccionadas. Se demuestra que pueden obtenerse mejores estimaciones de la matriz Laplaciana con los nuevos métodos propuestos en los casos en que la Gaussianidad no es una suposición correcta. 
También se ha considerado la generación de señales sintéticas en escenarios donde la escasez de señales reales puede ser un problema. Los modelos sobre grafos permiten modelos de dependencia por pares más generales entre muestras de señal. Así, se propone un nuevo método basado en la Transformada de Fourier Compleja sobre Grafos y en el concepto de subrogación. Se ha aplicado en el desafiante problema del reconocimiento de gestos con las manos. Se ha demostrado que la extensión del conjunto de entrenamiento original con réplicas sustitutas generadas con los métodos sobre grafos, mejora significativamente la precisión del clasificador de gestos con las manos. / [CAT] El processament de senyal sobre grafs és un camp emergent de tècniques que combinen conceptes de dues àrees molt consolidades: el processament de senyal i la teoria de grafs. Des de la perspectiva del processament de senyal pot obtindre's una definició del senyal molt més general assignant cada valor de la mateixa a un vèrtex d'un graf. Els senyals convencionals poden considerar-se casos particulars en els quals els valors de la mostra s'assignen a una quadrícula uniforme (temporal o espacial). Des de la perspectiva de la teoria de grafs, es poden definir noves transformacions del graf de manera que s'estenguen els conceptes clàssics del processament del senyal com el filtrat, la predicció i l'anàlisi espectral. A més, el processament de senyals sobre grafs està trobant noves aplicacions en les àrees de detecció i classificació a causa de la seua flexibilitat per a modelar dependències generals entre variables. En aquesta tesi es donen noves contribucions al processament de senyals sobre grafs. En primer lloc, es planteja el problema d'estimació de la matriu Laplaciana associada a un graf, que determina la relació entre nodes. Els mètodes convencionals es basen en la matriu de precisió, on s'assumeix implícitament la gaussianitat. 
En aquesta tesi es proposen nous mètodes per a estimar la matriu Laplaciana a partir de les correlacions parcials assumint respectivament dos models no gaussians diferents en l'espai d'observació: mescles gaussianes i anàlisis de components independents. Els mètodes proposats han sigut provats amb dades simulades i amb dades reals en algunes aplicacions biomèdiques seleccionades. Es demostra que poden obtindre's millors estimacions de la matriu Laplaciana amb els nous mètodes proposats en els casos en què la gaussianitat no és una suposició correcta. També s'ha considerat el problema de generar senyals sintètics en escenaris on l'escassetat de senyals reals pot ser un problema. Els models sobre grafs permeten models de dependència per parells més generals entre mostres de senyal. Així, es proposa un nou mètode basat en la Transformada de Fourier Complexa sobre Grafs i en el concepte de subrogació. S'ha aplicat en el desafiador problema del reconeixement de gestos amb les mans. S'ha demostrat que l'extensió del conjunt d'entrenament original amb rèpliques substitutes generades amb mètodes sobre grafs, millora significativament la precisió del classificador de gestos amb les mans. / [EN] Graph signal processing appears as an emerging field of techniques that combine concepts from two highly consolidated areas: signal processing and graph theory. From the perspective of signal processing, it is possible to achieve a more general signal definition by assigning each value of the signal to a vertex of a graph. Conventional signals can be considered particular cases where the sample values are assigned to a uniform (temporal or spatial) grid. From the perspective of graph theory, new transformations of the graph can be defined in such a way that they extend the classical concepts of signal processing such as filtering, prediction and spectral analysis. 
Furthermore, graph signal processing is finding new applications in detection and classification areas due to its flexibility to model general dependencies between variables. In this thesis, new contributions are made to graph signal processing. Firstly, the problem of estimating the Laplacian matrix associated with a graph, which determines the relationship between nodes, is considered. Conventional methods are based on the precision matrix, where Gaussianity is implicitly assumed. In this thesis, new methods to estimate the Laplacian matrix from the partial correlations are proposed, assuming respectively two different non-Gaussian models in the observation space: Gaussian Mixtures and Independent Component Analysis. The proposed methods have been tested with simulated data and with real data in some selected biomedical applications. It is demonstrated that better estimates of the Laplacian matrix can be obtained with the newly proposed methods in cases where Gaussianity is not a correct assumption. The problem of generating synthetic signals in scenarios where the scarcity of real signals can be an issue has also been considered. Graph models allow more general pairwise dependence models between signal samples. Thus, a new method based on the Complex Graph Fourier Transform and on the concept of subrogation is proposed. It has been applied to the challenging problem of hand gesture recognition. It has been demonstrated that extending the original training set with graph surrogate replicas significantly improves the accuracy of the hand gesture classifier. / Belda Valls, J. (2022). Nuevas contribuciones a la teoría y aplicación del procesado de señal sobre grafos [Tesis doctoral]. Universitat Politècnica de València. 
https://doi.org/10.4995/Thesis/10251/191333 Signal processing on graph Graph theory Partial correlation Laplacian matrix Independent component analysis (ICA) Subrogation algorithms Gaussian mixture model Teoría de grafos Procesado de señal sobre grafo Correlación parcial Matriz Laplaciana Análisis de componentes independientes Algoritmos de subrogación Modelo de mezclas gaussianas TEORÍA DE LA SEÑAL Y COMUNICACIONES
76	Speaker Diarization System for Call-center data Li, Yi January 2020 To answer the question of who spoke when, speaker diarization (SD) is a critical step for many speech applications in practice. The task of our project is to build an MFCC-vector based speaker diarization system on top of a speaker verification (SV) system, an existing call-center application that checks the customer’s identity from a phone call. Our speaker diarization system uses 13-dimensional MFCCs as features and performs Voice Activity Detection (VAD), segmentation, linear clustering and hierarchical clustering based on GMM and the BIC score. By applying it, we decrease the Equal Error Rate (EER) of the SV from 18.1% in the baseline experiment to 3.26% on general call-center conversations. To better analyze and evaluate the system, we also simulated a set of call-center data based on the public ICSI corpus audio database. / För att svara på frågan vem som talade när är högtalardarisering (SD) ett kritiskt steg för många talapplikationer i praktiken. Uppdraget med vårt projekt är att bygga ett MFCC-vektorbaserat högtalar-diariseringssystem ovanpå ett högtalarverifieringssystem (SV), som är ett befintligt Call-center-program för att kontrollera kundens identitet från ett telefonsamtal. Vårt högtalarsystem använder 13-dimensionella MFCC: er som funktioner, utför Voice Active Detection (VAD), segmentering, linjär gruppering och hierarkisk gruppering baserat på GMM och BIC-poäng. Genom att tillämpa den minskar vi EER (Equal Error Rate) från 18,1 % i baslinjeexperimentet till 3,26 % för de allmänna samtalscentret. För att bättre analysera och utvärdera systemet simulerade vi också en uppsättning callcenter-data baserat på de offentliga ljuddatabaserna ICSI corpus. 
MFCC-vector Speaker Diarization Speaker Verification Voice Active Detection Gaussian Mixture Model Hierarchy Clustering MFCC-vektor Högtalardarisering Högtalarverifiering Röstaktiv detektering Gaussisk blandningsmodell Hierarkikluster Elektroteknik och elektronik
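The BIC-based merge decision used in hierarchical clustering for speaker diarization (entry 76) can be illustrated with the widely used single-Gaussian ΔBIC criterion. This is a simplified stand-in, not the thesis's system: it models each segment with one univariate Gaussian instead of a GMM over 13-dimensional MFCCs, and the function name `delta_bic`, the `penalty_weight` parameter and the toy segments are hypothetical.

```python
import math
import random


def variance(xs):
    """Maximum-likelihood (biased) variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Positive: two separate Gaussians fit better (keep apart); negative: merge."""
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    # Log-likelihood gain of modeling the segments separately instead of jointly.
    gain = 0.5 * (n * math.log(variance(seg_a + seg_b))
                  - n_a * math.log(variance(seg_a))
                  - n_b * math.log(variance(seg_b)))
    # The split model has 2 extra parameters (one mean, one variance) in 1-D.
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


rng = random.Random(0)
speaker_a = [rng.gauss(0.0, 1.0) for _ in range(200)]  # toy 1-D "features"
speaker_b = [rng.gauss(3.0, 1.0) for _ in range(200)]
same_speaker_score = delta_bic(speaker_a, speaker_a)
different_speaker_score = delta_bic(speaker_a, speaker_b)
```

A clustering loop would repeatedly merge the pair of segments with the lowest score and stop once every remaining pair scores above zero.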
77	Automatic Speech Quality Assessment in Unified Communication : A Case Study / Automatisk utvärdering av samtalskvalitet inom integrerad kommunikation : en fallstudie Larsson Alm, Kevin January 2019 Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect for such systems. For this thesis, automatic methods for assessing the speech quality of voice calls in Briteback’s UC application are studied, including a comparison of the researched methods. Three methods are studied, all using a Gaussian Mixture Model (GMM) as a regressor, paired with extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC) and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features respectively. The method based on HFCC feature extraction shows better performance in general than the two other methods, but all methods show low performance compared to the literature. This most likely stems from implementation errors, showing the difference between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, can make the field more popular and increase its use in the real world. speech voice communication qoe quality of experience unified communication uc speech quality assessment speech quality voice calls gaussian mixture model gmm gaussian mixture regression gmr mel frequency cepstrum coefficients mfcc human feature cepstrum coefficients hfcc gfcc Software Engineering Programvaruteknik
78	Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems / Utilisation de modèles gaussiens pour l'adaptation au locuteur de réseaux de neurones profonds dans un contexte de modélisation acoustique pour la reconnaissance de la parole Tomashenko, Natalia 01 December 2017 Les différences entre conditions d'apprentissage et conditions de test peuvent considérablement dégrader la qualité des transcriptions produites par un système de reconnaissance automatique de la parole (RAP). L'adaptation est un moyen efficace pour réduire l'inadéquation entre les modèles du système et les données liées à un locuteur ou un canal acoustique particulier. Il existe deux types dominants de modèles acoustiques utilisés en RAP : les modèles de mélanges gaussiens (GMM) et les réseaux de neurones profonds (DNN). L'approche par modèles de Markov cachés (HMM) combinés à des GMM (GMM-HMM) a été l'une des techniques les plus utilisées dans les systèmes de RAP pendant de nombreuses décennies. Plusieurs techniques d'adaptation ont été développées pour ce type de modèles. Les modèles acoustiques combinant HMM et DNN (DNN-HMM) ont récemment permis de grandes avancées et surpassé les modèles GMM-HMM pour diverses tâches de RAP, mais l'adaptation au locuteur reste très difficile pour les modèles DNN-HMM. L'objectif principal de cette thèse est de développer une méthode de transfert efficace des algorithmes d'adaptation des modèles GMM aux modèles DNN. Une nouvelle approche pour l'adaptation au locuteur des modèles acoustiques de type DNN est proposée et étudiée : elle s'appuie sur l'utilisation de fonctions dérivées de GMM comme entrée d'un DNN. La technique proposée fournit un cadre général pour le transfert des algorithmes d'adaptation développés pour les GMM à l'adaptation des DNN. 
Elle est étudiée pour différents systèmes de RAP à l'état de l'art et s'avère efficace par rapport à d'autres techniques d'adaptation au locuteur, ainsi que complémentaire. / Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel. There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common techniques in ASR systems for many decades. Speaker adaptation is very effective for these AMs and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved big advances and outperformed GMM-HMM models for various ASR tasks. However, speaker adaptation is still very challenging for these AMs. Many adaptation algorithms that work well for GMM systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The idea of this approach is based on using so-called GMM-derived features as input to a DNN. The proposed technique provides a general framework for transferring adaptation algorithms, developed for GMMs, to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them. 
Adaptation au locuteur Réseaux de neurones profonds Modèles de mélanges Gaussiens (GMM) Modèles acoustiques Apprentissage profond Speaker adaptation Speaker adaptive training Deep neural network (DNN) Gaussian mixture model (GMM) GMM-derived (GMMD) features Automatic speech recognition (ASR) Acoustic models Deep learning 006.454
79	Rate-Distortion Performance And Complexity Optimized Structured Vector Quantization Chatterjee, Saikat 07 1900 Although vector quantization (VQ) is an established topic in communication, its practical utility has been limited due to (i) prohibitive complexity for higher quality and bit-rate, (ii) structured VQ methods which are not analyzed for optimum performance, and (iii) the difficulty of mapping theoretical performance of mean square error (MSE) to perceptual measures. However, the ever-increasing demand for compression of various source signals points to VQ as the inevitable choice for high efficiency. This thesis addresses all three of the above issues, utilizing the power of parametric stochastic modeling of the signal source, viz., the Gaussian mixture model (GMM), and proposes new solutions. Addressing some of the new requirements of source coding in network applications, the thesis also presents solutions for scalable bit-rate, rate-independent complexity and decoder scalability. While structured VQ is a necessity to reduce the complexity, we have developed, analyzed and compared three different schemes of compensation for the loss due to structured VQ. Focusing on the widely used methods of split VQ (SVQ) and KLT based transform domain scalar quantization (TrSQ), we develop expressions for their optimum performance using high rate quantization theory. We propose the use of conditional PDF based SVQ (CSVQ) to compensate for the split loss in SVQ and analytically show that it achieves coding gain over SVQ. Using the analytical expressions of complexity, an algorithm to choose the optimum splits is proposed. We analyze these techniques for their complexity as well as perceptual distortion measure, considering the specific case of quantizing the wide band speech line spectrum frequency (LSF) parameters. Using natural speech data, it is shown that the new conditional PDF based methods provide better perceptual distortion performance than the traditional methods. 
Exploring the use of GMMs for the source, we take the approach of separately estimating the GMM parameters and then using high rate quantization theory in a simplified manner to derive closed form expressions for optimum MSE performance. This has led to the development of non-linear prediction for compensating the split loss (in contrast to the linear prediction using a Gaussian model). We show that the GMM approach can improve the recently proposed adaptive VQ scheme of switched SVQ (SSVQ). We derive the optimum performance expressions for SSVQ, in both variable bit rate and fixed bit rate formats, using the simplified approach of GMM in high rate theory. As a third scheme for recovering the split loss in SVQ and reducing the complexity, we propose a two stage SVQ (TsSVQ), which is analyzed for minimum complexity as well as perceptual distortion. Utilizing the low complexity of transform domain SVQ (TrSVQ) as well as the two stage approach in a universal coding framework, it is shown that we can achieve low complexity as well as better performance than SSVQ. Further, the combination of GMM and universal coding led to the development of a highly scalable coder which can provide bit-rate scalability, decoder scalability and rate-independent low complexity. Also, the perceptual distortion performance is comparable to that of SSVQ. Since GMM is a generic source model, we develop a new method of predicting the performance bound for perceptual distortion using VQ. Applying this method to LSF quantization, the minimum bit rates for quantizing telephone band LSF (TB-LSF) and wideband LSF (WB-LSF) are derived. Vector Analysis Quantization Theory Split Vector Quantization (SVQ) LSF Parameter Quantization Structured Quantization Vector Quantization - Stochastic Models Gaussian Mixture Model (GMM) Line Spectrum Frequency Coding Vector Quantization (VQ) Switched Quantization Speech Spectrum Quantization LSF Coding Split VQ Conditional PDF Communication Engineering
80	Estimation du taux d'erreurs binaires pour n'importe quel système de communication numérique DONG, Jia 18 December 2013 This thesis is related to the Bit Error Rate (BER) estimation for any digital communication system. In many designs of communication systems, the BER is a Key Performance Indicator (KPI). The popular Monte-Carlo (MC) simulation technique is well suited to any system, but at the expense of long simulation times when dealing with very low error rates. In this thesis, we propose to estimate the BER by using the Probability Density Function (PDF) estimation of the soft observations of the received bits. First, we have studied a non-parametric PDF estimation technique named the Kernel method. Simulation results in the context of several digital communication systems are proposed. Compared with the conventional MC method, the proposed Kernel-based estimator provides good precision even for high SNR with a very limited number of data samples. Second, the Gaussian Mixture Model (GMM), which is a semi-parametric PDF estimation technique, is used to estimate the BER. Compared with the Kernel-based estimator, the GMM method provides better performance in the sense of minimum variance of the estimator. Finally, we have investigated the blind estimation of the BER, which is the estimation when the sent data are unknown. We denote this case as unsupervised BER estimation. The Stochastic Expectation-Maximization (SEM) algorithm combined with the Kernel or GMM PDF estimation methods has been used to solve this issue. By analyzing the simulation results, we show that the obtained BER estimate can be very close to the real values. This is quite promising since it could enable real-time BER estimation on the receiver side without decreasing the user bit rate with pilot symbols for example. BER estimation Monte-Carlo simulation Probability Density Function Soft observations Kernel method Gaussian Mixture Model
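The idea in entry 80 of estimating the BER from a PDF estimate of the soft observations, instead of counting errors, can be sketched for the supervised kernel case. The sketch below is an assumption-laden illustration, not the thesis's estimator: it assumes every bit was sent as +1 over a toy Gaussian-noise channel, uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth, and the names `q_function` and `kernel_ber_estimate` are hypothetical. With a Gaussian kernel of bandwidth h, integrating the density estimate over the error region x < 0 gives the average of Q(x_i / h) over the soft values x_i.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def kernel_ber_estimate(soft_values):
    """Estimate P(soft value < 0) for bits sent as +1 from a Gaussian-kernel PDF estimate."""
    n = len(soft_values)
    mean = sum(soft_values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in soft_values) / n)
    h = 1.06 * std * n ** (-0.2)  # Silverman's rule of thumb (assumed bandwidth choice)
    # Each kernel centered at x_i places mass Q(x_i / h) on the error region x < 0.
    return sum(q_function(x / h) for x in soft_values) / n


rng = random.Random(1)
noise_std = 0.5  # toy channel: +1 plus Gaussian noise
soft = [1.0 + rng.gauss(0.0, noise_std) for _ in range(2000)]
estimate = kernel_ber_estimate(soft)
reference = q_function(1.0 / noise_std)  # exact BER of this toy channel, Q(2)
```

Unlike error counting, the smoothed estimate is non-zero even when none of the observed soft values crosses the decision threshold, which is why such estimators remain usable with few samples at low error rates; the kernel smoothing does introduce some bias.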
