221

Modèles aléatoires harmoniques pour les signaux électroencéphalographiques / Harmonic random models for electroencephalographic signals

Villaron, Emilie, 25 June 2012
Cette thèse s'inscrit dans le contexte de l'analyse des signaux biomédicaux multicapteurs par des méthodes stochastiques. Les signaux auxquels nous nous intéressons présentent un caractère oscillant transitoire bien représenté par les décompositions dans le plan temps-fréquence ; c'est pourquoi nous avons choisi de considérer non plus les décours temporels de ces signaux mais les coefficients issus de la décomposition de ces derniers dans le plan temps-fréquence. Dans une première partie, nous décomposons les signaux multicapteurs sur une base de cosinus locaux (appelée base MDCT) et nous modélisons les coefficients à l'aide d'un modèle à états latents. Les coefficients sont considérés comme les réalisations de processus aléatoires gaussiens multivariés dont la distribution est gouvernée par une chaîne de Markov cachée. Nous présentons les algorithmes classiques liés à l'utilisation des modèles de Markov cachés et nous proposons une extension dans le cas où les matrices de covariance sont factorisées sous forme d'un produit de Kronecker. Cette modélisation permet de diminuer la complexité des méthodes de calcul numérique utilisées tout en stabilisant les algorithmes associés. Nous appliquons ces modèles à des données électroencéphalographiques et nous montrons que les matrices de covariance représentant les corrélations entre les capteurs et les fréquences apportent des informations pertinentes sur les signaux analysés. Ceci est notamment illustré par un cas d'étude sur la caractérisation de la désynchronisation des ondes alpha dans le contexte de la sclérose en plaques. / This thesis addresses the problem of multichannel biomedical signal analysis using stochastic methods. EEG signals exhibit specific features that are both time and frequency localized, which motivates the use of time-frequency signal representations. In this document the (time-frequency labelled) coefficients are modelled as multivariate random variables. In the first part of this work, multichannel signals are expanded using a local cosine basis (called MDCT basis). The approach we propose models the distribution of time-frequency coefficients (here MDCT coefficients) in terms of latent variables by the use of a hidden Markov model. In the framework of application to EEG signals, the latent variables describe some hidden mental state of the subject. The latter control the covariance matrices of the Gaussian vectors formed, at each fixed time, by the multi-channel, multi-frequency MDCT coefficients. After presenting classical algorithms to estimate the parameters, we define a new model in which the (space-frequency) covariance matrices are expanded as tensor products (also named Kronecker products) of frequency and channel matrices. Inference for the proposed model is developed and yields estimates for the model parameters, together with maximum likelihood estimates for the sequences of latent variables. The model is applied to electroencephalogram data, and it is shown that variance-covariance matrices labelled by sensor and frequency indices can yield relevant information on the analyzed signals. This is illustrated with a case study, namely the detection of alpha waves in rest EEG for multiple sclerosis patients and control subjects.
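
As a rough numerical illustration of the Kronecker-factored covariance idea described in this abstract (not the author's code; all sizes and names below are invented), the following Python sketch builds a channel-frequency covariance as a Kronecker product and evaluates a Gaussian log-likelihood using only the two small factor matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
C, F = 8, 16  # channels, frequency bins (illustrative sizes)

# Random SPD factors standing in for estimated channel and frequency covariances.
A = rng.standard_normal((C, C)); Sigma_c = A @ A.T + C * np.eye(C)
B = rng.standard_normal((F, F)); Sigma_f = B @ B.T + F * np.eye(F)

# Full covariance of the stacked (C*F)-dimensional coefficient vector.
Sigma = np.kron(Sigma_f, Sigma_c)

# Kronecker identities: inv(kron(A, B)) = kron(inv(A), inv(B)) and
# det(kron(A, B)) = det(A)**dim(B) * det(B)**dim(A), so only the factors are needed.
Sigma_inv = np.kron(np.linalg.inv(Sigma_f), np.linalg.inv(Sigma_c))
logdet = C * np.linalg.slogdet(Sigma_f)[1] + F * np.linalg.slogdet(Sigma_c)[1]

# Gaussian log-density of one coefficient vector, computed from the factors only.
d = C * F
x = rng.standard_normal(d)
loglik = -0.5 * (d * np.log(2 * np.pi) + logdet + x @ Sigma_inv @ x)

# Sanity check against the direct computation on the full (C*F) x (C*F) matrix.
loglik_full = -0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1]
                      + x @ np.linalg.solve(Sigma, x))
assert np.isclose(loglik, loglik_full)
print(f"log-likelihood: {loglik:.2f}")
```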
222

Automatic Speech Quality Assessment in Unified Communication : A Case Study / Automatisk utvärdering av samtalskvalitet inom integrerad kommunikation : en fallstudie

Larsson Alm, Kevin, January 2019
Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect for such systems. For this thesis, automatic methods for assessing the speech quality of voice calls in Briteback's UC application are studied, including a comparison of the researched methods. Three methods are studied, all using a Gaussian Mixture Model (GMM) as a regressor, paired with extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC) and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features, respectively. The method based on HFCC feature extraction shows better performance in general compared to the two other methods, but all methods show comparatively low performance compared to the literature. This most likely stems from implementation errors, showing the difference between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, can make the field more popular and increase its use in the real world.
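
A minimal, self-contained sketch of the general "GMM as a regressor" idea this thesis relies on, with synthetic features standing in for the HFCC/GFCC/MMFCC extraction (the feature extraction itself is not shown, and nothing here is the thesis's implementation):

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic stand-in data: x = per-call feature vector (e.g. averaged cepstral
# coefficients), y = quality score to predict.
n, dx = 500, 6
X = rng.standard_normal((n, dx))
y = 3.0 + X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# Fit a GMM on the joint vector [x, y]; regression is the conditional mean E[y | x].
joint = np.hstack([X, y[:, None]])
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0).fit(joint)

def gmm_regress(gmm, X):
    """Conditional mean E[y | x] under a joint full-covariance GMM (y is the last dim)."""
    dx = X.shape[1]
    resp = np.zeros((len(X), gmm.n_components))
    cond = np.zeros((len(X), gmm.n_components))
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        # responsibility of component k for each x (marginal over y)
        resp[:, k] = gmm.weights_[k] * multivariate_normal.pdf(X, mean=mu_x, cov=Sxx)
        # per-component conditional mean of y given x
        cond[:, k] = (mu_y + (X - mu_x) @ np.linalg.solve(Sxx, Sxy)).ravel()
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * cond).sum(axis=1)

y_hat = gmm_regress(gmm, X)
print(f"training RMSE of the GMM regressor: {np.sqrt(np.mean((y_hat - y) ** 2)):.3f}")
```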
223

Abundância de aves de rapina no Cerrado e Pantanal do Mato Grosso do Sul e os efeitos da degradação de hábitat: perspectivas com métodos baseados na detectabilidade / Raptor abundance in the Brazilian Cerrado and Pantanal: insights from detection-based methods

Dénes, Francisco Voeroes, 12 September 2014
A urbanização e a expansão das fronteiras agrícolas na região Neotropical estão entre as principais forças causadoras da degradação ambiental em hábitats abertos naturais. Inferências e estimativas de abundância são críticas para quantificação de dinâmicas populacionais e impactos de mudanças ambientais. Contudo, a detecção imperfeita e outros fenômenos que causam inflação de zeros podem induzir erros de estimativas e dificultar a identificação de padrões ecológicos. Examinamos como a consideração desses fenômenos em dados de contagens de indivíduos não marcados pode informar na escolha do método apropriado para estimativas populacionais. Revisamos métodos estabelecidos (modelos lineares generalizados [GLMs] e amostragem de distância [distance sampling]) e emergentes que usam modelos hierárquicos baseados em misturas (N-mixture; modelo de Royle-Nichols [RN], e N-mixture básico, zero inflacionado, espacialmente explícito, visita única, e multiespécies) para estimar a abundância de populações não marcadas. Como estudo de caso, aplicamos o método N-mixture baseado em visitas únicas para modelar dados de contagens de aves de rapina em estradas e investigar como transformações de habitat no Cerrado e Pantanal do Mato Grosso do Sul afetaram as populações de 12 espécies em uma escala regional (>300.000 km2). Os métodos diferem nos pré-requisitos de desenho amostral, e a sua adequabilidade dependerá da espécie em questão, da escala e objetivos do estudo, e considerações financeiras e logísticas, que devem ser avaliados para que verbas, tempo e esforço sejam utilizados com eficiência. No estudo de caso, a detecção de todas as espécies foi influenciada pelo horário de amostragem, com efeitos congruentes com expectativas baseadas nos comportamentos de forrageamento e de voo. A vegetação fechada e carcaças também influenciaram a detecção de algumas espécies. A abundância da maioria das espécies foi negativamente influenciada pela conversão de habitats naturais para antrópicos, particularmente pastagens e plantações de soja e cana-de-açúcar, até mesmo para espécies generalistas consideradas como indicadores ruins da qualidade de hábitats. A proteção dos hábitats naturais remanescentes é essencial para prevenir um declínio ainda maior das populações de aves de rapina na área de estudo, especialmente no domínio do Cerrado. / Urbanization and the expansion of agricultural frontiers are among the main forces driving the degradation of natural open habitats in the Neotropical region. Inference and estimates of abundance are critical for quantifying population dynamics and the impacts of environmental change. Yet imperfect detection and other phenomena that cause zero inflation can induce estimation error and obscure ecological patterns. We examine how detection error and zero-inflation in count data of unmarked individuals inform the choice of analytical method for estimating population size. We review established (GLMs and distance sampling) and emerging methods that use N-mixture models (Royle-Nichols model, and basic, zero-inflated, temporary emigration, beta-binomial, generalized open-population, spatially explicit, single-visit and multispecies) to estimate abundance of unmarked populations. As a case study, we employed a single-visit N-mixture approach to model roadside raptor count data and investigate how land-use transformations in the Cerrado and Pantanal domains in Brazil have affected the populations of 12 species on a regional scale (>300,000 km2). Methods differ in sampling design requirements, and their suitability will depend on the study species, scale and objectives of the study, and financial and logistical considerations, which should be evaluated to use funds, time and effort efficiently. In the case study, detection of all species was influenced by time of day, with effects that follow expectations based on foraging and flying behavior. Closed vegetation and carcasses found during surveys also influenced detection of some species. Abundance of most species was negatively influenced by conversion of natural Cerrado and Pantanal habitats to anthropogenic uses, particularly pastures, soybean and sugar cane plantations, even for generalist species usually considered poor habitat-quality indicators. Protection of the remaining natural habitats is essential to prevent further decline of raptor populations in the study area, especially in the Cerrado domain.
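
For readers unfamiliar with the single-visit N-mixture approach used in the case study, here is a toy illustration (simulated data, invented covariates and parameter values; not the authors' analysis) of the marginal likelihood that sums over the latent abundance at each site:

```python
import numpy as np
from scipy.stats import poisson, binom
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)

# Simulated roadside survey: one visit per site, abundance driven by a habitat
# covariate, detection driven by time of day (using different covariates on the
# two parts is what makes the single-visit N-mixture identifiable).
R = 300
habitat = rng.standard_normal(R)      # e.g. scaled proportion of natural habitat
hour = rng.uniform(-1, 1, R)          # e.g. standardised survey hour
lam = np.exp(0.5 - 0.8 * habitat)     # true expected abundance per site
p = expit(0.2 + 1.0 * hour)           # true detection probability
counts = rng.binomial(rng.poisson(lam), p)

K = 40                                # truncation bound for the sum over latent N

def negloglik(theta):
    b0, b1, a0, a1 = theta
    lam = np.exp(b0 + b1 * habitat)
    p = expit(a0 + a1 * hour)
    n_grid = np.arange(K + 1)
    ll = 0.0
    for i in range(R):
        # marginal likelihood of the observed count: sum over latent abundance N
        terms = poisson.pmf(n_grid, lam[i]) * binom.pmf(counts[i], n_grid, p[i])
        ll += np.log(terms.sum() + 1e-300)
    return -ll

fit = minimize(negloglik, x0=np.zeros(4), method="Nelder-Mead")
print("estimated (beta0, beta_habitat, alpha0, alpha_hour):", np.round(fit.x, 2))
```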
224

Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems / Utilisation de modèles gaussiens pour l'adaptation au locuteur de réseaux de neurones profonds dans un contexte de modélisation acoustique pour la reconnaissance de la parole

Tomashenko, Natalia, 01 December 2017
Les différences entre conditions d'apprentissage et conditions de test peuvent considérablement dégrader la qualité des transcriptions produites par un système de reconnaissance automatique de la parole (RAP). L'adaptation est un moyen efficace pour réduire l'inadéquation entre les modèles du système et les données liées à un locuteur ou un canal acoustique particulier. Il existe deux types dominants de modèles acoustiques utilisés en RAP : les modèles de mélanges gaussiens (GMM) et les réseaux de neurones profonds (DNN). L'approche par modèles de Markov cachés (HMM) combinés à des GMM (GMM-HMM) a été l'une des techniques les plus utilisées dans les systèmes de RAP pendant de nombreuses décennies. Plusieurs techniques d'adaptation ont été développées pour ce type de modèles. Les modèles acoustiques combinant HMM et DNN (DNN-HMM) ont récemment permis de grandes avancées et surpassé les modèles GMM-HMM pour diverses tâches de RAP, mais l'adaptation au locuteur reste très difficile pour les modèles DNN-HMM. L'objectif principal de cette thèse est de développer une méthode de transfert efficace des algorithmes d'adaptation des modèles GMM aux modèles DNN. Une nouvelle approche pour l'adaptation au locuteur des modèles acoustiques de type DNN est proposée et étudiée : elle s'appuie sur l'utilisation de fonctions dérivées de GMM comme entrée d'un DNN. La technique proposée fournit un cadre général pour le transfert des algorithmes d'adaptation développés pour les GMM à l'adaptation des DNN. Elle est étudiée pour différents systèmes de RAP à l'état de l'art et s'avère efficace par rapport à d'autres techniques d'adaptation au locuteur, ainsi que complémentaire. / Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel. There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common techniques in ASR systems for many decades. Speaker adaptation is very effective for these AMs and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved significant advances and outperformed GMM-HMM models for various ASR tasks. However, speaker adaptation is still very challenging for these AMs. Many adaptation algorithms that work well for GMM systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The idea of this approach is based on using so-called GMM-derived features as input to a DNN. The proposed technique provides a general framework for transferring adaptation algorithms, developed for GMMs, to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them.
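
A schematic sketch of the "GMM-derived features as DNN input" idea (toy data rather than acoustic frames; this is not the thesis's ASR pipeline, and the GMM-level adaptation step the thesis transfers is only indicated in a comment):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Toy stand-in for acoustic frames: 39-dim vectors from 10 phone-like classes.
n, d, n_classes = 3000, 39, 10
centers = rng.standard_normal((n_classes, d)) * 2.0
labels = rng.integers(0, n_classes, n)
frames = centers[labels] + rng.standard_normal((n, d))

# 1) Train a GMM on the raw frames; in the approach described above, GMM-level
#    speaker adaptation would act on this model before feature extraction.
gmm = GaussianMixture(n_components=64, covariance_type="diag", random_state=0).fit(frames)

# 2) GMM-derived features: per-frame component posteriors, taken in the log domain.
gmm_feats = np.log(gmm.predict_proba(frames) + 1e-10)

# 3) Feed the GMM-derived features to a neural-network acoustic model.
X_tr, X_te, y_tr, y_te = train_test_split(gmm_feats, labels, test_size=0.2, random_state=0)
dnn = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=200, random_state=0)
dnn.fit(X_tr, y_tr)
print("frame accuracy on GMM-derived features:", round(dnn.score(X_te, y_te), 3))
```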
225

參數模型與取樣差異於退休金財務評價之研究 / Parametric Statistical Model and Selection Bias in Pension Valuation : The Case of Taiwan Public Employees Retirement System

陳宏仁, Chen, Hung-Jen, Unknown Date
確定給付制的退休金計畫,退休金成本提存的適當與否,關係到基金長期的財務健全及未來員工權益的保障,而我國公務人員退撫基金關係到廣大公務人員的權益,也影響到政府的財政支出,所以對公務人員退撫基金更有精算的必要,以確保提撥率之適當而不至於對政府財政增加額外負擔。 本論文從人口面的角度出發,以我國公務人員退休撫卹基金為實證分析之研究對象,探討人口面的假設對於公務人員退撫基金提撥率,未來各項給付支出的影響,包括從經驗資料中取樣,探討大小不同的樣本建立之服職表,於計算提撥率的差異,並利用混成模型建立新進成員假設,以開放團體模擬基金成員結構,在某些固定假設之下,模擬未來五十年的基金資產與現金流量情況。 根據本研究結果指出,利用不同取樣所建構的服職表,計算出之提撥率差異甚大,顯示小型的退休金計畫並不適宜以自身的經驗資料作為精算評價的基礎。另外,以常態分佈的混成模型建立公務人員新進假設,在人數設限成員群體的假設下作開放團體模擬的結果,顯示公務人員年齡結構在未來有逐漸老化的趨勢,在本文所採的假設下,基金資產將先增後減而於民國121年破產。在現行的公務人員退休撫卹制度下,要避免基金破產之情況發生,唯有提高提撥率、提高基金資產報酬率、或壓低薪資成長率。 第一章 緒論 第一節 研究動機與目的 第二節 研究範圍與限制 第三節 研究架構與內容 第二章 退休金精算考慮之因素 第一節 退休基金精算系統的概念及文獻回顧 第二節 精算假設 第三節 精算成本法 第三章 基金成員結構分析的理論基礎 第一節 服職表的編製 壹、 模型建立 貳、 修勻方法 參、 程式演算過程 第二節 混成參數模型的建構 第三節 基金成員新進參數模型的建立 第四節 基金成員新進、脫退隨機過程 第四章 公務人員退撫基金精算模擬 第一節 公務人員退撫基金給付規定 第二節 公務人員退撫基金精算評價系統簡介 第三節 公務人員退撫基金精算評價之實證 壹、 取樣差異對於提撥率的影響 貳、 開放團體模擬基金成員結構和財務預估 第五章 結論與建議 第一節 結論 第二節 對後續研究的建議 附錄A:估計粗脫率之程式 附錄B:修勻程式(Whittaker法) 附錄C:估計常態混成模型參數之程式 附錄D:公務人員新進成員年齡、職等分佈模擬之程式 附錄E-1:服職表1 附錄E-2:服職表2 附錄E-3:服職表3 附錄E-4:服職表4 附錄E-5:服職表5 附錄E-6:服職表6 附錄E-7:服職表7 / The adequacy of the plan contribution for a defined benefit pension scheme is directly related to its financial soundness and the plan member’s benefits. Due to uncertainty of the plan’s turnover, the service table plays an important role in actuarial valuation and cash flow projection. In this study, Taiwan public employees retirement system is studied to monitor the solvency issue due to bias in selecting the service tables. Tai-PERS is designed to provide retirement and ancillary benefits to 271,215 government employees. Its financial soundness is especially vital to the government annual balance. The plan contribution and projected cash flows of Tai-PERS are investigated using various sampling results. The distribution of the new entrants is assumed to follow the mixture model to describe the recruiting results. Then dynamic simulations under the open group assumption are performed to predict the future fund assets and cash flows. The results show significant differences in employing various service tables. Hence selecting proper demographic assumptions is particular important in pension valuation. Under our approach, the workforce of Tai-PERS is aging given the current plan population. Based on the given scenario, the projected plan assets increase and then decrease to be insolvent in 2032. Some interesting results are also discussed.
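
Purely as an illustration of the open-group cash-flow projection described in the English abstract (all figures below are invented; the actual valuation uses service tables, a mixture model for new entrants and the statutory benefit rules, none of which are reproduced here):

```python
import numpy as np

# Illustrative open-group projection of a defined-benefit fund's assets.
years = 50
assets = 500.0                # initial fund assets (arbitrary units)
payroll = 300.0               # total covered payroll
contrib_rate = 0.12           # combined employee + employer contribution rate
benefit_outgo = 25.0          # current annual benefit payments
salary_growth = 0.03          # annual payroll growth
benefit_growth = 0.06         # benefits grow faster as the membership ages
fund_return = 0.04            # assumed annual investment return

for t in range(1, years + 1):
    contributions = contrib_rate * payroll
    net_cash_flow = contributions - benefit_outgo
    assets = assets * (1 + fund_return) + net_cash_flow
    payroll *= 1 + salary_growth
    benefit_outgo *= 1 + benefit_growth
    if assets <= 0:
        print(f"fund projected to be exhausted in year {t}")
        break
else:
    print(f"assets after {years} years: {assets:,.0f}")
```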
226

Rate-Distortion Performance And Complexity Optimized Structured Vector Quantization

Chatterjee, Saikat, 07 1900
Although vector quantization (VQ) is an established topic in communication, its practical utility has been limited due to (i) prohibitive complexity for higher quality and bit-rate, (ii) structured VQ methods which are not analyzed for optimum performance, (iii) difficulty of mapping theoretical performance of mean square error (MSE) to perceptual measures. However, an ever increasing demand for various source signal compression points to VQ as the inevitable choice for high efficiency. This thesis addresses all three of the above issues, utilizing the power of parametric stochastic modeling of the signal source, viz., the Gaussian mixture model (GMM), and proposes new solutions. Addressing some of the new requirements of source coding in network applications, the thesis also presents solutions for scalable bit-rate, rate-independent complexity and decoder scalability. While structured VQ is a necessity to reduce the complexity, we have developed, analyzed and compared three different schemes of compensation for the loss due to structured VQ. Focusing on the widely used methods of split VQ (SVQ) and KLT based transform domain scalar quantization (TrSQ), we develop expressions for their optimum performance using high rate quantization theory. We propose the use of conditional PDF based SVQ (CSVQ) to compensate for the split loss in SVQ and analytically show that it achieves coding gain over SVQ. Using the analytical expressions of complexity, an algorithm to choose the optimum splits is proposed. We analyze these techniques for their complexity as well as perceptual distortion measure, considering the specific case of quantizing the wide band speech line spectrum frequency (LSF) parameters. Using natural speech data, it is shown that the new conditional PDF based methods provide better perceptual distortion performance than the traditional methods. Exploring the use of GMMs for the source, we take the approach of separately estimating the GMM parameters and then use the high rate quantization theory in a simplified manner to derive closed form expressions for optimum MSE performance. This has led to the development of non-linear prediction for compensating the split loss (in contrast to the linear prediction using a Gaussian model). We show that the GMM approach can improve the recently proposed adaptive VQ scheme of switched SVQ (SSVQ). We derive the optimum performance expressions for SSVQ, in both variable bit rate and fixed bit rate formats, using the simplified approach of GMM in high rate theory. As a third scheme for recovering the split loss in SVQ and reducing the complexity, we propose a two stage SVQ (TsSVQ), which is analyzed for minimum complexity as well as perceptual distortion. Utilizing the low complexity of transform domain SVQ (TrSVQ) as well as the two stage approach in a universal coding framework, it is shown that we can achieve low complexity as well as better performance than SSVQ. Further, the combination of GMM and universal coding led to the development of a highly scalable coder which can provide bit-rate scalability, decoder scalability and rate-independent low complexity. Also, the perceptual distortion performance is comparable to that of SSVQ. Since GMM is a generic source model, we develop a new method of predicting the performance bound for perceptual distortion using VQ. Applying this method to LSF quantization, the minimum bit rates for quantizing telephone band LSF (TB-LSF) and wideband LSF (WB-LSF) are derived.
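
As a hedged illustration of the split-VQ structure discussed above (a toy Gaussian source and k-means codebooks; this is not the thesis's CSVQ/SSVQ/TsSVQ machinery), the following sketch quantizes each sub-vector independently and exposes the "split loss" trade-off:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Toy correlated 10-dimensional vectors standing in for LSF parameters.
n, d = 5000, 10
A = rng.standard_normal((d, d))
data = rng.standard_normal((n, d)) @ A          # correlated Gaussian source

def split_vq(data, splits, bits_per_split):
    """Quantize each sub-vector with its own k-means codebook; return the MSE."""
    recon = np.zeros_like(data)
    start = 0
    for width, bits in zip(splits, bits_per_split):
        sub = data[:, start:start + width]
        km = KMeans(n_clusters=2 ** bits, n_init=3, random_state=0).fit(sub)
        recon[:, start:start + width] = km.cluster_centers_[km.predict(sub)]
        start += width
    return np.mean((data - recon) ** 2)

# Two splits of 5 dims each, 6 bits per split (12 bits/vector total).
# Unstructured VQ at the same rate would need a single 2**12-entry codebook; the
# split keeps training and search cheap at the cost of ignoring correlation
# across the split boundary ("split loss").
mse_split = split_vq(data, splits=[5, 5], bits_per_split=[6, 6])
print(f"split-VQ MSE at 12 bits/vector: {mse_split:.4f}")
```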
227

Estimation du taux d'erreurs binaires pour n'importe quel système de communication numérique / Bit error rate estimation for any digital communication system

DONG, Jia, 18 December 2013
This thesis is related to Bit Error Rate (BER) estimation for any digital communication system. In many designs of communication systems, the BER is a Key Performance Indicator (KPI). The popular Monte-Carlo (MC) simulation technique is well suited to any system, but at the expense of long simulation times when dealing with very low error rates. In this thesis, we propose to estimate the BER by using the Probability Density Function (PDF) estimation of the soft observations of the received bits. First, we have studied a non-parametric PDF estimation technique named the Kernel method. Simulation results in the context of several digital communication systems are proposed. Compared with the conventional MC method, the proposed Kernel-based estimator provides good precision even for high SNR with a very limited number of data samples. Second, the Gaussian Mixture Model (GMM), which is a semi-parametric PDF estimation technique, is used to estimate the BER. Compared with the Kernel-based estimator, the GMM method provides better performance in the sense of minimum variance of the estimator. Finally, we have investigated the blind estimation of the BER, that is, estimation when the sent data are unknown. We denote this case as unsupervised BER estimation. The Stochastic Expectation-Maximization (SEM) algorithm combined with the Kernel or GMM PDF estimation methods has been used to solve this issue. By analyzing the simulation results, we show that the obtained BER estimate can be very close to the real values. This is quite promising since it could enable real-time BER estimation on the receiver side without decreasing the user bit rate with pilot symbols, for example.
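
A small sketch of the underlying principle, estimating the BER from a fitted PDF of the soft observations rather than from raw error counts; the scenario (BPSK over AWGN) and all parameters are illustrative, and the thesis's Kernel and SEM variants are not implemented here:

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)

# BPSK over AWGN as a toy stand-in for "soft observations of the received bits".
n = 20000
snr_db = 7.0
sigma = np.sqrt(0.5 / 10 ** (snr_db / 10))
bits = rng.integers(0, 2, n)
symbols = 1.0 - 2.0 * bits                       # 0 -> +1, 1 -> -1
soft = symbols + sigma * rng.standard_normal(n)  # soft demodulator output

# Supervised case: fold all soft values onto the "+1 was sent" side.
z = soft * symbols                               # an error occurred whenever z < 0

# Fit a 1-D GMM to the PDF of z and read the BER off the fitted model:
# BER_hat = P(z < 0) = sum_k w_k * Phi(-mu_k / sd_k).
gmm = GaussianMixture(n_components=3, random_state=0).fit(z[:, None])
mu = gmm.means_.ravel()
sd = np.sqrt(gmm.covariances_.ravel())
ber_gmm = np.sum(gmm.weights_ * norm.cdf(-mu / sd))

ber_mc = np.mean(z < 0)                          # plain Monte-Carlo error count
ber_theory = norm.cdf(-1.0 / sigma)              # exact BPSK/AWGN value for reference
print(f"GMM-based estimate: {ber_gmm:.2e}  MC count: {ber_mc:.2e}  theory: {ber_theory:.2e}")
```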
228

Non-Parametric Clustering of Multivariate Count Data

Tekumalla, Lavanya Sita, January 2017
The focus of this thesis is models for non-parametric clustering of multivariate count data. While there has been significant work in Bayesian non-parametric modelling in the last decade, in the context of mixture models for real-valued data and some forms of discrete data such as multinomial mixtures, there has been much less work on non-parametric clustering of multivariate count data. The main challenges in clustering multivariate counts include choosing a suitable multivariate distribution that adequately captures the properties of the data, for instance handling over-dispersed data or sparse multivariate data, while at the same time leveraging the inherent dependency structure between dimensions and across instances to get meaningful clusters. As the first contribution, this thesis explores extensions to the Multivariate Poisson distribution, proposing efficient algorithms for non-parametric clustering of multivariate count data. While Poisson is the most popular distribution for count modelling, the Multivariate Poisson often leads to intractable inference and a suboptimal fit of the data. To address this, we introduce a family of models based on the Sparse Multivariate Poisson that exploit the inherent sparsity in multivariate data, reducing the number of latent variables in the formulation of the Multivariate Poisson and leading to a better fit and more efficient inference. We explore Dirichlet process mixture model extensions and temporal non-parametric extensions to models based on the Sparse Multivariate Poisson for practical use of Poisson based models for non-parametric clustering of multivariate counts in real-world applications. As a second contribution, this thesis addresses moving beyond the limitations of Poisson based models for non-parametric clustering, for instance in handling over-dispersed data or data with negative correlations. We explore, for the first time, marginal independent inference techniques based on the Gaussian Copula for multivariate count data in the Dirichlet Process mixture model setting. This enables non-parametric clustering of multivariate counts without limiting assumptions that usually restrict the marginals to belong to a particular family, such as the Poisson or the negative-binomial. This inference technique can also work for mixed data (combinations of count, binary and continuous data), enabling Bayesian non-parametric modelling to be used for a wide variety of data types. As the third contribution, this thesis addresses modelling a wide range of more complex dependencies such as asymmetric and tail dependencies during non-parametric clustering of multivariate count data with Vine Copula based Dirichlet process mixtures. While vine copula inference has been well explored for continuous data, it is still a topic of active research for multivariate counts and mixed multivariate data. Inference for multivariate counts and mixed data is a hard problem owing to ties that arise with discrete marginals. An efficient marginal independent inference approach based on the extended rank likelihood, building on recent work in the statistics literature, is proposed in this thesis, extending the use of vines for multivariate counts and mixed data in practical clustering scenarios. This thesis also explores a novel systems application of Bulk Cache Preloading by analysing I/O traces through predictive models for temporal non-parametric clustering of multivariate count data.
State of the art techniques in the caching domain are limited to exploiting short-range correlations in memory accesses at the millisecond granularity or smaller and cannot leverage long range correlations in traces. We explore, for the first time, Bulk Cache Preloading, the process of pro-actively predicting data to load into cache, minutes or hours before the actual request from the application, by leveraging longer range correlations at the granularity of minutes or hours. This enables the development of machine learning techniques tailored for caching due to relaxed timing constraints. Our approach involves a data aggregation process, converting I/O traces into a temporal sequence of multivariate counts, that we analyse with the temporal non-parametric clustering models proposed in this thesis. While the focus of our thesis is models for non-parametric clustering of discrete data, particularly multivariate counts, we also hope our work on bulk cache preloading paves the way to more inter-disciplinary research on using data mining techniques in the systems domain. As an additional contribution, this thesis addresses multi-level non-parametric admixture modelling for discrete data in the form of grouped categorical data, such as document collections. Non-parametric clustering for topic modelling in document collections, where a document is associated with an unknown number of semantic themes or topics, is well explored with admixture models such as the Hierarchical Dirichlet Process. However, there exist scenarios where a document requires being associated with themes at multiple levels, where each theme is itself an admixture over themes at the previous level, motivating the need for multilevel admixtures. Consider the example of non-parametric entity-topic modelling, which simultaneously learns entities and topics from document collections. This can be realized by modelling a document as an admixture over entities while entities could themselves be modelled as admixtures over topics. We propose the nested Hierarchical Dirichlet Process to address this gap and apply a two-level version of our model to automatically learn author entities and topics from research corpora.
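
As a generic illustration of Dirichlet-process-style clustering of multivariate counts (this uses a truncated DP mixture of Gaussians on variance-stabilised counts, not the Sparse Multivariate Poisson or copula models proposed in the thesis):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(6)

# Toy multivariate count data: three latent groups with different Poisson rates.
rates = np.array([[1, 2, 8, 1], [6, 1, 1, 3], [2, 7, 2, 9]], dtype=float)
z_true = rng.integers(0, 3, 600)
counts = rng.poisson(rates[z_true])

# Variance-stabilise the counts (Anscombe transform) so a Gaussian mixture is a
# reasonable working model, then fit a truncated Dirichlet-process mixture, which
# infers the number of occupied clusters from the data.
x = 2.0 * np.sqrt(counts + 3.0 / 8.0)
dpgmm = BayesianGaussianMixture(
    n_components=15,                               # truncation level, not the true K
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.5,
    covariance_type="diag",
    max_iter=500,
    random_state=0,
).fit(x)

labels = dpgmm.predict(x)
occupied = np.unique(labels)
print("clusters actually used:", occupied.size)
print("cluster sizes:", np.bincount(labels, minlength=15)[occupied])
```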
230

Méthode non-paramétrique des noyaux associés mixtes et applications / Non parametric method of mixed associated kernels and applications

Libengue Dobele-kpoka, Francial Giscard Baudin, 13 June 2013
Nous présentons dans cette thèse l'approche non-paramétrique par noyaux associés mixtes, pour les densités à supports partiellement continus et discrets. Nous commençons par rappeler d'abord les notions essentielles d'estimation par noyaux continus (classiques) et noyaux associés discrets. Nous donnons la définition et les caractéristiques des estimateurs à noyaux continus (classiques) puis discrets. Nous rappelons aussi les différentes techniques de choix de paramètres de lissage et nous revisitons les problèmes de supports ainsi qu'une résolution des effets de bord dans le cas discret. Ensuite, nous détaillons la nouvelle méthode d'estimation de densités par les noyaux associés continus, lesquels englobent les noyaux continus (classiques). Nous définissons les noyaux associés continus et nous proposons la méthode mode-dispersion pour leur construction puis nous illustrons ceci sur les noyaux associés non-classiques de la littérature, à savoir bêta et sa version étendue, gamma et son inverse, gaussien inverse et sa réciproque, le noyau de Pareto ainsi que le noyau lognormal. Nous examinons par la suite les propriétés des estimateurs qui en sont issus, plus précisément le biais, la variance et les erreurs quadratiques moyennes ponctuelles et intégrées. Puis, nous proposons un algorithme de réduction de biais que nous illustrons sur ces mêmes noyaux associés non-classiques. Des études par simulations sont faites sur trois types d'estimateurs à noyaux lognormaux. Par ailleurs, nous étudions les comportements asymptotiques des estimateurs de densité à noyaux associés continus. Nous montrons d'abord les consistances faibles et fortes ainsi que la normalité asymptotique ponctuelle. Ensuite nous présentons les résultats des consistances faibles et fortes globales en utilisant les normes uniformes et L1. Nous illustrons ceci sur trois types d'estimateurs à noyaux lognormaux. Par la suite, nous étudions les propriétés minimax des estimateurs à noyaux associés continus. Nous décrivons d'abord le modèle puis nous donnons les hypothèses techniques avec lesquelles nous travaillons. Nous présentons ensuite nos résultats minimax tout en les appliquant sur les noyaux associés non-classiques bêta, gamma et lognormal. Enfin, nous combinons les noyaux associés continus et discrets pour définir les noyaux associés mixtes. De là, les outils d'unification d'analyses discrètes et continues sont utilisés, pour montrer les différentes propriétés des estimateurs à noyaux associés mixtes. Une application sur un modèle de mélange des lois normales et de Poisson tronquées est aussi donnée. Tout au long de ce travail, nous choisissons le paramètre de lissage uniquement avec la méthode de validation croisée par les moindres carrés. / We present in this thesis the non-parametric approach using mixed associated kernels for densities with supports being partially continuous and discrete. We first start by recalling the essential concepts of classical continuous and discrete kernel density estimators. We give the definition and characteristics of these estimators. We also recall the various techniques for the choice of smoothing parameters and we revisit the problems of supports as well as a resolution of the edge effects in the discrete case. Then, we describe a new method of continuous associated kernels for estimating density with bounded support, which includes the classical continuous kernel method. We define the continuous associated kernels and we propose the mode-dispersion method for their construction. Moreover, we illustrate this on the non-classical associated kernels of the literature, namely beta and its extended version, gamma and its inverse, inverse Gaussian and its reciprocal, the Pareto kernel and the lognormal kernel. We subsequently examine the properties of the estimators which are derived, specifically the bias, variance and the pointwise and integrated mean squared errors. Then, we propose an algorithm for reducing bias that we illustrate on these non-classical associated kernels. Some simulation studies are performed on three types of lognormal kernel estimators. Also, we study the asymptotic behavior of the continuous associated kernel estimators for density. We first show the pointwise weak and strong consistencies as well as the asymptotic normality. Then, we present the results of the global weak and strong consistencies using uniform and L1 norms. We illustrate this on three types of lognormal kernel estimators. Subsequently, we study the minimax properties of the continuous associated kernel estimators. We first describe the model and we give the technical assumptions with which we work. Then we present our results and apply them to some non-classical associated kernels, more precisely the beta, gamma and lognormal kernel estimators. Finally, we combine continuous and discrete associated kernels for defining the mixed associated kernels. Using the tools of the unification of discrete and continuous analysis, we show the different properties of the mixed associated kernel estimators. All through this work, we choose the smoothing parameter using the least squares cross-validation method.
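
A compact sketch of one of the non-classical associated kernels mentioned above, the gamma kernel on the nonnegative half-line, with the smoothing parameter chosen by least-squares cross-validation; the construction follows the standard gamma-kernel form and is not the thesis's implementation:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(7)
data = rng.gamma(shape=2.0, scale=1.5, size=400)   # nonnegative sample

def gamma_kernel_density(x, data, b):
    """Gamma associated-kernel estimator on [0, inf): for each target x the kernel
    is a Gamma(x/b + 1, scale=b) density evaluated at the observations, so no
    probability mass leaks below zero."""
    x = np.atleast_1d(x)
    pdf = gamma.pdf(data[None, :], a=(x / b + 1.0)[:, None], scale=b)
    return pdf.mean(axis=1)

def lscv_score(b, data, grid):
    """Least-squares cross-validation criterion: int f^2 - (2/n) * sum_i f_{-i}(X_i)."""
    n = len(data)
    dx = grid[1] - grid[0]
    int_f2 = np.sum(gamma_kernel_density(grid, data, b) ** 2) * dx
    pdf = gamma.pdf(data[None, :], a=(data / b + 1.0)[:, None], scale=b)
    loo = (pdf.sum(axis=1) - np.diag(pdf)) / (n - 1)   # leave-one-out values at the data
    return int_f2 - 2.0 * loo.mean()

grid = np.linspace(0.0, data.max() * 1.5, 400)
bandwidths = np.geomspace(0.02, 2.0, 30)
b_star = bandwidths[int(np.argmin([lscv_score(b, data, grid) for b in bandwidths]))]
f_hat = gamma_kernel_density(grid, data, b_star)
print(f"LSCV bandwidth: {b_star:.3f}, estimated density integrates to "
      f"{np.sum(f_hat) * (grid[1] - grid[0]):.3f}")
```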
