11 |
Misturas finitas de normais assimétricas e de t assimétricas aplicadas em análise discriminante / Finite mixtures of skew-normal and skew-t distributions applied to discriminant analysis. Coelho, Carina Figueiredo, 28 June 2013
Submitted by Kamila Costa (kamilavasconceloscosta@gmail.com) on 2015-06-18T20:16:38Z. No. of bitstreams: 1
Dissertação-Carina Figueiredo Coelho.pdf: 3096964 bytes, checksum: 57c06ccd1fdc732a7cf9a50381d3806b (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2015-07-06T15:27:26Z, 2015-07-06T15:29:34Z and 2015-07-06T15:33:36Z (GMT) / Made available in DSpace on 2015-07-06T15:33:36Z (GMT).
Previous issue date: 2013-06-28 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / We investigated the use of finite mixture models with skew-normal independent distributions, particularly the skew normal and the skew t, to model the class-conditional distributions in discriminant analysis. To evaluate these models, we carried out a simulation study and applications to real data sets, analyzing the error rates of the classifiers obtained with these mixture models. Problems were simulated with different structures and degrees of separation for the class distributions, using different training-set sizes. The results suggest that the models evaluated are able to adapt to the different problems studied, from the simplest to the most complex, in terms of modeling the observations for classification purposes. With real data, where the shapes of the class distributions are unknown, the models showed reasonable error rates compared with other classifiers. As a limitation, for the data sets analyzed, we observed that modeling by finite mixtures requires large samples per class when the dimension of the feature vector is relatively high. / We investigated the use of finite mixtures of densities from the skew-normal independent family, in particular the skew normal and the skew t, to model the conditional distributions of the feature vector in Discriminant Analysis (DA). The aim is to obtain models capable of handling data with more complex structures, for example asymmetry and multimodality, which often occur in real DA problems. To evaluate this approach, we developed a simulation study and applications to real data, analyzing the error rate (ER) associated with the classifiers obtained from these mixture models. Problems were simulated with different structures with respect to class separation and distribution and to training-set size. The results suggest that the models evaluated are able to adapt to the different problems studied, from the simplest to the most complex, in terms of modeling the observations for classification purposes. With real data, where the shapes of the class distributions are unknown, the models achieved reasonable ERs compared with other classifiers. As a limitation, for the data sets analyzed, modeling by finite mixtures was found to require large samples per class when the dimension of the feature vector is relatively high.
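To illustrate how a mixture-based classifier of this kind assigns an observation, the following Python sketch applies the Bayes rule with class-conditional finite mixtures of univariate skew-normal densities. It is a minimal sketch under stated assumptions: the data are one-dimensional, the mixture parameters and the class labels "A" and "B" are hypothetical values assumed to have been fitted already (fitting is not shown), and neither the skew-t case nor the multivariate setting of the dissertation is reproduced.

```python
import math


def skew_normal_pdf(x, xi, omega, alpha):
    """Univariate skew-normal density 2/omega * phi(z) * Phi(alpha * z), z = (x - xi) / omega."""
    z = (x - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))  # standard normal cdf at alpha*z
    return 2.0 / omega * phi * big_phi


def mixture_pdf(x, components):
    """Finite mixture density; components is a list of (weight, xi, omega, alpha)."""
    return sum(w * skew_normal_pdf(x, xi, om, al) for w, xi, om, al in components)


def classify(x, classes):
    """Bayes rule: return the label maximizing prior * class-conditional mixture density."""
    return max(classes, key=lambda label: classes[label][0] * mixture_pdf(x, classes[label][1]))


# Hypothetical, already-fitted parameters: class "A" is bimodal and right-skewed,
# class "B" is a single symmetric component (alpha = 0 reduces to a normal density).
classes = {
    "A": (0.5, [(0.5, -3.0, 1.0, 4.0), (0.5, 3.0, 1.0, 4.0)]),
    "B": (0.5, [(1.0, 0.0, 1.0, 0.0)]),
}

for x in (-2.5, 0.0, 3.5):
    print(x, classify(x, classes))
```

With these hypothetical parameters the loop assigns -2.5 and 3.5 to class A and 0.0 to class B: the two-component mixture captures two separated, skewed groups that a single unimodal density per class could not represent.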
12 |
Mixture model analysis with rank-based samples. Hatefi, Armin, January 2013
Simple random sampling (SRS) is the most commonly used sampling design in data collection. In many applications (e.g., in fisheries and medical research), quantification of the variable of interest is either time-consuming or expensive, but ranking a number of sampling units, without actual measurement on them, can be done relatively easily and at low cost. In these situations, one may use rank-based sampling (RBS) designs to obtain more representative samples from the underlying population and improve the efficiency of statistical inference. In this thesis, we study the theory and application of finite mixture models (FMMs) under RBS designs. In Chapter 2, we study the problems of maximum likelihood (ML) estimation and classification in a general class of FMMs under different ranked set sampling (RSS) designs. In Chapter 3, by deriving the Fisher information (FI) content of different RSS data structures, including complete and incomplete RSS data, we show that the FI contained in each variation of the RSS data about different features of FMMs is larger than the FI contained in their SRS counterparts. There are situations where it is difficult to rank all the sampling units in a set with high confidence. Forcing rankers to assign unique ranks to the units (as in RSS) can lead to substantial ranking error and consequently to poor statistical inference. We therefore focus on the partially rank-ordered set (PROS) sampling design, which aims to reduce the ranking error and the burden on rankers by allowing them to declare ties (partially ordered subsets) among the sampling units. In Chapter 4, studying the information and uncertainty structures of PROS data in a general class of distributions, we show the superiority of the PROS design in data analysis over RSS and SRS schemes. In Chapter 5, we also investigate the ML estimation and classification problems of FMMs under the PROS design.
Finally, we apply our results to estimate the age structure of a short-lived fish species based on the length frequency data, using SRS, RSS and PROS designs.
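The gain from ranking before measuring can be illustrated with a small Monte Carlo sketch in Python that compares the variance of the sample mean under SRS and under balanced RSS with the same number of measured units. It assumes perfect ranking and a normal population (both assumptions made here for illustration), covers neither PROS nor the mixture-model estimation studied in the thesis, and all function names are hypothetical.

```python
import random
import statistics


def srs_sample(population, n, rng):
    """Simple random sample of n measured units (drawn with replacement for simplicity)."""
    return [rng.choice(population) for _ in range(n)]


def rss_sample(population, set_size, cycles, rng):
    """Balanced ranked set sample with perfect ranking.

    In each cycle, draw set_size sets of set_size units; from the r-th set only the
    unit ranked r-th is measured. Ranking uses the true values (no ranking error).
    """
    sample = []
    for _ in range(cycles):
        for r in range(set_size):
            ranked_set = sorted(rng.choice(population) for _ in range(set_size))
            sample.append(ranked_set[r])
    return sample


def compare_mean_estimators(set_size=3, cycles=4, reps=2000, seed=1):
    """Monte Carlo variance of the sample mean under SRS and RSS with equal measured sample size."""
    rng = random.Random(seed)
    population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
    n = set_size * cycles  # both designs measure the same number of units
    srs_means = [statistics.fmean(srs_sample(population, n, rng)) for _ in range(reps)]
    rss_means = [statistics.fmean(rss_sample(population, set_size, cycles, rng)) for _ in range(reps)]
    return statistics.variance(srs_means), statistics.variance(rss_means)


var_srs, var_rss = compare_mean_estimators()
print(f"SRS: {var_srs:.2f}  RSS: {var_rss:.2f}")
```

With set size 3 the RSS variance should come out noticeably smaller than the SRS variance; the exact figures depend on the seed. Ranking on a noisy copy of each value instead of the true value would shrink this gain, which is the kind of ranking error the PROS design is meant to reduce.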
13 |
Modelos de mistura de distribuições na segmentação de imagens SAR polarimétricas multi-look / Multi-look polarimetric SAR image segmentation using mixture models. Michelle Matos Horta, 04 June 2009
This thesis focuses on applying mixture models to the segmentation of multi-look polarimetric SAR images. In this context, the SEM algorithm was used together with method-of-moments estimators to compute the parameter estimates of the mixture model of Wishart, Kp or G0p distributions. Each of these distributions has specific parameters that distinguish them when fitting data with varying degrees of homogeneity. The Wishart distribution describes regions with more homogeneous characteristics, such as crops, well. This distribution is widely used in the analysis of multi-look polarimetric SAR data. The Kp and G0p distributions have a roughness parameter that allows them to describe both more heterogeneous regions, such as vegetation and urban areas, and homogeneous regions. Besides mixture models of a single family of distributions, the case of a dictionary containing the three families was also analyzed. The proposed SEM method for the different models is compared with the k-means and EM methods from the literature using real L-band images. The SEM method with the G0p mixture provided the best results when image outliers are disregarded. The G0p distribution was the most flexible in fitting the different target types. The Wishart distribution was robust to different initializations. The k-means method with the Wishart distribution is robust for segmenting images containing outliers, but it is not very flexible with respect to the variability of heterogeneous regions. The dictionary-of-families mixture model improves the SEM log-likelihood but gives results similar to those of the G0p mixture model. For all types of initialization and numbers of clusters, the G0p distribution predominated in the selection process of the dictionary of families. / The main focus of this thesis is the application of mixture models to multi-look polarimetric SAR image segmentation.
Within this context, the SEM algorithm, together with the method of moments, was applied to estimate the parameters of Wishart, Kp and G0p mixture models. Each of these distributions has specific parameters that allow fitting data with different degrees of homogeneity. The Wishart distribution is suitable for modeling homogeneous regions, such as crop fields. This distribution is widely used in multi-look polarimetric SAR data analysis. The Kp and G0p distributions have a roughness parameter that allows them to describe both heterogeneous regions, such as vegetation and urban areas, and homogeneous regions. Besides adopting mixture models of a single family of distributions, the use of a dictionary containing all three families of distributions was proposed and analyzed. The performance of the proposed SEM method, considering the different models, is also compared on real L-band images with two well-known techniques from the literature (the k-means and EM algorithms). The proposed SEM method, using a G0p mixture model combined with an outlier removal stage, provided the best classification results. The G0p distribution was the most flexible for fitting the different kinds of data. The Wishart distribution was robust to different initializations. The k-means algorithm with the Wishart distribution is robust for segmenting SAR images containing outliers, but it is less flexible with respect to the variability of heterogeneous regions. The mixture model based on the dictionary of distributions improves the SEM log-likelihood, but gives results similar to those of the G0p mixture model. For all types of initialization and numbers of clusters, the G0p distribution prevailed in the distribution selection process of the dictionary.
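The SEM estimation scheme described above (an E-step, a stochastic labelling step, and a method-of-moments M-step) can be sketched in Python on a simplified single-channel analogue: a mixture of gamma distributions for multi-look intensities, standing in for the Wishart, Kp and G0p mixtures on polarimetric covariance matrices used in the thesis. The gamma substitute, the initial values and the simulated data are assumptions for illustration only.

```python
import math
import random
import statistics


def gamma_logpdf(x, shape, scale):
    """Log density of a gamma distribution with the given shape and scale."""
    return (shape - 1.0) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale)


def sem_gamma_mixture(data, init_params, iterations=50, seed=0):
    """Stochastic EM (SEM) for a K-component gamma mixture.

    init_params: list of (weight, shape, scale) tuples.
    E-step: posterior membership probabilities for each observation.
    S-step: draw one label per observation from those probabilities.
    M-step: weights from label frequencies; shape and scale by the method of
            moments within each drawn cluster (shape = mean^2/var, scale = var/mean).
    """
    rng = random.Random(seed)
    params = list(init_params)
    labels = range(len(params))
    for _ in range(iterations):
        clusters = [[] for _ in params]
        for x in data:
            logs = [math.log(w) + gamma_logpdf(x, k, s) for w, k, s in params]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]  # unnormalized; choices() normalizes
            clusters[rng.choices(labels, weights=probs)[0]].append(x)
        new_params = []
        for old, members in zip(params, clusters):
            if len(members) < 2 or statistics.variance(members) == 0:
                new_params.append(old)  # keep the previous estimate for a degenerate cluster
                continue
            mean, var = statistics.fmean(members), statistics.variance(members)
            new_params.append((len(members) / len(data), mean * mean / var, var / mean))
        total = sum(w for w, _, _ in new_params)
        params = [(w / total, k, s) for w, k, s in new_params]
    return params


# Hypothetical single-channel intensities: two region types, 4 looks each (means 1 and 10).
rng = random.Random(42)
data = [rng.gammavariate(4.0, 0.25) for _ in range(500)] + [rng.gammavariate(4.0, 2.5) for _ in range(500)]
fitted = sem_gamma_mixture(data, [(0.5, 2.0, 1.0), (0.5, 2.0, 5.0)])
for w, k, s in fitted:
    print(f"weight={w:.2f} shape={k:.2f} mean={k * s:.2f}")
```

Because the S-step draws labels at random, the estimates fluctuate slightly from one iteration to the next rather than increasing the likelihood monotonically as in EM; one may average the last iterations or finish with a few deterministic EM steps.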
14 |
Probabilistic models in noisy environments: and their application to a visual prosthesis for the blind. Archambeau, Cédric, 26 September 2005
In recent years, probabilistic models have become fundamental techniques in machine learning. They are successfully applied to various engineering problems, such as robotics, biometrics, brain-computer interfaces and artificial vision, and will gain in importance in the near future. This work deals with the difficult but common situation where the data are either very noisy or scarce compared to the complexity of the process to be modeled. We focus on latent variable models, which can be formalized as probabilistic graphical models and learned by the expectation-maximization algorithm or its variants (e.g., variational Bayes).
After carefully studying a non-exhaustive list of multivariate kernel density estimators, we established that in most applications locally adaptive estimators should be preferred. Unfortunately, these methods are usually sensitive to outliers and often have too many parameters to set. Therefore, we focus on finite mixture models, which, provided some structural modifications are made, do not suffer from these drawbacks.
Two questions are central in this dissertation: (i) how to make mixture models robust to noise, i.e., deal efficiently with outliers, and (ii) how to exploit side-channel information, i.e., additional information intrinsic to the data. To tackle the first question, we extend the training algorithms of the popular Gaussian mixture models to Student-t mixture models. The Student-t distribution can be viewed as a heavy-tailed alternative to the Gaussian distribution, its robustness being tuned by an extra parameter, the degrees of freedom. Furthermore, we introduce a new variational Bayesian algorithm for learning Bayesian Student-t mixture models. This algorithm leads to very robust density estimation and clustering. To address the second question, we introduce manifold-constrained mixture models. This new technique exploits the information that the data lie on a manifold of lower dimension than the feature space. Taking the implicit geometrical arrangement of the data into account results in better generalization to unseen data.
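The mechanism that makes Student-t mixtures robust appears directly in their EM updates: each point receives a latent weight u = (ν + 1)/(ν + d²), where d is its standardized distance to a component, so gross outliers are strongly down-weighted. The Python sketch below is a minimal univariate EM with ν held fixed (an assumption; ν can also be estimated, and the variational Bayesian algorithm of the dissertation is not shown). Data and starting values are hypothetical.

```python
import math


def t_pdf(x, mu, sigma, nu):
    """Univariate Student-t density with location mu, scale sigma and nu degrees of freedom."""
    d2 = ((x - mu) / sigma) ** 2
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(d2 / nu))


def em_t_mixture(data, means, sigmas, nu=3.0, iterations=100):
    """EM for a univariate Student-t mixture with the degrees of freedom nu held fixed."""
    k = len(means)
    weights, means, sigmas = [1.0 / k] * k, list(means), list(sigmas)
    for _ in range(iterations):
        # E-step: responsibilities tau[n][j] and latent weights u[n][j]; u shrinks for outlying points.
        tau, u = [], []
        for x in data:
            dens = [w * t_pdf(x, m, s, nu) for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens)
            tau.append([d / total for d in dens])
            u.append([(nu + 1) / (nu + ((x - m) / s) ** 2) for m, s in zip(means, sigmas)])
        # M-step: weighted updates; points with a small tau * u barely move the estimates.
        for j in range(k):
            n_j = sum(row[j] for row in tau)
            tu = [t_row[j] * u_row[j] for t_row, u_row in zip(tau, u)]
            weights[j] = n_j / len(data)
            means[j] = sum(w * x for w, x in zip(tu, data)) / sum(tu)
            sigmas[j] = math.sqrt(sum(w * (x - means[j]) ** 2 for w, x in zip(tu, data)) / n_j)
    return weights, means, sigmas


# Hypothetical data: two clusters centred at 0 and 10, plus two gross outliers.
data = [i * 0.1 - 1.0 for i in range(21)] + [9.0 + i * 0.1 for i in range(21)] + [60.0, 80.0]
weights, means, sigmas = em_t_mixture(data, means=[1.0, 8.0], sigmas=[2.0, 2.0])
print([round(m, 2) for m in means])
```

The second fitted mean stays close to 10, whereas the plain average of the second cluster together with the two outliers is about 15.2.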
Finally, we show that the latent variable framework used for learning mixture models can be extended to construct probabilistic regularization networks, such as relevance vector machines. We then use these methods in the context of an optic nerve visual prosthesis intended to restore partial vision to blind people whose optic nerve is still functional. Although visual sensations can be induced electrically in the visual field of blind people, the coding scheme of visual information along the visual pathways is poorly known. Therefore, we use probabilistic models to link the stimulation parameters to the features of the visual perceptions. Both black-box and grey-box models are considered. The grey-box models take advantage of known neurophysiological information and are more instructive to medical doctors and psychologists.
15 |
Classification de données multivariées multitypes basée sur des modèles de mélange : application à l'étude d'assemblages d'espèces en écologie / Model-based clustering for multivariate and mixed-mode data: application to multi-species spatial ecological data. Georgescu, Vera, 17 December 2010
In population ecology, species spatial distributions are studied in order to infer the existence of underlying processes, such as intra- and interspecific interactions and species responses to environmental heterogeneity. We propose to analyze multi-species spatial data from the perspective of species assemblages, which we consider in terms of absolute abundances rather than species diversity. Species assemblages are one of the signatures of the local spatial interactions of species with one another and with their environment. Studying species assemblages can make it possible to detect several types of spatial equilibria and to relate them to the effect of environmental variables. Species assemblages are defined here by a non-spatial classification of multivariate observations of species abundances. Classification methods based on mixture models were chosen in order to obtain a measure of the classification uncertainty and to model an assemblage by a multivariate probability distribution. Within this framework, we propose: 1. a method for the exploratory analysis of multivariate spatial data on species abundances, which detects species assemblages by classification, maps them and analyzes their spatial structure; usual distributions, such as the multivariate Gaussian, are used to model the assemblages; 2. a hierarchical model for abundance assemblages when usual distributions are not sufficient; this model can easily be adapted to data containing variables of different types, which are frequently encountered in ecology; 3. a classification method for data containing variables of different types, based on mixtures of distributions with a hierarchical structure (defined in 2.).
Two ecological applications guided and illustrated this work: the small-scale study of the assemblages of two aphid species on clementine tree leaves, and the large-scale study of the assemblages of a host plant, ribwort plantain, and its pathogen, powdery mildew, on the Aland Islands in Finland. / In population ecology, species spatial patterns are studied in order to infer the existence of underlying processes, such as interactions within and between species, and species responses to environmental heterogeneity. We propose to analyze spatial multi-species data by defining species abundance assemblages. Species assemblages are one of the signatures of the local spatial interactions between species and with their environment. Species assemblages are defined here by a non-spatial classification of the multivariate observations of species abundances. Model-based clustering procedures using mixture models were chosen in order to obtain an estimate of the classification uncertainty and to model an assemblage by a multivariate probability distribution. We propose: 1. An exploratory tool for the study of spatial multivariate observations of species abundances, which defines species assemblages by a model-based clustering procedure, and then maps and analyzes the spatial structure of the assemblages. Common distributions, such as the multivariate Gaussian, are used to model the assemblages. 2. A hierarchical model for abundance assemblages that cannot be modeled with common distributions. This model can easily be adapted to mixed-mode data, which are frequent in ecology. 3. A clustering procedure for mixed-mode data based on mixtures of hierarchical models.
Two ecological case studies guided and illustrated this work: the small-scale study of the assemblages of two aphid species on leaves of Citrus trees, and the large-scale study of the assemblages of a host plant, Plantago lanceolata, and its pathogen, powdery mildew, on the Aland Islands in south-west Finland.
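Model-based clustering yields, for each site, posterior membership probabilities from which both an assemblage label and a classification uncertainty can be read. The Python sketch below assumes that species counts at a site are independent Poisson variables given the assemblage (an assumption made for brevity; the work itself uses multivariate Gaussian mixtures and its own hierarchical models, which are not reproduced here), and the abundance data are hypothetical.

```python
import math


def poisson_logpmf(y, lam):
    """Log probability of count y under a Poisson distribution with rate lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def posteriors(counts, weights, rates):
    """Posterior probability that each site belongs to each assemblage."""
    post = []
    for y in counts:
        logs = [math.log(w) + sum(poisson_logpmf(c, l) for c, l in zip(y, lam))
                for w, lam in zip(weights, rates)]
        top = max(logs)
        e = [math.exp(v - top) for v in logs]
        total = sum(e)
        post.append([v / total for v in e])
    return post


def em_poisson_mixture(counts, init_rates, iterations=200):
    """EM for a mixture whose components are products of independent Poissons (one rate per species)."""
    k, n_species = len(init_rates), len(counts[0])
    weights, rates = [1.0 / k] * k, [list(r) for r in init_rates]
    for _ in range(iterations):
        post = posteriors(counts, weights, rates)  # E-step
        for j in range(k):  # M-step: weighted proportions and weighted mean counts
            n_j = max(sum(p[j] for p in post), 1e-12)
            weights[j] = n_j / len(counts)
            rates[j] = [max(sum(p[j] * y[sp] for p, y in zip(post, counts)) / n_j, 1e-9)
                        for sp in range(n_species)]
    return weights, rates, posteriors(counts, weights, rates)


def map_and_uncertainty(post):
    """Hard (MAP) assignment and classification uncertainty 1 - max posterior probability."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in post]
    return labels, [1.0 - max(p) for p in post]


# Hypothetical abundances of two species at eleven sites; the last site is ambiguous.
counts = [[12, 1], [10, 0], [11, 2], [9, 1], [13, 0],
          [1, 9], [0, 11], [2, 10], [1, 12], [0, 8], [5, 5]]
weights, rates, post = em_poisson_mixture(counts, init_rates=[[8.0, 2.0], [2.0, 8.0]])
labels, uncertainty = map_and_uncertainty(post)
print(labels)
print([round(u, 3) for u in uncertainty])
```

The ambiguous site [5, 5] receives a clearly higher uncertainty than the other sites, which is the kind of information that can be mapped alongside the assemblage labels.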
16 |
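Relation entre les trajectoires d’usage régulier de cannabis et celles d’activité physique modérée à vigoureuse chez les jeunes adultes / Relationship between trajectories of regular cannabis use and trajectories of moderate-to-vigorous physical activity among young adults. Kabanemi, Tshala Tina, 01 1900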
Context: Studies of the association between cannabis use and physical activity are mostly cross-sectional and report contradictory results. This thesis examines the relationship between cannabis-use trajectories and physical-activity trajectories among young adults in order to inform research and public health. Objectives: To describe 1) sex-specific trajectories of moderate-to-vigorous physical activity (MVPA) and regular (1-7 days/week) cannabis use (RCU) in adults aged 20 to 35 and 2) the relationship between the trajectories of the two behaviors. Methods: Data on the 742 participants come from the last five cycles of the Nicotine Dependence in Teens longitudinal study. The mean age of participants at each follow-up was 20.3, 24.0, 30.5, 33.6 and 35.2 years. Nagin's group-based method was used to identify distinct MVPA and RCU trajectories. Conditional probabilities linking the trajectories of the two behaviors were estimated to describe their relationship. Results: The four MVPA trajectories and the two RCU trajectories identified are similar in both sexes. The conditional probabilities suggest that the increasing MVPA trajectory is associated with the persistent RCU trajectory among men and, to a lesser extent, among women. Conclusion: Distinct MVPA and RCU trajectories exist among young adults. Individuals, and men in particular, whose MVPA levels increase from age 20 to 35 have a higher probability of using cannabis regularly over the same period. / Background: Most studies investigating the association between cannabis use and physical activity are cross-sectional, and they report contradictory results.
We investigated how cannabis use and physical activity co-occur over time among young adults to better understand their relationship and inform research and public health. Objectives: Describe 1) sex-specific trajectories of moderate-to-vigorous physical activity (MVPA) and regular (1-7 days/week) cannabis use (RCU) from age 20 to 35 and 2) associations between the trajectories of these two behaviors. Methods: A total of 742 participants from the five most recent cycles of the Nicotine Dependence in Teens longitudinal study provided MVPA and RCU data. Mean age at each cycle was 20.3, 24.0, 30.5, 33.6 and 35.2 years. Group-based trajectory modeling was used to identify distinct trajectories of MVPA and RCU. Conditional probabilities linking trajectories across behaviors were estimated to describe associations between MVPA and RCU trajectories. Results: The four MVPA trajectories and the two RCU trajectories identified were similar across sexes. Conditional probabilities suggested an association between the increasing MVPA trajectory and the trajectory of persistent RCU, more so among men than among women. Conclusion: Distinctive trajectories of MVPA and RCU exist in young adulthood. Individuals, and particularly men, with increasing MVPA levels from age 20 to 35 have an increased probability of RCU over the same age range.
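The conditional probabilities linking trajectory groups can be illustrated with a simplified post hoc computation in Python: given each individual's posterior probabilities of belonging to each MVPA and each RCU trajectory group, it estimates P(RCU group | MVPA group). This is an assumption-laden sketch, not the estimation procedure of the thesis: it treats the two group assignments as independent given each individual's data, whereas a joint (dual) trajectory model would estimate the linkage directly, and the data below are hypothetical.

```python
def conditional_trajectory_probs(post_a, post_b):
    """Approximate P(group j of behavior B | group i of behavior A).

    post_a[n][i]: posterior probability that individual n follows trajectory i of behavior A.
    post_b[n][j]: the same for behavior B, estimated by a separate trajectory model.
    Joint membership is approximated by post_a[n][i] * post_b[n][j], summed over individuals;
    each row is then normalized into a conditional distribution.
    """
    n_a, n_b = len(post_a[0]), len(post_b[0])
    joint = [[0.0] * n_b for _ in range(n_a)]
    for pa, pb in zip(post_a, post_b):
        for i in range(n_a):
            for j in range(n_b):
                joint[i][j] += pa[i] * pb[j]
    result = []
    for row in joint:
        total = sum(row)
        result.append([v / total if total > 0 else 0.0 for v in row])
    return result


def one_hot(index, size):
    """Posterior vector placing all mass on one group (a modal assignment)."""
    return [1.0 if k == index else 0.0 for k in range(size)]


# Hypothetical modal assignments for ten individuals: four MVPA groups (A) and two RCU groups (B).
mvpa = [0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
rcu = [0, 0, 0, 1, 0, 0, 1, 1, 1, 0]
table = conditional_trajectory_probs([one_hot(g, 4) for g in mvpa], [one_hot(g, 2) for g in rcu])
for i, row in enumerate(table):
    print(f"P(RCU group | MVPA group {i}) = {row}")
```

With one-hot (modal) assignments the computation reduces to row-normalizing a cross-tabulation of the groups; here 75% of the hypothetical individuals in MVPA group 3 fall in RCU group 1.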