141

Misturas finitas de normais assimétricas e de t assimétricas aplicadas em análise discriminante / Finite mixtures of skew-normal and skew-t distributions applied to discriminant analysis

Coelho, Carina Figueiredo 28 June 2013 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / We investigate the use of finite mixtures of densities from the skew-normal independent family, in particular the skew-normal and the skew-t, to model the class-conditional distributions of the feature vector in discriminant analysis (DA). The goal is to obtain models capable of handling data with more complex structures, for example skewness and multimodality, which often occur in real DA problems. To evaluate this approach, we carried out a simulation study and applications to real data sets, analysing the error rates of the classifiers obtained with these mixture models. Problems were simulated with different class separations and class distributions and with different training-set sizes. The results of the study suggest that the models evaluated are able to adapt to the different problems studied, from the simplest to the most complex, in terms of modelling the observations for classification purposes. With real data, where the shapes of the class distributions are unknown, the models showed reasonable error rates when compared with other classifiers. As a limitation, for the data sets analysed it was observed that modelling by finite mixtures requires large samples per class when the dimension of the feature vector is relatively high.
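A minimal sketch of the classification rule this abstract describes, fitting one finite mixture per class and assigning a new observation to the class with the largest prior-weighted mixture density. scikit-learn's GaussianMixture is used here as a stand-in for the skew-normal and skew-t mixtures, which scikit-learn does not provide; the class labels, component counts and synthetic data are illustrative assumptions, not the thesis's own settings.

    # Mixture discriminant analysis sketch: fit one finite mixture per class,
    # then assign a new observation to the class with the largest
    # prior-weighted mixture density.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_class_mixtures(X, y, n_components=2, seed=0):
        """Fit one mixture model per class; returns dict label -> (prior, model)."""
        models = {}
        for label in np.unique(y):
            Xc = X[y == label]
            gm = GaussianMixture(n_components=n_components, random_state=seed).fit(Xc)
            models[label] = (len(Xc) / len(X), gm)
        return models

    def classify(models, X_new):
        """Pick argmax over classes of log prior + mixture log-density."""
        labels = list(models)
        scores = np.column_stack([
            np.log(prior) + gm.score_samples(X_new) for prior, gm in models.values()
        ])
        return np.array(labels)[scores.argmax(axis=1)]

    # Illustrative use with synthetic, skewed-looking data (assumption):
    rng = np.random.default_rng(0)
    X0 = rng.gamma(2.0, 1.0, size=(200, 2))            # class 0, right-skewed
    X1 = 4.0 - rng.gamma(2.0, 1.0, size=(200, 2))      # class 1, left-skewed
    X = np.vstack([X0, X1]); y = np.repeat([0, 1], 200)
    models = fit_class_mixtures(X, y)
    print(classify(models, X[:5]))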
142

Modelagem de dados contínuos censurados, inflacionados de zeros / Modeling censored, zero-inflated continuous data

Janeiro, Vanderly 16 July 2010 (has links)
Much of the equipment used to quantify substances, such as toxins in foods, has difficulty quantifying low amounts. When the substance is present but in an amount below a small predetermined value ξ, the equipment usually indicates that the substance is absent, producing zero values that are not necessarily true zeros. When the amount lies between ξ and a known threshold value τ, the equipment detects the presence of the substance but is unable to quantify it. Amounts above the threshold τ are measured continuously, giving rise to a continuous random variable X whose domain can be written as the union of the intervals [0, ξ), [ξ, τ] and (τ, ∞), commonly with an excess of zero values. In this work we propose models that can discriminate the probability of true zeros, such as a two-component mixture model with one component degenerate at zero and the other following a continuous distribution, here taken to be the exponential, Weibull or gamma distribution. For each model, its characteristics were examined, procedures for estimating its parameters were proposed, and its fitting performance was assessed by simulation. Finally, the methodology was illustrated by modelling measurements of aflatoxin B1 contamination in maize grains from three sub-samples of a maize lot, analysed at the Mycotoxin Laboratory of the Department of Agroindustry, Food and Nutrition at ESALQ/USP. In most cases the simulations indicated that the proposed methods estimate the model parameters efficiently, in particular the parameter δ and the expected value E(Y). The modelling of the aflatoxin measurements, in turn, showed that the proposed models are adequate for the real data, with the Weibull mixture model giving the best fit.
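A schematic form of the likelihood such a model leads to, written under the assumption that observed zeros pool true zeros with continuous values censored below ξ, that values reported in [ξ, τ] are interval-censored, and that f and F denote the density and distribution function of the continuous component (exponential, Weibull or gamma). The use of δ for the mixing weight follows the abstract; the rest of the notation is ours and is only a sketch, not the thesis's exact formulation.

$$
L(\delta,\theta) \;=\; \prod_{i:\,y_i=0}\Bigl[\delta + (1-\delta)\,F(\xi;\theta)\Bigr]\;
\prod_{i:\,y_i\in[\xi,\tau]}(1-\delta)\bigl[F(\tau;\theta)-F(\xi;\theta)\bigr]\;
\prod_{i:\,y_i>\tau}(1-\delta)\,f(y_i;\theta).
$$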
143

Analyse et modélisation de la Dominance Temporelle des Sensations à l'aide de processus stochastiques / Analysis and modeling of Temporal Dominance of Sensations with stochastic processes

Lecuelle, Guillaume 01 October 2019 (has links)
Temporal Dominance of Sensations (TDS) is a sensory evaluation method that measures the temporal perception of a food product during tasting. For a panelist, TDS consists in choosing, from a list of attributes, which one is dominant at each moment. This work models TDS data with stochastic processes and proposes the use of semi-Markov processes (SMP), a generalization of Markov chains in which dominance durations can be modeled by any type of distribution. The fitted model can then be used to compare TDS samples through a likelihood-ratio test. Because the probabilities of transition from one attribute to another can also depend on time, we propose modeling TDS by period, together with an algorithm for optimally selecting the number of periods and the boundaries between them. The fitted process can be displayed as a graph showing the most frequently observed transitions between attributes. Finally, this work introduces finite mixtures of semi-Markov processes in order to segment the panel according to individual differences in temporal perception. The methods are applied to several TDS data sets: chocolates, fresh cheeses and Gouda cheeses. The results show that SMP modeling provides new information about temporal perception compared with classical methods, in particular about the variability of perception within a panel, whereas classical methods only provide a mean panel view. Furthermore, to our knowledge, this work is the first to consider the identification of a mixture of semi-Markov processes.
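A minimal sketch of how the likelihood of one TDS sequence can be evaluated under a semi-Markov model: a sum of log transition probabilities and log sojourn-time densities. Gamma-distributed dominance durations are an illustrative choice (the abstract does not fix the sojourn distribution), and the attribute names, transition matrix and parameter values are assumptions.

    # Semi-Markov log-likelihood of one dominance sequence.
    import numpy as np
    from scipy import stats

    attributes = ["sweet", "bitter", "astringent"]           # assumed attribute list
    P = np.array([[0.0, 0.7, 0.3],                           # P[i, j]: prob. of moving i -> j
                  [0.4, 0.0, 0.6],
                  [0.5, 0.5, 0.0]])
    sojourn = {a: stats.gamma(a=2.0, scale=1.5) for a in attributes}  # assumed durations (s)

    def loglik(sequence):
        """sequence: list of (attribute, dominance_duration) pairs."""
        ll = 0.0
        for k, (att, dur) in enumerate(sequence):
            i = attributes.index(att)
            ll += sojourn[att].logpdf(dur)                   # how long the attribute dominates
            if k + 1 < len(sequence):                        # which attribute takes over next
                j = attributes.index(sequence[k + 1][0])
                ll += np.log(P[i, j])
        return ll

    print(loglik([("sweet", 3.2), ("bitter", 5.0), ("astringent", 2.1)]))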
144

Genomic sequence processing: gene finding in eukaryotes

Akhtar, Mahmood, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2008 (has links)
Of the many existing eukaryotic gene finding software programs, none are able to guarantee accurate identification of genomic protein coding regions and other biological signals central to the pathway from DNA to protein. Eukaryotic gene finding is difficult mainly due to the non-contiguous and non-continuous nature of genes. Existing approaches are heavily dependent on the compositional statistics of the sequences they learn from and are not equally suitable for all types of sequences. This thesis first develops efficient digital signal processing-based methods for the identification of genomic protein coding regions, and then combines the best signal processing-based non-data-driven technique with an existing data-driven statistical method in a novel system demonstrating improved identification of acceptor splice sites. Most existing well-known DNA symbolic-to-numeric representations map the DNA information into three or four numerical sequences, potentially increasing the computational requirement of the sequence analyzer. The proposed mapping schemes, to be used for signal processing-based gene and exon prediction, incorporate DNA structural properties in the representation, in addition to reducing complexity in subsequent processing. A detailed comparison of all DNA representations, in terms of computational complexity and relative accuracy for the gene and exon prediction problem, reveals the newly proposed "paired numeric" to be the best DNA representation. Existing signal processing-based techniques rely mostly on the period-3 behaviour of exons to obtain one-dimensional gene and exon prediction features, and are not well equipped to capture the complementary properties of exonic/intronic regions or to deal with the background noise in detecting exons at the nucleotide level. These issues are addressed in this thesis by proposing six one-dimensional and three multi-dimensional signal processing-based gene and exon prediction features. All one-dimensional and multi-dimensional features have been evaluated using standard datasets such as Burset/Guigo1996, HMR195, and the GENSCAN test set. This is the first time that different gene and exon prediction features have been compared using substantial databases and nucleotide-level metrics. Furthermore, the first investigation of the suitability of different window sizes for period-3 exon detection is performed. Finally, the best signal processing-based gene and exon prediction scheme from our evaluations is combined with a data-driven statistical technique for the recognition of acceptor splice sites. The proposed DSP-statistical hybrid is shown to achieve a 43% reduction in false positives over WWAM, as used in GENSCAN.
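A minimal sketch of the period-3 idea the abstract relies on: map the DNA string to numeric tracks, slide a window along the sequence, and score each position by the DFT power at the period-3 frequency, which tends to peak inside coding regions. The simple binary-indicator (Voss-style) mapping below is a stand-in for the thesis's "paired numeric" representation, whose definition is not given in the abstract; the window size and test sequence are assumptions.

    # Period-3 exon-detection sketch using a sliding-window DFT.
    import numpy as np

    def binary_indicators(seq):
        """Four binary indicator sequences, one per nucleotide (Voss-style mapping)."""
        return {b: np.array([1.0 if c == b else 0.0 for c in seq]) for b in "ACGT"}

    def period3_score(seq, window=351):
        tracks = binary_indicators(seq)
        n = len(seq)
        scores = np.zeros(max(n - window + 1, 0))
        k = window // 3                                    # DFT bin corresponding to period 3
        for start in range(len(scores)):
            s = 0.0
            for x in tracks.values():
                X = np.fft.rfft(x[start:start + window])
                s += abs(X[k]) ** 2
            scores[start] = s
        return scores

    # Illustrative call on a made-up sequence (assumption):
    rng = np.random.default_rng(1)
    seq = "".join(rng.choice(list("ACGT"), size=1200))
    print(period3_score(seq)[:5])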
145

A Note on the Generalization Performance of Kernel Classifiers with Margin

Evgeniou, Theodoros, Pontil, Massimiliano 01 May 2000 (has links)
We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers stem from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e., functions of the slack variables of the SVM) are also derived.
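As a reading aid, a schematic of the quantities involved, assuming the standard soft-margin (hinge) loss; the exact constants, and the definition of the $V_\gamma$ dimension used to obtain them, are in the note itself and are not reproduced here.

$$
V\bigl(y, f(\mathbf{x})\bigr) = \bigl|1 - y\,f(\mathbf{x})\bigr|_{+}, \qquad
\Pr\bigl[y\,f(\mathbf{x}) \le 0\bigr] \;\le\; \frac{1}{\ell}\sum_{i=1}^{\ell}\theta\bigl(\gamma - y_i f(\mathbf{x}_i)\bigr) \;+\; \varepsilon\bigl(V_\gamma,\,\ell,\,\eta\bigr),
$$

where θ is the Heaviside step function (so the sum is the empirical margin error on the ℓ training examples) and ε is a confidence term, increasing in the $V_\gamma$ dimension of the loss family and decreasing in ℓ, that holds with probability at least 1 − η over the draw of the training sample.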
146

Markov Random Field Based Road Network Extraction From High Resolution Satellite Images

Ozturk, Mahir 01 February 2013 (has links) (PDF)
Road networks play an important role in various applications such as urban and rural planning, infrastructure planning, transportation management, and vehicle navigation. Extraction of roads from remotely sensed satellite images for updating the road database in geographic information systems (GIS) is generally done manually by a human operator. However, manual extraction of roads is a time-consuming and labor-intensive process. In the existing literature, a great number of studies have been published with the aim of automating the road extraction process. However, automated processes still yield erroneous and incomplete results, and human intervention is still required. The aim of this research is to propose a framework for road network extraction from high spatial resolution multi-spectral imagery (MSI) to improve the accuracy of road extraction systems. The proposed framework begins with spectral classification using One-Class Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. Spectral classification exploits the spectral signature of road surfaces to classify road pixels. Then, an iterative template matching filter is proposed to refine the spectral classification results. A k-medians clustering algorithm is employed to detect candidate road centerline points. Final road network formation is achieved with Markov Random Fields. The extracted road network is evaluated against a reference dataset using a set of quality metrics.
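A minimal sketch of the spectral-classification front end of the pipeline the abstract describes, using scikit-learn's OneClassSVM and GaussianMixture. The band values, thresholds and model settings are illustrative assumptions, and the later template-matching, k-medians and MRF stages are not shown.

    # Spectral classification of road pixels: a one-class SVM trained on known
    # road spectra, combined with a GMM fitted to the same samples, votes on
    # whether each pixel's spectrum looks like road surface.
    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.mixture import GaussianMixture

    def fit_road_models(road_spectra, n_components=3):
        ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(road_spectra)
        gmm = GaussianMixture(n_components=n_components, random_state=0).fit(road_spectra)
        return ocsvm, gmm

    def classify_pixels(image, ocsvm, gmm, loglik_threshold=-10.0):
        """image: (H, W, bands) array -> boolean road mask of shape (H, W)."""
        h, w, b = image.shape
        pixels = image.reshape(-1, b)
        svm_vote = ocsvm.predict(pixels) == 1                    # inside the learned support
        gmm_vote = gmm.score_samples(pixels) > loglik_threshold  # high spectral likelihood
        return (svm_vote & gmm_vote).reshape(h, w)

    # Illustrative use with synthetic 4-band data (assumption):
    rng = np.random.default_rng(0)
    road_spectra = rng.normal(0.4, 0.05, size=(500, 4))
    image = rng.normal(0.4, 0.2, size=(64, 64, 4))
    mask = classify_pixels(image, *fit_road_models(road_spectra))
    print(mask.mean())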
147

Improved GMM-Based Classification Of Music Instrument Sounds

Krishna, A G 05 1900 (has links)
This thesis concerns the recognition of music instruments from isolated notes. Music instrument recognition is a relatively nascent problem that is fast gaining importance, not only for its academic value but also for its potential in applications such as music content analysis and music transcription. Line spectral frequencies are proposed as features for music instrument recognition and shown to perform better than Mel-filtered cepstral coefficients and linear prediction cepstral coefficients. Assuming a linear model of sound production, features based on the prediction residual, which represents the excitation signal, are also proposed. Four improvements are proposed for classification using Gaussian mixture model (GMM) based classifiers. One of them involves characterizing the regions of overlap between classes in the feature space to improve classification. Applications to music instrument recognition and speaker recognition are shown. An experiment is proposed for discovering the hierarchy of music instruments in a data-driven manner. The hierarchy thus discovered closely corresponds to the hierarchy defined by musicians and experts, and therefore shows that the feature space has successfully captured the features required for music instrument characterization.
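A minimal sketch of the line spectral frequency (LSF) features the abstract proposes: estimate LPC coefficients for a frame by the autocorrelation method, split A(z) into the symmetric and antisymmetric polynomials P(z) and Q(z), and read the LSFs off their unit-circle root angles. The LPC order, frame and signal are assumptions; the thesis's exact front end may differ.

    # Line spectral frequencies from LPC coefficients.
    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc(frame, order=10):
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        return np.concatenate(([1.0], -a))                 # A(z) = 1 - sum a_k z^-k

    def lsf(a):
        p = np.concatenate((a, [0.0])) + np.concatenate(([0.0], a[::-1]))  # P(z)
        q = np.concatenate((a, [0.0])) - np.concatenate(([0.0], a[::-1]))  # Q(z)
        angles = np.angle(np.concatenate((np.roots(p), np.roots(q))))
        return np.sort(angles[(angles > 1e-6) & (angles < np.pi - 1e-6)])

    # Illustrative use on a synthetic note-like signal (assumption):
    rng = np.random.default_rng(0)
    t = np.arange(2048) / 16000.0
    frame = (np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
             + 0.01 * rng.standard_normal(t.size))
    print(lsf(lpc(frame, order=10)))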
148

Mixture model analysis with rank-based samples

Hatefi, Armin January 2013 (has links)
Simple random sampling (SRS) is the most commonly used sampling design in data collection. In many applications (e.g., in fisheries and medical research) quantification of the variable of interest is either time-consuming or expensive, but ranking a number of sampling units, without actually measuring them, can be done relatively easily and at low cost. In these situations, one may use rank-based sampling (RBS) designs to obtain more representative samples from the underlying population and improve the efficiency of the statistical inference. In this thesis, we study the theory and application of finite mixture models (FMMs) under RBS designs. In Chapter 2, we study the problems of maximum likelihood (ML) estimation and classification in a general class of FMMs under different ranked set sampling (RSS) designs. In Chapter 3, deriving the Fisher information (FI) content of different RSS data structures, including complete and incomplete RSS data, we show that the FI contained in each variation of RSS data about different features of FMMs is larger than the FI contained in their SRS counterparts. There are situations where it is difficult to rank all the sampling units in a set with high confidence. Forcing rankers to assign unique ranks to the units (as in RSS) can lead to substantial ranking error and consequently to poor statistical inference. We hence focus on the partially rank-ordered set (PROS) sampling design, which aims to reduce the ranking error and the burden on rankers by allowing them to declare ties (partially ordered subsets) among the sampling units. Studying the information and uncertainty structures of PROS data in a general class of distributions, in Chapter 4 we show the superiority of the PROS design for data analysis over the RSS and SRS schemes. In Chapter 5, we also investigate the ML estimation and classification problems of FMMs under the PROS design. Finally, we apply our results to estimate the age structure of a short-lived fish species from length-frequency data, using the SRS, RSS and PROS designs.
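A minimal sketch of how a balanced ranked set sample can be drawn when perfect ranking is possible; here the ranking is done on the measured values themselves, purely for illustration, and the set size, cycle count and two-component population are assumptions.

    # Ranked set sampling sketch: in each cycle, draw k independent sets of k
    # units, rank each set, and keep only the r-th ranked unit from the r-th set.
    import numpy as np

    def ranked_set_sample(population, set_size=3, n_cycles=10, seed=0):
        rng = np.random.default_rng(seed)
        sample = []
        for _ in range(n_cycles):
            for r in range(set_size):
                units = rng.choice(population, size=set_size, replace=False)
                sample.append(np.sort(units)[r])           # measure only the r-th order statistic
        return np.array(sample)

    # Illustrative use with a synthetic two-component mixture population (assumption):
    rng = np.random.default_rng(1)
    population = np.concatenate([rng.normal(10, 2, 5000), rng.normal(20, 3, 5000)])
    rss = ranked_set_sample(population, set_size=3, n_cycles=30)
    srs = rng.choice(population, size=rss.size, replace=False)
    print(rss.mean(), srs.mean(), population.mean())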
149

Statistical Post-Processing Methods And Their Implementation On The Ensemble Prediction Systems For Forecasting Temperature In The Context Of French Electricity Consumption

Gogonel, Adriana Geanina 27 November 2012 (has links) (PDF)
The objective of this thesis is to study new statistical methods for correcting temperature forecasts that may be implemented on the ensemble prediction system (EPS) of Meteo France, so as to improve its use for electric system management at EDF France. The Meteo France EPS we work with contains 51 members (forecasts per time step) and provides temperature predictions for 14 days ahead. The thesis has three parts: in the first, we present the EPS, implement two statistical methods that improve the accuracy or the spread of the EPS, and introduce criteria for comparing the results. In the second part, we introduce extreme value theory and the mixture models we use to combine the model built in the first part with models fitted to the distribution tails. In the third part, we introduce quantile regression as another way of studying the tails of the distribution.
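A minimal sketch of the kind of post-processing step discussed in the first part: a simple linear correction of the ensemble mean with a matching rescaling of the member spread. This is not the method developed in the thesis, and all data below are synthetic assumptions.

    # Simple EPS post-processing sketch: learn an additive/multiplicative correction
    # of the ensemble mean by least squares, then rescale the member spread so the
    # corrected ensemble variance matches the residual error variance.
    import numpy as np

    def fit_correction(ens, obs):
        """ens: (n_dates, n_members) raw forecasts; obs: (n_dates,) observed temperature."""
        m = ens.mean(axis=1)
        A = np.column_stack([np.ones_like(m), m])
        (a, b), *_ = np.linalg.lstsq(A, obs, rcond=None)    # obs ~ a + b * ensemble mean
        resid_var = np.var(obs - (a + b * m))
        spread_var = np.var(ens - m[:, None])
        return a, b, np.sqrt(resid_var / spread_var)

    def apply_correction(ens, a, b, scale):
        m = ens.mean(axis=1, keepdims=True)
        return a + b * m + scale * (ens - m)                # corrected 51-member ensemble

    # Illustrative use with synthetic data (assumption): 200 dates, 51 members.
    rng = np.random.default_rng(0)
    truth = 12 + 8 * np.sin(np.linspace(0, 4 * np.pi, 200))
    ens = truth[:, None] + 1.5 + rng.normal(0, 0.8, size=(200, 51))   # biased, under-dispersed
    obs = truth + rng.normal(0, 2.0, size=200)
    params = fit_correction(ens, obs)
    print(np.abs(apply_correction(ens, *params).mean(axis=1) - obs).mean())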
150

台灣地區影音著作盜版率之研究 / The study of audio-visual works' piracy rate in Taiwan.

邱奕傑 (Chiu, Yi-Jye) Unknown Date (has links)
With the development of information technology and the spread of the internet, piracy of music and movie discs has become more serious year by year. Audio-visual piracy affects not only rights-holder groups, the audio-visual industry and creators, but also the development of intellectual property in Taiwan, and it has repeatedly become an important issue in international trade negotiations. This research therefore asks: how serious is the piracy problem in fact, are there more objective indicators or ways to evaluate it, and are there effective policies to prevent it from getting worse? Most existing studies of audio-visual piracy concentrate on calculating the piracy rate, on the factors behind piracy, or on its psychological and legal aspects, and have not constructed a probability distribution of the piracy rate suitable for inference or other quantitative analysis. The empirical work here is therefore based on 2004 consumer survey data commissioned by the Intellectual Property Office (Ministry of Economic Affairs, R.O.C.) and collected by National Chengchi University, covering music CDs and video VCD/DVDs together with the variables of interest (gender, age, education level, income, internet downloading, and so on). The main findings are as follows. (1) The piracy-rate distribution is well described by a mixture model that combines, in suitable proportions, two degenerate distributions (at X = 0 and X = 100) with a normal distribution. (2) Regarding differences between distributions and trends, for both music CDs and video VCD/DVDs, the Mann-Whitney test and the two-sample Kolmogorov-Smirnov test both indicate a rising overall piracy rate. The 20-29 age group is the main pirating group; more highly educated groups pirate more, lower-income groups engage in more unauthorized copying, and, with the growth of internet technology, infringement is more common among those who download online than among those who do not. (3) In the survey of consumer opinions about piracy, for both music and movies, deviant attitudes and behaviour are more pronounced among men than women, among those under 30 than those older, among the less educated than the more educated, among lower-income than higher-income respondents, among pirates than non-pirates, and among downloaders than non-downloaders. The problem lies not only in weak awareness of intellectual property rights, but also in the scarcity of moral or legal restraints on unauthorized copying and downloading. Interestingly, although the more highly educated groups expressed more even-handed views on piracy, the data show that they are also among the groups that pirate the most. 
(4) Regarding the price consumers are willing to pay, most pirating consumers prefer to buy audio-visual goods at a low price, most non-pirating consumers at a regular price, and the two groups do not differ significantly at high prices. (5) Confidence bands were constructed for the overall piracy-rate distributions of music CDs and video VCD/DVDs; because of the high frequencies at piracy rates of 0 and 100, the upper and lower bounds meet the data at 0 and 100. Since the bands are built on the population distribution function, they are also suitable for goodness-of-fit testing, and the results pass the one-sample Kolmogorov-Smirnov test. (6) For classification, a logistic regression model of piracy was constructed; classifying the sample with the fitted model mislabels 107 non-pirates as pirates and 186 pirates as non-pirates, for a correct classification rate of about 88%. Finally, with respect to both anti-piracy enforcement and the protection of intellectual property rights, the empirical analysis is offered as a reference for the relevant government agencies in formulating thorough measures against the worsening piracy problem. Key words: Piracy Rate, Mixture Models, Mann-Whitney, Kolmogorov-Smirnov, Logistic Regression Model, Nonparametric Statistics.
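A minimal sketch of the mixture described in finding (1): point masses at piracy rates 0 and 100 mixed with a normal component, fitted here by a plain maximum-likelihood routine. The starting values, bounds and synthetic data are assumptions, not the study's estimates.

    # Piracy-rate mixture sketch: P(X=0)=p0, P(X=100)=p1, and with probability
    # 1-p0-p1 the rate follows Normal(mu, sigma); fit by maximum likelihood.
    import numpy as np
    from scipy import stats, optimize

    def neg_loglik(params, x):
        p0, p1, mu, sigma = params
        pc = 1.0 - p0 - p1
        ll = np.where(x == 0, np.log(p0),
             np.where(x == 100, np.log(p1),
                      np.log(pc) + stats.norm.logpdf(x, mu, sigma)))
        return -ll.sum()

    def fit(x):
        start = [0.2, 0.1, 50.0, 20.0]
        bounds = [(1e-6, 0.49), (1e-6, 0.49), (0.0, 100.0), (1e-3, None)]
        res = optimize.minimize(neg_loglik, start, args=(x,), bounds=bounds)
        return res.x                                       # p0, p1, mu, sigma estimates

    # Illustrative fit on synthetic survey responses (assumption):
    rng = np.random.default_rng(0)
    x = np.concatenate([np.zeros(300), np.full(80, 100.0),
                        np.clip(rng.normal(40, 15, 620), 0.5, 99.5)])
    print(fit(x))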
