Global ETD Search

121	Bayesian Probit Regression Models for Spatially-Dependent Categorical Data Berrett, Candace 02 November 2010 (has links) No description available. Statistics spatial statistics latent variable methods binary data categorical data data augmentation MCMC classification
122	Visualizing Categorical Time Series Data with Applications to Computer and Communications Network Traces Ribler, Randy L. 04 April 1997 (has links) Visualization tools allow scientists to comprehend very large data sets and to discover relationships which are otherwise difficult to detect. Unfortunately, not all types of data can be visualized easily using existing tools. In particular, long sequences of nonnumeric data cannot be visualized adequately. Examples of this type of data include trace files of computer performance information, the nucleotides in a genetic sequence, a record of stocks traded over a period of years, and the sequence of words in this document. The term categorical time series is defined and used to describe this family of data. When visualizations designed for numerical time series are applied to categorical time series, the distortions which result from the arbitrary conversion of unordered categorical values to totally ordered numerical values can be profound. Examples of this phenomenon are presented and explained. Several new, general-purpose techniques for visualizing categorical time series data have been developed as part of this work and have been incorporated into the Chitra performance analysis and visualization system. All of these new visualizations can be produced in O(n) time. The new visualizations for categorical time series provide general-purpose techniques for visualizing aspects of categorical data which are commonly of interest. These include periodicity, stationarity, cross-correlation, autocorrelation, and the detection of recurring patterns. The effective use of these visualizations is demonstrated in a number of application domains, including performance analysis, World Wide Web traffic analysis, network routing simulations, document comparison, pattern detection, and the analysis of the performance of genetic algorithms. / Ph. D. 
visualization categorical data time series data mining performance analysis information visualization
123	Enhancing NFL Game Insights: Leveraging XGBoost For Advanced Football Data Analytics To Quantify Multifaceted Aspects Of Gameplay Schoborg, Christopher P 01 January 2024 (has links) (PDF) XGBoost, renowned for its efficacy in various statistical domains, offers enhanced precision and efficiency. Its versatility extends to both regression and categorization tasks, rendering it a valuable asset in predictive modeling. In this dissertation, I aim to harness the power of XGBoost to forecast and rank performances within the National Football League (NFL). Specifically, my research focuses on four problems: predicting the next play in NFL games based on pre-snap data; optimizing the draft ranking process by integrating data from the NFL combine and collegiate statistics; creating a player rating system that can be compared across all positions; and evaluating strategic decisions for NFL teams when crossing the 50-yard line, including the feasibility of attempting a first down conversion versus opting for a field goal attempt. NFL Analytics XGBoost Prediction Fourth Down Categorical Data Analysis Data Science
124	Étude et décontamination du transcriptome de novo du nématode doré Globodera rostochiensis / Study and decontamination of the de novo transcriptome of the golden nematode Globodera rostochiensis Lafond Lapalme, Joël January 2016 (has links) The golden nematode, Globodera rostochiensis, is a plant-parasitic nematode that can infect crops such as potato, tomato, and eggplant. Because of the considerable yield losses associated with this organism, it is subject to quarantine in several countries, including Canada. The cysts of the golden nematode protect the eggs they contain, allowing them to survive in a dormant state for up to 20 years in the soil. The eggs hatch only in the presence of root exudates from a nearby compatible host plant. Unfortunately, very little is known about the molecular mechanisms underlying this key stage of the golden nematode's life cycle. In this work, we used RNA-seq to sequence all the mRNA from a sample of golden nematode cysts in order to assemble a de novo (reference-free) transcriptome and to identify genes that play a role in the survival and hatching mechanisms. This approach showed that the hatching and parasitism processes are closely linked. Several effectors involved in movement toward the host plant and in root penetration are induced as soon as the cyst is hydrated, even before hatching is triggered. With the help of the golden nematode reference genome, we found that the majority of the transcripts in the transcriptome did not come from the golden nematode. Indeed, cysts sampled in the field can carry contaminants (bacteria, fungi, etc.) on their wall and even inside the cyst. These contaminants are therefore sequenced and assembled together with the de novo transcriptome. Their transcripts increase the size of the transcriptome and introduce errors in post-assembly analyses. 
Current decontamination methods use alignments against databases of known organisms to identify the sequences that come from contaminants. These methods are effective when the contaminant or contaminants are known (that is, have a reference genome), as with human contamination. When the contaminants are unknown, however, these methods are insufficient to produce a high-quality decontaminated transcriptome. We therefore designed a method that uses a hierarchical sequence clustering algorithm. This method recursively produces homogeneous subgroups of sequences according to the frequent patterns present in the sequences. Once the groups are created, each is labeled as contaminant or not according to the alignment results for that subgroup. Ambiguous sequences, which have either no alignment or several different alignments, are thus easily classified according to the label of their group. Our method was effective at decontaminating the golden nematode transcriptome as well as other cases of contamination. The method works for decontaminating a transcriptome, but we also showed that it has the potential to decontaminate short raw sequences. Decontaminating the raw sequences directly would be the optimal decontamination approach, because it would minimize assembly errors. Golden nematode Globodera rostochiensis Hatching Transcriptome De novo assembly Differentially expressed genes Decontamination MCSC
125	The Strucplot Framework: Visualizing Multi-way Contingency Tables with vcd Hornik, Kurt, Zeileis, Achim, Meyer, David 10 1900 (has links) (PDF) This paper describes the "strucplot" framework for the visualization of multi-way contingency tables. Strucplot displays include hierarchical conditional plots such as mosaic, association, and sieve plots, and can be combined into more complex, specialized plots for visualizing conditional independence, GLMs, and the results of independence tests. The framework's modular design allows flexible customization of the plots' graphical appearance, including shading, labeling, spacing, and legend, by means of "graphical appearance control" functions. The framework is provided by the R package vcd.
126	Statistické usuzování v analýze kategoriálních dat / Statistical inference for categorical data analysis Kocáb, Jan January 2010 (has links) This thesis introduces statistical methods for categorical data. These methods are used especially in social sciences such as sociology, psychology, and political science, but their importance has also grown in the medical and technical sciences. The first part covers statistical inference for a proportion, describing classical, exact, and Bayesian methods for estimation and hypothesis testing. With a large sample, the exact distribution can be approximated by the normal distribution; with a small sample, this approximation cannot be used and the discrete distribution is required, which makes inference more complicated. The second part deals with the analysis of two categorical variables in contingency tables. It explains measures of association for 2 x 2 contingency tables, such as the difference of proportions and the odds ratio, and shows how to test independence for both large and small samples. With a small sample, the classical chi-squared tests are not appropriate and alternative methods are needed. This part also presents a variety of exact tests of independence and a Bayesian approach for the 2 x 2 table. It closes with the table for two dependent samples, where the question is whether the two variables give identical results, which occurs when the marginal proportions are equal. In the last part, the methods are applied to data and the results are discussed.
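Illustrative note on record 126: the abstract names the difference of proportions, the odds ratio, and exact tests of independence for a 2 x 2 table. The sketch below is not taken from the thesis; it is a minimal standard-library example, and the function names and the sample table are assumptions chosen for illustration. The two-sided p-value uses one common convention (summing the probabilities of all tables that are no more likely than the observed one).

```python
from math import comb


def two_by_two_summary(a, b, c, d):
    """Table [[a, b], [c, d]]: rows are groups, columns are success/failure.

    Returns (difference of proportions, sample odds ratio).
    The odds ratio is undefined when b or c is zero.
    """
    p1 = a / (a + b)
    p2 = c / (c + d)
    return p1 - p2, (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Conditions on both margins, so the top-left cell follows a
    hypergeometric distribution under independence.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical small-sample table: 3 of 4 successes versus 1 of 4.
diff, odds_ratio = two_by_two_summary(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With only eight observations the exact p-value is large even though the sample odds ratio is far from 1, which is the small-sample situation the abstract warns about.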
127	O lixo do capital: uma crítica ao processo de reciclagem de materiais enquanto reposição crítica das categorias modernas / A critique of material recycling process as critical parts of modern categories Lacerda, Leonardo Mamede de 11 March 2015 (has links) Visando contribuir para o debate sobre a naturalização e autonomização dos processos sociais, esta dissertação tenta problematizar a reciclagem de materiais, principalmente, de latinhas de alumínio sob a perspectiva da crise de reprodução do capital. A crítica à consciência naturalizante, socialmente necessária à reposição dos pressupostos da produção mercantil, é uma carência que esta pesquisa tenta evidenciar. Nesse sentido, o intuito do texto é tentar revelar como a consciência moderna, portanto, fetichista, se relaciona com teorias e práticas que envolvem a reciclagem, consubstanciada pela assim chamada consciência ambiental e, fundamentalmente, pela afirmação positiva da categoria trabalho, escondendo sua crise e, por isso, repondo-o em níveis cada vez mais profundos da vida cotidiana. Assim, a análise da reciclagem nos ajudaria refletir sobre como uma atividade desempenhada por quase todos os níveis da sociedade, que muitas vezes é reconhecida como uma prática incriticável e que é capaz de repor, de forma intensivamente crítica, as categorias do capital. Para tanto, tentou-se analisar os elementos críticos da realização do capital financeiro no âmbito de uma grande empresa de reciclagem de alumínio e nas relações internas em uma cooperativa de catadores, ambos em Pindamonhangaba-SP, mas não só. Os dados levantados e as análises tentam contribuir, por fim, para o questionamento de uma dada razão que se apresenta como alternativa para a crise do trabalho: o empreendedorismo. 
Desse modo, esta pesquisa buscou, também, analisar a institucionalização da forma social empreendedora, que esconde a lógica de ficcionalização das relações sociais, ficção esta que põe em prática uma suposta reciclagem do capital produtivo. / Aiming to contribute to the debate on the naturalization and autonomization of social processes, this dissertation attempts to problematize the recycling of materials, especially aluminum cans, from the perspective of the crisis of capital reproduction. The critique of the naturalizing consciousness that is socially necessary for reposing the presuppositions of commodity production is a gap that this research tries to highlight. In this sense, the text tries to reveal how modern, and therefore fetishistic, consciousness relates to theories and practices involving recycling, embodied in so-called environmental awareness and, fundamentally, in the positive affirmation of the category of labor, which hides its crisis and thereby reposes it at ever deeper levels of everyday life. The analysis of recycling thus helps us reflect on how an activity performed at almost every level of society, and often regarded as a practice beyond criticism, is able to repose, in an intensely critical way, the categories of capital. To that end, we analyzed the critical elements of the realization of financial capital within a large aluminum recycling company and in the internal relations of a cooperative of waste pickers, both in Pindamonhangaba-SP, though not only there. Finally, the data collected and the analyses seek to contribute to questioning a rationale that presents itself as an alternative to the crisis of labor: entrepreneurship. This research also sought to analyze the institutionalization of the entrepreneurial social form, which hides the logic of the fictionalization of social relations, a fiction that puts into practice a supposed recycling of productive capital. 
Categorical reproduction Reciclagem Recycling Reposição categorial Trabalho e crise Work and crisis
128	Modelos para dados categorizados ordinais com efeito aleatório: uma aplicação à análise sensorial / Models for ordinal categorical data with random effects: an application to the sensory analysis Fatoretto, Maíra Blumer 12 January 2016 (has links) Os modelos para dados categorizados ordinais são extensões dos Modelos Lineares Generalizados e suas suposições e inferências são fundamentadas por esta classe de modelos. Os Modelos de Logitos Cumulativos, em que a função de ligação é constituída de probabilidades acumuladas, são muito utilizados para este tipo de variável, sendo uma de suas simplificações, os Modelos de Chances Proporcionais, em que para todas as covaríaveis no modelo há um crescimento linear nas razões de chances, porém, neste caso, é necessária a verificação da suposição de paralelismo. Outros modelos como o Modelo de Chances Proporcionais Parciais, o Modelo de Categorias Adjacentes e o Modelo Logito de Razão Contínua também podem ser utilizados. Em diversos estudos deste tipo, é necessário a utilização de modelos mistos, seja pelo tipo de um fator ou a dependência entre observações da variável resposta. Objetivou-se, neste trabalho, o estudo de modelos para variável resposta ordinal com a inclusão de um ou mais efeitos aleatórios. Esses modelos são ilustrados com a utilização de dados reais de análise sensorial, cuja variável resposta é constituída de uma escala ordinal e deseja-se saber dentre duas variedades de tomates desidratados (Italiano e Sweet Grape), qual teve melhor aceitação pelos consumidores. Nesse experimento os provadores avaliaram uma única vez cada uma das variedades, sendo as repetições constituídas pelas avaliações dadas por diferentes provadores. Nesse caso, é necessária a inclusão de um efeito aleatório por provador, para que o modelo consiga capturar as diferenças entre esses provadores não treinados. 
O Modelo de Chances Proporcionais ajustou-se de maneira satisfatória aos dados, podendo-se fazer uso das estimativas de probabilidades e razões de chances para a interpretação dos resultados e concluindo-se que o sabor da variedade Sweet Grape foi o que mais agradou os provadores, independente do sexo. / Models for ordinal categorical data are extensions of Generalized Linear Models, and their assumptions and inferences are based on this class of models. Cumulative Logit Models, in which the link function is built from cumulative probabilities, are widely used for this type of variable. One simplification is the Proportional Odds Model, in which, for all covariates in the model, there is a linear growth in the odds ratios; in this case, however, the parallelism assumption must be checked. Other models, such as the Partial Proportional Odds Model, the Adjacent-Categories Logit Model, and the Continuation-Ratio Logit Model, can also be used. Many studies of this kind require mixed models, because of either the type of a factor or dependence between observations of the response variable. The aim of this work is to study models for an ordinal response variable that include one or more random effects. These models are illustrated with real sensory analysis data in which the response variable is an ordinal scale and the question is which of two varieties of dried tomatoes, Italian and Sweet Grape, was better accepted by consumers. In this experiment, each panelist evaluated each variety once, and the replicates consist of the ratings given by different tasters. In this case, a random effect per taster must be included so that the model can capture the differences between these untrained tasters. 
The Proportional Odds Model fit the data satisfactorily, so the estimated probabilities and odds ratios can be used to interpret the results; the conclusion is that the flavor of the Sweet Grape variety pleased the tasters most, regardless of sex. Categorical Data Cumulative Logit Models Dados categorizados Generalized Linear Mixed Models Modelos de logitos cumulativos Modelos lineares generalizados mistos
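Illustrative note on record 128: the abstract describes cumulative logit models with a proportional odds structure and a random effect per taster. The sketch below is not the author's code; it assumes one common parameterization, P(Y <= j | x) = logistic(theta_j - (beta . x + u)), where u is a taster-specific random intercept. The function names and all numeric values are hypothetical.

```python
from math import exp


def logistic(z):
    return 1.0 / (1.0 + exp(-z))


def category_probabilities(thresholds, beta, x, random_intercept=0.0):
    """Category probabilities under a proportional odds model.

    thresholds: increasing cut points theta_1 < ... < theta_{K-1}
    beta, x: coefficient and covariate vectors (shared across cut points,
             which is the "parallelism" assumption)
    random_intercept: assumed taster-level shift u added to the linear predictor
    """
    eta = sum(b * v for b, v in zip(beta, x)) + random_intercept
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Hypothetical 4-point acceptance scale with one binary covariate (variety).
thresholds = [-1.0, 0.0, 1.0]
italian = category_probabilities(thresholds, beta=[0.5], x=[0.0])
sweet_grape = category_probabilities(thresholds, beta=[0.5], x=[1.0])
```

Because the same coefficient applies at every cut point, the cumulative odds ratio between the two covariate values is identical for each threshold, which is what the parallelism check in the abstract examines.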
129	Modelos de regressão para variáveis categóricas ordinais com aplicações ao problema de classificação / Regression models for ordinal categorical variables with applications to the classification problem Okura, Roberta Irie Sumi 11 April 2008 (has links) Neste trabalho, apresentamos algumas metodologias para analisar dados que possuem variável resposta categórica ordinal. Descrevemos os principais Modelos de Regressão conhecidos atualmente que consideram a ordenação das categorias de resposta, entre eles: Modelos Cumulativos e Modelos Sequenciais. Discutimos também o problema de discriminação e classificação de elementos em grupos ordinais, comentando sobre os preditores mais comuns para dados desse tipo. Apresentamos ainda a técnica de Análise Discriminante Ótima e sua versão aprimorada, baseada na utilização de métodos bootstrap. Por fim, aplicamos algumas das técnicas descritas a dados reais da área financeira, com o intuito de classificar possíveis clientes, no momento da aquisição de um cartão de crédito, como futuros bons, médios ou maus pagadores. Para essa aplicação, discutimos as vantagens e desvantagens dos modelos utilizados em termos de qualidade da classificação. / In this work, we present some methods for analyzing data with an ordinal categorical response. We describe the main regression models currently known that take the ordering of the response categories into account, among them Cumulative Models and Sequential Models. We also discuss the problem of discriminating and classifying elements into ordinal groups, commenting on the most common predictors for this kind of data. We further present the technique known as Optimal Discriminant Analysis and its improved version, based on bootstrap methods. Finally, we apply some of the described techniques to real financial data in order to classify prospective customers, at the time they acquire a credit card, as future good, average, or bad payers. 
With this application, we discuss the advantages and disadvantages of the models used in terms of quality of classification. classificação classification discriminação ordinal modelos de regressão ordinais ordinal categorical variables ordinal discrimination ordinal regression models variáveis categóricas ordinais
130	Predições estatísticas para dados politômicos / Statistical predictions for polytomous data Requena, Guaraci de Lima 17 August 2018 (has links) Este trabalho generaliza a partição da distribuição de Bernoulli multivariada em distribuições de Bernoulli e como esta partição leva a um modelo de regressão e a um classificador para dados politômicos. Como ponto de partida, desejamos explicitar a função de ligação para os modelos de regressão multinomial e escrevê-la a partir de funções de distribuição, como feito no caso binomial, a fim de flexibilizá-la para além da logito usual. Para isso, estudamos as fatorações da Bernoulli multivariada em Bernoullis, bem como a multinomial em binomiais, a fim de explicitar como as funções de distribuição podem desempenhar um papel na ligação entre o espaço das covariáveis e o vetor de probabilidades. Basu & Pereira (1982) exploram tais fatorações em um problema de não resposta e Pereira & Stern (2008) as generalizam para uma classe de fatorações. Este trabalho propõe uma simplificação tanto da regressão multinomial - agregando a flexibilidade do caso binomial -, quanto da classificação politômica, no sentido de decompor o problema politômico em dicotômicos através da generalização da classe de fatorações. Um problema computacional surge pois tal classe pode ter um número muito grande de elementos distintos de acordo com o número de categorias e, assim, duas propostas são feitas para buscar uma que minimiza os riscos de classificação binomial envolvidos, passo-a-passo. A motivação para este trabalho é apresentada a fim de se estudar as performances de tais modelos de regressão e classificadores. Partimos de um problema da área médica, mais especificamente em transtorno obsessivo-compulsivo, em que desejamos classificar um indivíduo a fim de obter um fenótipo mais puro de tal transtorno e de modelá-lo a fim de buscar as covariáveis que estão relacionadas com tal fenótipo, a partir de um conjunto de dados reais. 
/ This work explores a partition of the multivariate Bernoulli distribution into Bernoulli distributions and shows how this partition leads to a regression model and to a classifier for polytomous data. As a starting point, we want to make the link function for multinomial regression models explicit and to write it in terms of distribution functions, as is done in the binomial case, in order to make it more flexible than the usual logit. To that end, we study the factorizations of the multivariate Bernoulli into Bernoullis, as well as of the multinomial into binomials, in order to make explicit how distribution functions can play a role in linking the space of covariates to the vector of probabilities. Basu and Pereira (1982) explore these factorizations in a nonresponse problem, and Pereira and Stern (2008) generalize them to a class of factorizations. This work proposes a simplification both of multinomial regression, adding the flexibility of the binomial case, and of polytomous classification, decomposing the polytomous problem into dichotomous ones through the generalization of the class of factorizations. A computational problem arises because the number of distinct factorizations can be very large, depending on the number of categories, so we propose two approaches that search, step by step, for a factorization that minimizes the binomial classification risks involved. The motivation for this work is presented in order to study the performance of such regression models and classifiers. We start from a medical problem, specifically in obsessive-compulsive disorder, in which we want to classify an individual in order to obtain a purer phenotype of the disorder and to model it in order to find the covariates related to that phenotype, using a real dataset. Categorical data Classificação Classification Dados categóricos Factorization Fatoração Multinomial regression Obsessive-compulsive disorder Regressão multinomial Transtorno obsessivo-compulsivo
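Illustrative note on record 130: the abstract describes factorizing a multinomial into binomials and building the link from distribution functions rather than only the logit. The sketch below is not the thesis's method; it shows just one member of such a class of factorizations, the sequential split in which step k asks "is Y equal to category k, given that it is not an earlier category?". The function names, the choice of the standard normal distribution function, and the numeric inputs are assumptions for illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function, used here as a probit-style link."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def multinomial_from_binary_splits(conditional_probs):
    """Rebuild category probabilities from a chain of binary (Bernoulli) steps.

    conditional_probs[k] is P(Y = k | Y is not in categories 0..k-1).
    Returns probabilities for len(conditional_probs) + 1 categories.
    """
    probs, remaining = [], 1.0
    for q in conditional_probs:
        probs.append(remaining * q)   # stop at this category
        remaining *= 1.0 - q          # or continue to the next split
    probs.append(remaining)           # last category takes what is left
    return probs


# Hypothetical linear predictors for two binary steps (three categories).
linear_predictors = [0.0, 0.0]
steps = [normal_cdf(eta) for eta in linear_predictors]
category_probs = multinomial_from_binary_splits(steps)
```

Each binary step can use its own distribution function and its own covariates, which is the kind of flexibility the abstract attributes to the binomial case; choosing among the many possible orderings of splits is the computational problem it mentions.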
