Global ETD Search

1	Statistical models and techniques for dendrochronology Jubock, Z. H. January 1988 (has links) No description available. 519.5 Tree ring data statistics
2	On testing for the Cox model using resampling methods Fang, Jing, 方婧 January 2007 (has links) published_or_final_version / abstract / Statistics and Actuarial Science / Master / Master of Philosophy Random data (Statistics) Survival analysis (Biometry) Gaussian processes.
3	On testing for the Cox model using resampling methods Fang, Jing, January 2007 (has links) Thesis (M. Phil.)--University of Hong Kong, 2008. / Also available in print.
4	Statistical Analysis of High-Dimensional Gene Expression Data Justin Zhu Unknown Date (has links) The use of diagnostic rules based on microarray gene expression data has received wide attention in bioinformatics research. In order to form diagnostic rules, statistical techniques are needed to form classifiers with estimates for their associated error rates, and to correct for any selection biases in the estimates. There are also the associated problems of identifying the genes most useful in making these predictions. Traditional statistical techniques require the number of samples to be much larger than the number of features. Gene expression datasets usually have a small number of samples, but a large number of features. In this thesis, some new techniques are developed, and traditional techniques are used innovatively after appropriate modification to analyse gene expression data. Classification: We first consider classifying tissue samples based on the gene expression data. We employ an external cross-validation with recursive feature elimination to provide classification error rates for tissue samples with different numbers of genes. The techniques are implemented as an R package BCC (Bias-Corrected Classification), and are applied to a number of real-world datasets. The results demonstrate that the error rates vary with different numbers of genes. For each dataset, there is usually an optimal number of genes that returns the lowest cross-validation error rate. Detecting Differentially Expressed Genes: We then consider the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. The focus is on the use of mixture models to handle the multiplicity issue. 
The mixture model approach provides a framework for the estimation of the prior probability that a gene is not differentially expressed. It estimates various error rates, including the FDR (False Discovery Rate) and the FNR (False Negative Rate). We also develop a method for selecting biomarker genes for classification, based on their repeatability among the highly differentially expressed genes in cross-validation trials. The latter method incorporates both gene selection and classification. Selection Bias: When forming a prediction rule on the basis of a small number of classified tissue samples, some form of feature (gene) selection is usually adopted. This is a necessary step if the number of features is high. As the subset of genes used in the final form of the rule has not been randomly selected but rather chosen according to some criteria designed to reflect the predictive power of the rule, there will be a selection bias inherent in estimates of the error rates of the rule if care is not taken. Various situations are presented where selection biases arise in the formation of a prediction rule and where there is a consequent need for the correction of the biases. Three types of selection biases are analysed: selection bias from not using external cross-validation, selection bias of not working with the full set of genes, and the selection bias from optimizing the classification error rate over a number of subsets obtained according to a selection method. Here we mostly employ the support vector machine with recursive feature elimination. This thesis includes a description of cross-validation schemes that are able to correct for these selection biases. Furthermore, we examine the bias incurred when using the predicted rather than the true outcomes to define the class labels in forming and evaluating the performance of the discriminant rule. Case Study: We present a case study using the breast cancer datasets. 
In the study, we compare the 70 highly differentially expressed genes proposed by van 't Veer and colleagues, against the set of the genes selected using our repeatability method. The results demonstrate that there is more than one set of biomarker genes. We also examine the selection biases that may exist when analysing this dataset. The selection biases are demonstrated to be substantial.
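The selection bias described in entry 4 can be illustrated with a minimal sketch. It is not the BCC package and does not use the support vector machine with recursive feature elimination; as a simpler stand-in it assumes a univariate mean-difference ranking and a nearest-centroid classifier, applied to pure-noise data. All function names and parameter values are hypothetical. Selecting genes once on the full dataset and then cross-validating tends to report an optimistic error rate, whereas repeating the selection inside every training fold (external cross-validation) does not.

```python
import random

def make_noise_data(n, p, seed):
    """Pure-noise 'expression' matrix with balanced, uninformative labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y

def top_features(X, y, rows, k):
    """Rank features by absolute difference of class means, using `rows` only."""
    n1 = sum(y[i] for i in rows)
    n0 = len(rows) - n1
    scores = []
    for j in range(len(X[0])):
        m1 = sum(X[i][j] for i in rows if y[i] == 1) / n1
        m0 = sum(X[i][j] for i in rows if y[i] == 0) / n0
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def centroid_predict(X, y, train, test_row, feats):
    """Nearest-centroid prediction for one held-out sample."""
    dist = {}
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        d = 0.0
        for j in feats:
            centre = sum(X[i][j] for i in members) / len(members)
            d += (X[test_row][j] - centre) ** 2
        dist[c] = d
    return 0 if dist[0] <= dist[1] else 1

def cv_error(X, y, k, folds, external):
    """Cross-validated error; `external=True` redoes selection in each fold."""
    rows = list(range(len(X)))
    fixed = None if external else top_features(X, y, rows, k)
    errors = 0
    for f in range(folds):
        test = [i for i in rows if i % folds == f]
        train = [i for i in rows if i % folds != f]
        feats = top_features(X, y, train, k) if external else fixed
        errors += sum(centroid_predict(X, y, train, i, feats) != y[i] for i in test)
    return errors / len(rows)

X, y = make_noise_data(n=40, p=1000, seed=0)
biased = cv_error(X, y, k=10, folds=5, external=False)
external = cv_error(X, y, k=10, folds=5, external=True)
print(f"selection outside CV: {biased:.2f}   external CV: {external:.2f}")
```

Because the labels carry no signal, an honest error estimate should sit near one half; the gap between the two printed numbers is the selection bias.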
5	Estimação de maxima verossimilhança para processo de nascimento puro espaço-temporal com dados parcialmente observados / Maximum likelihood estimation for space-time pure birth process with missing data Goto, Daniela Bento Fonsechi 09 October 2008 (has links) Advisor: Nancy Lopes Garcia / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica / Previous issue date: 2008 / Abstract: The goal of this work is to study maximum likelihood estimation for a spatial pure birth process under two different sampling schemes: a) permanent observation in a fixed time interval [0, T]; b) observation of the process only after a fixed time T. Under scheme b) the birth times of the points are unknown and only their locations are observed, which is a missing-data problem.
The likelihood function for the nonhomogeneous pure birth process on a compact set can be written, through the projection method described by Garcia and Kurtz (2008), as the projection of the likelihood function. Because the projected likelihood can be interpreted as an expectation, Monte Carlo methods can be used to estimate the parameters. Almost-sure convergence and convergence in distribution are obtained for the approximants to the maximum likelihood estimator. Simulation studies show that the approximants are adequate. / Master's / Inference in Stochastic Processes / Master in Statistics Projection method Maximum likelihood estimation Missing data (Statistics)
6	Modeling Random Events Quintos Lima, Alejandra January 2022 (has links) In this thesis, we address two types of modeling of random events. The first one, contained in Chapters 2 and 3, is related to the modeling of dependent stopping times. In Chapter 2, we use a modified Cox construction, along with a modification of the bivariate exponential introduced by Marshall & Olkin (1967), to create a family of stopping times, which are not necessarily conditionally independent, allowing for a positive probability for them to be equal. We also present a series of results exploring the special properties of this construction, along with some generalizations and possible applications. In Chapter 3, we present a detailed application of our model to Credit Risk theory. We propose a new measure of systemic risk that is consistent with the economic theories relating to the causes of financial market failures and can be estimated using existing hazard rate methodologies, and hence, it is simple to estimate and interpret. We do this by characterizing the probability of a market failure, which is defined as the default of two or more globally systemically important banks (G-SIBs) in a small interval of time. We derive various theorems related to market failure probabilities, such as the probability of a catastrophic market failure, the impact of increasing the number of G-SIBs in an economy, and the impact of changing the initial conditions of the economy's state variables. The second type of random event we focus on is the failure of a group in the context of microlending, which is a loan made by a bank to a small group of people without credit histories. Since the creation of this mechanism by Muhammad Yunus, it has received a fair amount of academic attention. However, one of the issues not yet addressed in full detail is the issue of the size of the group. In Chapter 4, we propose a model with interacting forces to find the optimal group size.
We define "optimal" as that group size that minimizes the probability of default of the group. Ultimately, we show that the original choice of Muhammad Yunus, of a group size of five people, is, under the right, and, we believe, reasonable hypotheses, either close to optimal, or even at times exactly optimal, i.e., the optimal group size is indeed five people. Mathematical models Statistics Random data (Statistics) Microfinance Markets--Forecasting Financial crises--Mathematical models
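The idea in entry 6 that two stopping times can coincide with positive probability can be sketched with the classical Marshall & Olkin (1967) bivariate exponential, rather than the modified construction developed in the thesis. The sketch assumes three independent exponential shocks with illustrative rates `lam1`, `lam2` and `lam12` (hypothetical values) and default times X = min(E1, E12), Y = min(E2, E12). In that classical model the probability of a simultaneous default is lam12 / (lam1 + lam2 + lam12), which the simulation checks.

```python
import random

def simulate_simultaneous(lam1, lam2, lam12, n, seed=0):
    """Monte Carlo estimate of P(X == Y) in the classical Marshall-Olkin model."""
    rng = random.Random(seed)
    ties = 0
    for _ in range(n):
        e1 = rng.expovariate(lam1)    # shock hitting only the first entity
        e2 = rng.expovariate(lam2)    # shock hitting only the second entity
        e12 = rng.expovariate(lam12)  # common shock hitting both at once
        x = min(e1, e12)
        y = min(e2, e12)
        if x == y:                    # happens when the common shock comes first
            ties += 1
    return ties / n

lam1, lam2, lam12 = 1.0, 2.0, 0.5     # illustrative rates, not from the thesis
estimate = simulate_simultaneous(lam1, lam2, lam12, n=200_000)
theory = lam12 / (lam1 + lam2 + lam12)
print(f"simulated P(X = Y) = {estimate:.4f}, closed form = {theory:.4f}")
```

The common shock is what breaks conditional independence: without it (`lam12` close to zero) ties essentially never occur.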
7	Quantile based estimation of treatment effects in censored data Crotty, Nicholas Paul 27 May 2013 (has links) M.Sc. (Mathematical Statistics) / Comparison of two distributions via use of the quantile comparison function is carried out specifically from possibly censored data. A semi-parametric method which assumes linearity of the quantile comparison function is examined thoroughly for non-censored data and then extended to incorporate censored data. A fully nonparametric method to construct confidence bands for the quantile comparison function is set out. The performance of all methods examined is tested using Monte Carlo Simulation. Monte Carlo method Estimation theory Censored observations (Statistics) Censored data (Statistics) Regression analysis
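As a rough illustration of the quantity studied in entry 7, the sketch below computes an empirical quantile comparison function for uncensored samples. It assumes the usual definition q(x) = G^{-1}(F(x)), where F and G are the two distribution functions; the abstract does not state the definition, so this is an assumption, and the name `empirical_qcf` is hypothetical. Censoring, the semi-parametric linear fit and the confidence bands are not covered.

```python
def empirical_qcf(xs, ys, x):
    """Empirical q(x) = G_n^{-1}(F_m(x)) for uncensored samples xs ~ F, ys ~ G."""
    m, n = len(xs), len(ys)
    count = sum(1 for v in xs if v <= x)   # equals m * F_m(x)
    if count == 0:
        return min(ys)                     # convention used here when x is below every xs value
    # Smallest order-statistic index k with k/n >= count/m, in integer arithmetic
    k = (count * n + m - 1) // m
    return sorted(ys)[k - 1]

# Toy samples where the second is a linear transform of the first,
# so the comparison function is linear: q(x) = 2x + 1 at the sample points.
xs = list(range(1, 11))
ys = [2 * v + 1 for v in xs]
print([empirical_qcf(xs, ys, v) for v in xs])
```

When the two distributions differ only by location and scale, q is a straight line, which is the situation the linearity assumption mentioned in the abstract corresponds to.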
8	Biclusterização na análise de dados incertos / Biclustering on uncertain data analysis França, Fabricio Olivetti de 17 August 2018 (has links) Advisor: Fernando Jose Von Zuben / Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Previous issue date: 2010 / Abstract: The data acquisition process is subject to many sources of uncertainty and inconsistency. These uncertainties may produce noisy data or even prevent some of the data from being acquired, leading to the missing data problem.
Most procedures used to deal with such problems act globally on the dataset and ignore the effect that noise can have on the analysis. The objective of this thesis is to explore the properties of biclustering, which performs a local data analysis, creating several imputation models that seek to minimize the prediction error of the missing values in the dataset. First, a new biclustering algorithm is proposed that performs better than other currently used approaches, with emphasis on the ability of the biclusters to generate models with reduced noise. Next, a quadratic optimization formulation is proposed to impute the missing values by means of the local models generated by the biclusters. The results show that biclustering helps to reduce the prediction error of data imputation and provides favorable conditions for an a posteriori analysis of the information contained in the data. / Doctorate / Computer Engineering / Doctor in Electrical Engineering Machine learning Missing data (Statistics) Cluster Data mining (Computer) Evolutionary algorithms
9	Tratamento de dados faltantes empregando biclusterização com imputação múltipla / Treatment of missing data using biclustering with multiple imputation Veroneze, Rosana, 1982- 18 August 2018 (has links) Advisors: Fernando José Von Zuben, Fabrício Olivetti de França. / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Previous issue date: 2011 / Abstract: The answers provided by recommender systems can be interpreted as missing data to be imputed from knowledge of the available data and of its relation to the missing data.
There is a wide range of techniques for imputing missing data, and this work is concerned with multiple imputation. Alternative approaches to multiple imputation have already been proposed, and this work proposes biclustering as an effective, flexible and promising strategy. To this end, a parameter sensitivity analysis is first performed on the SwarmBcluster algorithm, recently proposed for biclustering and already adapted, in the literature, to perform single imputation of missing data. This analysis indicated that a proper choice of parameters may significantly improve the performance of the algorithm. SwarmBcluster is then extended to implement multiple imputation and compared with the well-known NORM algorithm. The quality of the results is measured with several metrics, which show that biclustering leads to multiple imputations of better quality in the majority of the experiments. / Master's / Computer Engineering / Master in Electrical Engineering Missing data (Statistics) Recommender systems Cluster Evolutionary algorithms Data mining
10	Características do clima de Uberlândia-MG: análise da temperatura, precipitação e umidade relativa / Climate characteristics of Uberlândia-MG: analysis of temperature, rainfall and relative humidity Petrucci, Eduardo 08 February 2018 (has links) CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The objective of this work is to analyze the behavior of the variables temperature, relative air humidity and precipitation in the city of Uberlândia/MG. Daily data for these variables were used, recorded by the Uberlândia Conventional Weather Station, No. 83.257. In the initial treatment, the data were organized and tabulated, then validated and gap-filled. Subsequently, descriptive statistics were computed: monthly, annual and quinquennial values of mean temperature (maximum, mean and minimum), maximum temperature (absolute maximum and mean), minimum temperature (absolute minimum and mean), mean relative humidity (maximum, mean and minimum), maximum relative humidity (maximum and mean) and minimum relative humidity (minimum and mean), as well as maximum daily and total annual precipitation. Frequency analysis using the Permanence Curve method allowed the identification of inter-quinquennial variations. For the rainfall probability distribution and the construction of the I-D-F curves and the Intense Rainfall Equation for the city, the data were fitted with the Gumbel probability density function.
The results indicate positive trends (increases) in absolute maximum temperature, with a period mean of 35.9 °C and an amplitude of 3.1 °C, and decadal means of 34.4 °C (1980s), 35.5 °C (1990s), 36.1 °C (2000s) and 37.5 °C (2010s); in minimum temperature, with a period mean of 7.3 °C and an amplitude of 3.6 °C, and decadal means of 5.4 °C (1980s), 6.7 °C (1990s), 7.9 °C (2000s) and 9 °C (2010s); and in mean temperature, with a period mean of 22.6 °C and an amplitude of 1.4 °C, and decadal means of 22 °C (1980s), 22.6 °C (1990s), 22.8 °C (2000s) and 23.4 °C (2010s).
There are negative trends (reductions) in minimum relative humidity, with a period mean of 34% and an amplitude of 10.8%, and decadal means of 39.6% (1980s), 34.4% (1990s), 32.4% (2000s) and 28.8% (2010s); in maximum relative humidity, with a noticeable reduction of 1.1%; and in mean relative humidity, with a period mean of 68% and an amplitude of 5%, and decadal means of 70% (1980s), 69% (1990s), 68% (2000s) and 65% (2010s). For precipitation, rainfall in recent years has been below the historical mean of 1487 mm, with decadal means of 1593 mm (1980s), 1490 mm (1990s), 1560 mm (2000s) and 1269 mm (2010s). From the 2010s onward, sequences of days without rain during the rainy season have intensified and become increasingly long.
From the calculation of the intense rainfall equation and the I-D-F curves, the following values were found: regression constant a = 330.4083, regression coefficient b = 0.1452, and mean of the regression coefficients over all return periods c = -0.6164, giving the Intense Rainfall Equation I = (330.4083 × T^0.1452) / t^0.6164, where T is the return period and t the duration. According to the I-D-F graph, more intense rainfall is expected in the first hours of duration for longer return periods; for example, for a 100-year return period, rainfall with an intensity of 109 mm/h is expected in the first 15 minutes of the event. / Dissertation (Master's) CNPQ::CIENCIAS HUMANAS::GEOGRAFIA Data statistics Permanence Curve I-D-F curves Intense Rainfall Equation Uberlândia-MG
