171
Inference of nonparametric hypothesis testing on high dimensional longitudinal data and its application in DNA copy number variation and microarray data analysis. Zhang, Ke. January 1900.
Doctor of Philosophy / Department of Statistics / Haiyan Wang
High throughput screening technologies have generated a huge amount of biological data in the last ten years. With the easy availability of array technology, researchers have begun to investigate biological mechanisms using experiments with more sophisticated designs, which pose novel challenges to statistical analysis. We provide theory for robust statistical tests in three flexible models. In the first model, we consider hypothesis testing when a large number of variables is observed repeatedly over time. A potential application is in tumor genomics, where an array comparative genomic hybridization (aCGH) study is used to detect progressive DNA copy number changes during tumor development. In the second model, we consider hypothesis testing theory for a longitudinal microarray study with multiple treatments or experimental conditions. The tests developed can be used to detect treatment effects for a large group of genes and to discover genes that respond to treatment over time. In the third model, we address a hypothesis testing problem that can arise when array data from different sources are integrated; statistical tests are performed under a nested design. In all models, robust test statistics are constructed with moment methods that allow unbalanced designs and arbitrary heteroscedasticity. The limiting distributions are derived in the nonclassical setting where the number of probes is large. The test statistics do not target a single probe; instead, they test a selected set of probes simultaneously. Simulation studies compare the proposed methods with traditional tests based on linear mixed-effects models and generalized estimating equations. Results obtained with the proposed theory in two cancer genomic studies suggest that the new methods are promising for a wide range of biological applications with longitudinal arrays.
172
Algoritmiese rangordebepaling van akademiese tydskrifte / Algorithmic ranking of academic journals. Strydom, Machteld Christina. 31 October 2007.
Summary
There is a need for an objective measure with which to determine and compare the quality of academic publications.
This research determined, from citation data, the influence or reaction generated by a publication, using an iterative algorithm that assigns weights to citations.
In the Internet environment this approach is already applied with great success, among others by the PageRank algorithm of the Google search engine.
This and other algorithms from the Internet environment were studied in order to design an algorithm for academic articles. A variation of the PageRank algorithm was chosen that determines an Influence value. The algorithm was tested on case studies. The empirical study indicates that this variation reflects the intuitive judgement of specialist researchers better than a simple count of citations.
Abstract
Ranking of journals is often used as an indicator of quality and is extensively used as a mechanism for determining promotion and funding.
This research studied ways of extracting the impact, or influence, of a journal from citation data, using an iterative process that allocates a weight to the source of a citation.
After evaluating and discussing, with specialist researchers, the characteristics that influence the quality and importance of research, a measure called the Influence factor was introduced, emulating the PageRank algorithm used by Google to rank web pages. The Influence factor can be seen as a measure of the reaction generated by a publication, based on the number of scientists who read and cited it. A good correlation was found between the rankings produced by the Influence factor and those given by specialist researchers. / Mathematical Sciences / M.Sc. (Operasionele Navorsing)
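The iterative weighting described in the abstract can be illustrated with a small PageRank-style computation. The Python sketch below runs on a made-up citation matrix; the damping constant, the uniform treatment of journals that cite nothing, and the normalisation are generic PageRank choices rather than the thesis's exact Influence-factor algorithm.

```python
import numpy as np

def influence_factor(citations, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style influence scores for journals.

    citations[i, j] = number of citations from journal i to journal j.
    The damping value and the handling of journals with no outgoing
    citations are illustrative choices, not the thesis's algorithm.
    """
    n = citations.shape[0]
    transition = np.empty((n, n))
    for i in range(n):
        total = citations[i].sum()
        # Each citing journal distributes one unit of influence over the
        # journals it cites; a journal citing nothing spreads it uniformly.
        transition[i] = citations[i] / total if total > 0 else 1.0 / n

    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_scores = (1 - damping) / n + damping * transition.T @ scores
        if np.abs(new_scores - scores).sum() < tol:
            return new_scores
        scores = new_scores
    return scores

# Toy citation matrix: journal 2 is cited heavily and ends up ranked first.
C = np.array([[0, 1, 3],
              [2, 0, 5],
              [1, 1, 0]])
print(influence_factor(C))
```

The fixed point rewards journals cited by journals that are themselves influential, which is the behaviour the study contrasts with a plain citation count.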
173
Neighborhood-oriented feature selection and classification of Duke's stages on colorectal cancer using high-density genomic data. Peng, Liang. January 1900.
Master of Science / Department of Statistics / Haiyan Wang
The selection of relevant genes for classification of phenotypes for diseases with gene expression data has been extensively studied. Previously, most relevant gene selection was conducted on individual genes with limited sample sizes. Modern technology makes it possible to obtain microarray data with higher resolution of the chromosomes. Considering gene sets on an entire block of a chromosome, rather than individual genes, could help reveal important connections between relevant genes and the disease phenotypes. In this report, we consider feature selection and classification that take into account the spatial location of probe sets when classifying Duke's stages B and C using DNA copy number data or gene expression data from colorectal cancers. A novel method for feature selection is presented. A chromosome is first partitioned into blocks after the probe sets are aligned along their chromosome locations. A test of interaction between Duke's stage and probe sets is then conducted on each block of probe sets to select significant blocks. For each significant block, a new multiple comparison procedure is carried out to identify truly relevant probe sets while preserving the neighborhood location information of the probe sets. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classification using the selected final probe sets is conducted for all samples, and the Leave-One-Out Cross-Validation (LOOCV) estimate of accuracy is reported as an evaluation of the selected features. We applied the method to two large data sets, each containing more than 50,000 features. Excellent classification accuracy was achieved by the proposed procedure with SVM or KNN for both data sets, even though classification of prognosis stages (Duke's stages B and C) is much more difficult than distinguishing normal from tumor samples.
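As a rough illustration of the pipeline the abstract describes (block partition along the chromosome, a per-block screen, then SVM/KNN with leave-one-out cross-validation), here is a Python sketch on simulated data. The per-probe t-test with a within-block Bonferroni correction stands in for the thesis's interaction test and neighbourhood-preserving multiple comparison procedure, which are not reproduced here; the data are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def select_blockwise(X, y, block_size=50, alpha=0.01):
    """Screen probe sets block by block along the chromosome.

    X: samples x probes, probes already ordered by chromosome position.
    y: Duke's stage labels.  A per-probe t-test with a within-block
    Bonferroni correction is used here purely for illustration.
    """
    g0, g1 = (y == label for label in np.unique(y))
    keep = []
    for start in range(0, X.shape[1], block_size):
        cols = np.arange(start, min(start + block_size, X.shape[1]))
        pvals = np.array([stats.ttest_ind(X[g0, c], X[g1, c]).pvalue
                          for c in cols])
        keep.extend(cols[pvals < alpha / len(cols)])
    return np.array(keep, dtype=int)

# Simulated stand-in for copy number data: 60 samples, 5,000 probes,
# with one informative block of 10 probes planted for stage 'C'.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))
y = np.repeat(np.array(["B", "C"]), 30)
X[y == "C", 100:110] += 1.5

selected = select_blockwise(X, y)
for clf in (SVC(kernel="linear"), KNeighborsClassifier(n_neighbors=5)):
    acc = cross_val_score(clf, X[:, selected], y, cv=LeaveOneOut()).mean()
    print(type(clf).__name__, "LOOCV accuracy:", round(acc, 3))
```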
174
A Bayesian nonparametric approach for the two-sample problem / Uma abordagem bayesiana não paramétrica para o problema de duas amostras. Console, Rafael de Carvalho Ceregatti de. 19 November 2018.
In this work, we discuss the so-called two-sample problem (Pearson and Neyman, 1930) under a nonparametric Bayesian approach. Given two independent i.i.d. samples X1, ..., Xn and Y1, ..., Ym generated from P1 and P2, respectively, the two-sample problem consists in deciding whether P1 and P2 are equal. Assuming a nonparametric prior, we propose an evidence index for the null hypothesis H0: P1 = P2 based on the posterior distribution of the distance d(P1, P2) between P1 and P2. The evidence index is easy to compute, has an intuitive interpretation, and can also be justified in the Bayesian decision-theoretic context. Furthermore, in a Monte Carlo simulation study, our method showed good performance when compared with the well-known Kolmogorov-Smirnov test, the Wilcoxon test, and a recent testing procedure based on the Polya tree process proposed by Holmes et al. (2015). Finally, we applied our method to a data set of scale measurements from three different groups of patients submitted to a questionnaire for Alzheimer's disease diagnosis.
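A minimal sketch of this kind of index: draw from a posterior over (P1, P2), compute d(P1, P2) for each draw, and summarise how much posterior mass lies near zero. The Bayesian bootstrap (Dirichlet-weighted empirical distributions), the sup-norm distance, and the 0.1 threshold below are illustrative stand-ins; the thesis's specific prior and evidence index are not reproduced.

```python
import numpy as np

def posterior_distance_draws(x, y, n_draws=1000, seed=1):
    """Posterior draws of a distance between the two sampling distributions.

    Each draw replaces the unknown P1 and P2 by Bayesian-bootstrap
    (Dirichlet-weighted) versions of the empirical distributions and records
    the sup distance between the two weighted CDFs on a common grid.
    """
    rng = np.random.default_rng(seed)
    grid = np.sort(np.concatenate([x, y]))
    draws = np.empty(n_draws)
    for b in range(n_draws):
        wx = rng.dirichlet(np.ones(len(x)))
        wy = rng.dirichlet(np.ones(len(y)))
        F1 = np.array([wx[x <= t].sum() for t in grid])
        F2 = np.array([wy[y <= t].sum() for t in grid])
        draws[b] = np.abs(F1 - F2).max()
    return draws

# Hypothetical data: same location, different spread.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 80)
y = rng.normal(0.0, 2.0, 80)
d = posterior_distance_draws(x, y)
# One possible evidence summary for H0: P1 = P2, namely how much posterior
# mass of d(P1, P2) sits near zero (the threshold is arbitrary here).
print("posterior mean of d(P1, P2):", round(d.mean(), 3))
print("P(d < 0.1 | data):", round((d < 0.1).mean(), 3))
```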
175
Sensitivitet vid mammografi och tomosyntes undersökningar / Sensitivity in mammography and tomosynthesis examinations. Selaci, Albert; Sjöqvist, Hanna. January 2019.
The breast consists of mammary glands, subcutaneous fat and connective tissue; it also contains vessels and lymph, and both men and women have breast tissue. Breasts can be affected by various benign and malignant diseases. The most widely used examination method for detecting breast cancer is mammography. For further examination of the breast, digital breast tomosynthesis (DBT) may be used. DBT is a form of limited-angle tomography that produces sectional images of the breast. Opinions about DBT are conflicting: some studies report that tomosynthesis has better sensitivity than mammography, while others find it worse or equivalent. To gain knowledge about tomosynthesis, mammography and the difference in sensitivity between them, a summary of different studies is required. The aim of this study is to compare the sensitivity of breast examinations with mammography and tomosynthesis. Through a systematic literature review, a result is summarised from quantitative articles that are quality-assessed and analysed. The work has undergone an ethical self-review. The result was produced through hypothesis testing in SPSS and shows a significant difference in sensitivity between DBT and mammography, with DBT having higher sensitivity in terms of both mean and median.
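The comparison described here amounts to a paired significance test on per-study sensitivities. A short Python sketch with made-up numbers (the thesis extracted its own values from the reviewed articles and ran the analysis in SPSS):

```python
from scipy import stats

# Hypothetical per-study sensitivities (proportions), for illustration only.
mammography   = [0.71, 0.65, 0.78, 0.69, 0.74, 0.62, 0.80, 0.68]
tomosynthesis = [0.82, 0.74, 0.85, 0.77, 0.81, 0.73, 0.88, 0.75]

# Paired, nonparametric comparison of the two modalities across studies.
stat, p = stats.wilcoxon(tomosynthesis, mammography, alternative="greater")
print(f"Wilcoxon signed-rank: W = {stat}, p = {p:.4f}")
```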
176
Análise Bayesiana de modelos de mistura finita com dados censurados / Bayesian analysis of finite mixture models with censored data. Melo, Brian Alvarez Ribeiro de. 21 February 2017.
Finite mixtures are highly flexible parametric models capable of describing different data features and are widely considered in many contexts, especially in the analysis of heterogeneous data (Marin, 2005). Generally, in finite mixture models, all the components belong to the same parametric family and are distinguished only by the associated parameter vector. In this thesis, we propose a new finite mixture model, capable of handling censored observations, in which the components are the densities of the Gamma, Lognormal and Weibull distributions (the GLW finite mixture). These densities are reparameterized in terms of the mean and the variance, since these quantities are more widely interpreted across areas of study. We construct the GLW model and develop its analysis under the Bayesian perspective of inference, considering scenarios with censoring and a cure fraction. This analysis includes parameter estimation through simulation methods, the construction of hypothesis tests to evaluate covariate effects and to assess the mixture weights, the computation of measures for comparing different models, and the estimation of the predictive distribution for new observations. In a simulation study, we evaluate the ability of the GLW mixture to recover the original distribution of failure times, using hypothesis tests and model estimates as criteria for selecting the correct distribution. The models developed were also applied to the follow-up times of patients with heart failure from the Heart Institute of the University of Sao Paulo Medical School. In this application, the results show that the mixture models fit better than a single distribution when modeling the follow-up times. Finally, we developed an R package for fitting the presented models.
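The mean/variance reparameterisation and the censored-data likelihood at the core of the GLW mixture can be sketched as follows. This is an illustrative Python translation of the ideas in the abstract, not the thesis's implementation or its R package; all parameter values in the example call are hypothetical, and the Weibull case needs a numerical solve for the shape parameter.

```python
import numpy as np
from scipy import stats, optimize
from scipy.special import gamma as G

def gamma_from_mean_var(m, v):
    # Gamma(shape a, scale s): mean = a*s, variance = a*s^2.
    return stats.gamma(a=m**2 / v, scale=v / m)

def lognorm_from_mean_var(m, v):
    # Lognormal: mean = exp(mu + s^2/2), variance = (exp(s^2) - 1) * mean^2.
    s2 = np.log1p(v / m**2)
    return stats.lognorm(s=np.sqrt(s2), scale=np.exp(np.log(m) - s2 / 2))

def weibull_from_mean_var(m, v):
    # Weibull(shape k, scale lam): mean = lam*G(1+1/k),
    # variance = lam^2*(G(1+2/k) - G(1+1/k)^2); solve for k numerically.
    cv2 = v / m**2
    k = optimize.brentq(lambda k: G(1 + 2 / k) / G(1 + 1 / k) ** 2 - 1 - cv2,
                        0.1, 50.0)
    return stats.weibull_min(c=k, scale=m / G(1 + 1 / k))

def glw_censored_loglik(t, delta, weights, means, variances):
    """Log-likelihood of the Gamma/Lognormal/Weibull mixture with right
    censoring: observed failures contribute the mixture density, censored
    times the mixture survival function."""
    comps = [gamma_from_mean_var(means[0], variances[0]),
             lognorm_from_mean_var(means[1], variances[1]),
             weibull_from_mean_var(means[2], variances[2])]
    dens = sum(w * c.pdf(t) for w, c in zip(weights, comps))
    surv = sum(w * c.sf(t) for w, c in zip(weights, comps))
    return np.sum(delta * np.log(dens) + (1 - delta) * np.log(surv))

# Hypothetical call, showing the parameterisation by mean and variance.
t = np.array([1.2, 3.4, 0.7, 5.1])
delta = np.array([1, 0, 1, 1])          # 0 marks a right-censored time
print(glw_censored_loglik(t, delta, weights=[0.3, 0.3, 0.4],
                          means=[2.0, 2.5, 3.0], variances=[1.0, 1.5, 2.0]))
```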
177
The work-leisure relationship among working youth-centre members: implication for program planning with a test of the 'compensatory hypothesis'. January 1987.
by Ho Kam Wan. / Thesis (M.S.W.)--Chinese University of Hong Kong, 1987. / Bibliography: leaves 158-167.
178
Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications. Keeling, Kellie Bliss. 08 1900.
With advances in computer technology, organizations are able to store large amounts of data in data warehouses. There are two fundamental issues researchers must address: the dimensionality of data and the interpretation of multiple statistical tests. The first issue addressed by this research is the determination of the number of components to retain in principal components analysis. This research establishes regression, asymptotic theory, and neural network approaches for estimating mean and 95th percentile eigenvalues for implementing Horn's parallel analysis procedure for retaining components. Certain methods perform better for specific combinations of sample size and numbers of variables. The adjusted normal order statistic estimator (ANOSE), an asymptotic procedure, performs the best overall. Future research is warranted on combining methods to increase accuracy. The second issue involves interpreting multiple statistical tests. This study uses simulation to show that Parker and Rothenberg's technique using a density function with a mixture of betas to model p-values is viable for p-values from central and non-central t distributions. The simulation study shows that final estimates obtained in the proposed mixture approach reliably estimate the true proportion of the distributions associated with the null and nonnull hypotheses. Modeling the density of p-values allows for better control of the true experimentwise error rate and is used to provide insight into grouping hypothesis tests for clustering purposes. Future research will expand the simulation to include p-values generated from additional distributions. The techniques presented are applied to data from Lake Texoma where the size of the database and the number of hypotheses of interest call for nontraditional data mining techniques. The issue is to determine if information technology can be used to monitor the chlorophyll levels in the lake as chloride is removed upstream. A relationship established between chlorophyll and the energy reflectance, which can be measured by satellites, enables more comprehensive and frequent monitoring. The results have both economic and political ramifications.
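For reference, Horn's parallel analysis in its original Monte Carlo form is easy to state in code: retain the leading components whose observed eigenvalues exceed the corresponding reference eigenvalues obtained from random data of the same size. The Python sketch below uses direct simulation with the 95th percentile on made-up data; the regression, asymptotic (ANOSE), and neural-network estimators developed in the dissertation replace this simulation step and are not reproduced here.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Horn's parallel analysis by direct Monte Carlo simulation.

    Retain the leading components whose eigenvalues of the observed
    correlation matrix exceed the chosen percentile of the eigenvalues of
    random normal data with the same dimensions.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    sims = np.empty((n_sims, p))
    for b in range(n_sims):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        sims[b] = np.sort(np.linalg.eigvalsh(R))[::-1]
    ref = np.percentile(sims, percentile, axis=0)

    k = 0
    while k < p and obs[k] > ref[k]:       # stop at the first failure
        k += 1
    return k, obs, ref

# Hypothetical data with two planted components among 20 variables.
rng = np.random.default_rng(1)
scores = rng.standard_normal((300, 2))
loadings = rng.standard_normal((2, 20))
X = scores @ loadings + rng.standard_normal((300, 20))
k, obs, ref = parallel_analysis(X)
print("components to retain:", k)
```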
180
Modelo de otimização de processos para melhoria da qualidade dos serviços em uma Instituição de ensino público / A process optimization model for improving service quality in a public educational institution. Paula, Izabel Alinne Alves de. 23 July 2013.
Since the advent of globalization, quality has become one of the most widespread keywords in society. In the public sector it is associated with speed, reliability, precision and security; however, the general perception that individuals have of the provision of public services in Brazil does not match these attributes of quality. The literature offers several tools to evaluate and improve quality in services, but subjective mechanisms predominate, which amount to measuring customer satisfaction. Quality is certainly focused on the customer, but in the current Era of Quality it is assumed to be generated in the production process. Considering this gap, the question arose: how can processes in the service sector be optimized while keeping the focus on quality improvement, especially processes executed in the public education sector? To answer this question, a model for process analysis and improvement was adapted by concatenating the phases of the Method of Analysis and Solution of Problems (MASP) with the steps of Design of Experiments. The proposed model was applied in the Coordination of Business-School Integration (CIE-E) of IFAM Campus Manaus Centro, evaluating 397 internship records finalized in 2012, in which the duration of the process was identified as the main problem. An experimental study was developed, exploratory and descriptive in nature, of an applied character and with both qualitative and quantitative approaches. Quality Tools were used for data collection and presentation, and Nonparametric Tests for data analysis. Finally, supported by quality and statistical criteria, the combination of high weekly workload and no prior work experience was identified as the ideal combination of controllable factors affecting the internship records studied, resulting in the shortest process time, that is, it determined the optimal region. This study thus evaluates the procedure for implementing statistical techniques in the service sector for the optimization and improvement of its processes.
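The final step described in the abstract (nonparametric tests to single out the factor combination with the shortest processing time) can be illustrated with a short Python sketch. The durations below are made up for illustration; the actual study used 397 internship records and its own quality-tool analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical processing times (days) for the four combinations of weekly
# workload (high/low) and prior work experience (yes/no).
groups = {
    ("high load", "no experience"): rng.gamma(4.0, 2.0, 90),
    ("high load", "experience"):    rng.gamma(4.0, 2.8, 110),
    ("low load", "no experience"):  rng.gamma(4.0, 3.2, 100),
    ("low load", "experience"):     rng.gamma(4.0, 3.5, 97),
}

# Overall nonparametric test that the four combinations share a distribution.
H, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {H:.2f}, p = {p:.3g}")

# Pairwise check of the candidate optimum against each other combination.
best_key = ("high load", "no experience")
for key, g in groups.items():
    if key != best_key:
        _, p = stats.mannwhitneyu(groups[best_key], g, alternative="less")
        print(f"{best_key} < {key}: p = {p:.3g}")
```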