171 |
Inference of nonparametric hypothesis testing on high dimensional longitudinal data and its application in DNA copy number variation and microarray data analysis. Zhang, Ke. January 1900.
Doctor of Philosophy / Department of Statistics / Haiyan Wang / High-throughput screening technologies have generated a huge amount of biological data in the last ten years. With the easy availability of array technology, researchers have started to investigate biological mechanisms using experiments with more sophisticated designs that pose novel challenges to statistical analysis. We provide theory for robust statistical tests in three flexible models. In the first model, we consider hypothesis testing problems when a large number of variables are observed repeatedly over time. A potential application is in tumor genomics, where an array comparative genomic hybridization (aCGH) study is used to detect progressive DNA copy number changes in tumor development. In the second model, we consider hypothesis testing theory in a longitudinal microarray study with multiple treatments or experimental conditions. The tests developed can be used to detect treatment effects for a large group of genes and to discover genes that respond to treatment over time. In the third model, we address a hypothesis testing problem that arises when array data from different sources are to be integrated; we perform statistical tests assuming a nested design. In all models, robust test statistics are constructed with moment methods, allowing unbalanced designs and arbitrary heteroscedasticity. The limiting distributions are derived in the nonclassical setting in which the number of probes is large. The test statistics are not targeted at a single probe; instead, we are interested in testing a selected set of probes simultaneously. Simulation studies compare the proposed methods with some traditional tests based on linear mixed-effects models and generalized estimating equations. Interesting results obtained with the proposed theory in two cancer genomic studies suggest that the new methods are promising for a wide range of biological applications with longitudinal arrays.
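The abstract does not spell out the form of the moment-based statistics, so the sketch below is only a generic illustration, on synthetic data, of the idea of testing a large set of probes simultaneously for a time effect: a per-probe between-time versus within-time ratio is averaged over probes and referred to a permutation null. It is not the test statistic developed in the thesis, and the dimensions and noise model are assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_probes, n_subjects, n_times = 2000, 10, 4   # hypothetical dimensions

def per_probe_f(x):
    """F-like ratio of between-time to within-time mean squares, per probe."""
    time_means = x.mean(axis=1, keepdims=True)            # probes x 1 x times
    grand = x.mean(axis=(1, 2), keepdims=True)
    ms_between = (n_subjects * (time_means - grand) ** 2).sum(axis=2).squeeze() / (n_times - 1)
    ms_within = ((x - time_means) ** 2).sum(axis=(1, 2)) / (n_times * (n_subjects - 1))
    return ms_between / ms_within

def aggregate_statistic(x):
    """Average the per-probe ratios across the whole set of probes."""
    return per_probe_f(x).mean()

# Synthetic longitudinal array: probe-specific (heteroscedastic) noise, no time effect.
scale = rng.uniform(0.5, 2.0, size=(n_probes, 1, 1))
data = rng.normal(0.0, scale, size=(n_probes, n_subjects, n_times))

observed = aggregate_statistic(data)

# Permutation null: shuffle the time labels within each subject.
perms = []
for _ in range(200):
    idx = np.argsort(rng.random((n_probes, n_subjects, n_times)), axis=2)
    perms.append(aggregate_statistic(np.take_along_axis(data, idx, axis=2)))
p_value = np.mean([stat >= observed for stat in perms])
print(f"aggregate statistic = {observed:.3f}, permutation p-value = {p_value:.3f}")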
|
172 |
Algoritmiese rangordebepaling van akademiese tydskrifte. Strydom, Machteld Christina. 31 October 2007.
Opsomming (Summary)
There is a need for an objective measure with which to determine and compare the quality of academic publications. This research derived, from citation data, the influence or reaction generated by a publication, using an iterative algorithm that assigns weights to citations. In the Internet environment this approach is already applied with great success by, among others, the PageRank algorithm of the Google search engine. This and other algorithms from the Internet environment were studied in order to design an algorithm for academic articles. A variation of the PageRank algorithm was chosen that computes an Influence value. The algorithm was tested on case studies. The empirical study indicates that this variation reflects the intuition of specialist researchers better than a mere count of citations.
Abstract
Rankings of journals are often used as an indicator of quality and are extensively used as a mechanism for determining promotion and funding. This research studied ways of extracting the impact, or influence, of a journal from citation data, using an iterative process that allocates a weight to the source of a citation. After evaluating and discussing, with specialist researchers, the characteristics that influence the quality and importance of research, a measure called the Influence factor was introduced, emulating the PageRank algorithm used by Google to rank web pages. The Influence factor can be seen as a measure of the reaction generated by a publication, based on the number of scientists who read and cited it. A good correlation was found between the rankings produced by the Influence factor and those given by specialist researchers. / Mathematical Sciences / M.Sc. (Operasionele Navorsing)
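The Influence factor described above is a PageRank-style iteration in which a citation from a highly ranked journal carries more weight than one from a lowly ranked journal. A minimal sketch of such an iteration on a toy citation matrix follows (hypothetical journal labels, citation counts and damping value; the thesis's exact variant of the algorithm may differ).

import numpy as np

# Toy citation counts: entry [i, j] = citations from journal i to journal j.
journals = ["A", "B", "C", "D"]                      # hypothetical journals
C = np.array([[0, 3, 1, 0],
              [2, 0, 4, 1],
              [0, 1, 0, 5],
              [1, 0, 2, 0]], dtype=float)

def influence_scores(C, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style iteration: weight flows from influential citing journals."""
    n = C.shape[0]
    out = C.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; rows with no outgoing citations spread evenly.
    P = np.where(out > 0, C / np.where(out == 0, 1.0, out), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - damping) / n + damping * (r @ P)
        if np.abs(r_new - r).sum() < tol:
            r = r_new
            break
        r = r_new
    return r / r.sum()

for name, score in zip(journals, influence_scores(C)):
    print(f"{name}: {score:.3f}")

Plain citation counting would amount to ranking journals by the column sums of C; the iteration instead lets weight flow from influential journals, which is the behaviour the case studies compared against specialist researchers' intuition.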
|
173 |
Neighborhood-oriented feature selection and classification of Duke's stages on colorectal cancer using high-density genomic data. Peng, Liang. January 1900.
Master of Science / Department of Statistics / Haiyan Wang / The selection of relevant genes for classifying disease phenotypes from gene expression data has been extensively studied. Previously, most relevant-gene selection was conducted on individual genes with limited sample sizes. Modern technology makes it possible to obtain microarray data at a higher resolution along the chromosomes. Considering gene sets on an entire block of a chromosome, rather than individual genes, can help reveal important connections between relevant genes and the disease phenotypes. In this report, we consider feature selection and classification that take into account the spatial location of probe sets when classifying Duke's stages B and C using DNA copy number data or gene expression data from colorectal cancers. A novel method for feature selection is presented. A chromosome is first partitioned into blocks after the probe sets are aligned along their chromosome locations. A test of interaction between Duke's stage and probe sets is then conducted on each block of probe sets to select significant blocks. For each significant block, a new multiple comparison procedure is carried out to identify truly relevant probe sets while preserving the neighborhood location information of the probe sets. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classification using the selected final probe sets is conducted for all samples. The Leave-One-Out Cross-Validation (LOOCV) estimate of accuracy is reported as an evaluation of the selected features. We applied the method to two large data sets, each containing more than 50,000 features. Excellent classification accuracy was achieved by the proposed procedure with SVM or KNN for both data sets, even though classification of prognosis stages (Duke's stages B and C) is much more difficult than distinguishing normal from tumor types.
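The classification step described above, SVM and KNN with leave-one-out cross-validation on the selected probe sets, can be sketched with scikit-learn as below. Synthetic features stand in for the selected probe sets, and the class coding is hypothetical; the block-partitioning and interaction-test selection steps are not reproduced here.

import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)

# Synthetic stand-in for the selected probe sets: 60 samples, 40 features,
# with a shift in the first 10 features separating the two stages.
n, p = 60, 40
y = np.repeat([0, 1], n // 2)            # 0 = stage B, 1 = stage C (hypothetical coding)
X = rng.normal(size=(n, p))
X[y == 1, :10] += 1.0

loo = LeaveOneOut()
for name, clf in [("SVM", SVC(kernel="linear", C=1.0)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    acc = cross_val_score(clf, X, y, cv=loo).mean()   # LOOCV estimate of accuracy
    print(f"{name} LOOCV accuracy: {acc:.3f}")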
|
174 |
A Bayesian nonparametric approach for the two-sample problem / Uma abordagem bayesiana não paramétrica para o problema de duas amostras. Console, Rafael de Carvalho Ceregatti de. 19 November 2018.
In this work, we discuss the so-called two-sample problem (Pearson and Neyman, 1930) under a nonparametric Bayesian approach. Considering X1, ..., Xn and Y1, ..., Ym two independent i.i.d. samples generated from P1 and P2, respectively, the two-sample problem consists in deciding whether P1 and P2 are equal. Assuming a nonparametric prior, we propose an evidence index for the null hypothesis H0: P1 = P2 based on the posterior distribution of the distance d(P1, P2) between P1 and P2. This evidence index is easy to compute, has an intuitive interpretation, and can also be justified in the Bayesian decision-theoretic context. Further, in a Monte Carlo simulation study, our method presented good performance when compared with the well-known Kolmogorov-Smirnov test, the Wilcoxon test, and a recent testing procedure based on the Polya tree process proposed by Holmes et al. (2015). Finally, we applied our method to a data set of scale measurements for three different groups of patients submitted to a questionnaire for Alzheimer's disease diagnosis. / Neste trabalho, discutimos o problema conhecido como problema de duas amostras (Pearson and Neyman, 1930) utilizando uma abordagem bayesiana não-paramétrica. Considere X1, ..., Xn e Y1, ..., Ym duas amostras independentes, geradas por P1 e P2, respectivamente; o problema de duas amostras consiste em decidir se P1 e P2 são iguais. Assumindo uma priori não-paramétrica, propomos um índice de evidência para a hipótese nula H0: P1 = P2 baseado na distribuição a posteriori da distância d(P1, P2) entre P1 e P2. O índice de evidência é de fácil implementação, tem uma interpretação intuitiva e também pode ser justificada no contexto da teoria da decisão bayesiana. Além disso, em um estudo de simulação de Monte Carlo, nosso método apresentou bom desempenho quando comparado com o teste de Kolmogorov-Smirnov, com o teste de Wilcoxon e com o método de Holmes. Finalmente, aplicamos nosso método em um conjunto de dados sobre medidas de escala de três grupos diferentes de pacientes submetidos a um questionário para diagnóstico de doença de Alzheimer.
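As a hedged illustration of an evidence index built from the posterior of a distance d(P1, P2) (the thesis's prior process and choice of distance are not given here), the sketch below uses the Bayesian bootstrap as a simple nonparametric posterior over each distribution and summarizes the posterior of the sup-distance between the two weighted empirical CDFs; the threshold defining "P1 close to P2" is an assumption.

import numpy as np

rng = np.random.default_rng(2)

def weighted_ecdf(x, w, grid):
    """CDF of a discrete distribution supported on the sample points x with weights w."""
    return np.array([w[x <= g].sum() for g in grid])

def posterior_distance(x, y, n_draws=2000):
    """Bayesian-bootstrap draws of d(P1, P2) = sup |F1 - F2| on a common grid."""
    grid = np.sort(np.concatenate([x, y]))
    draws = np.empty(n_draws)
    for b in range(n_draws):
        wx = rng.dirichlet(np.ones(len(x)))   # posterior draw for P1 (Bayesian bootstrap)
        wy = rng.dirichlet(np.ones(len(y)))   # posterior draw for P2
        draws[b] = np.max(np.abs(weighted_ecdf(x, wx, grid) - weighted_ecdf(y, wy, grid)))
    return draws

# Two synthetic samples; a small location shift plays the role of a departure from H0.
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.3, 1.0, size=60)

draws = posterior_distance(x, y)
threshold = 0.2   # hypothetical tolerance for "P1 close to P2"
evidence = np.mean(draws <= threshold)
print(f"posterior mean distance: {draws.mean():.3f}; Pr(d <= {threshold}) = {evidence:.3f}")

Here the reported quantity, the posterior probability that the distance stays below a tolerance, is only a simplified stand-in for the evidence index defined in the thesis.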
|
175 |
Sensitivitet vid mammografi och tomosyntes undersökningar [Sensitivity in mammography and tomosynthesis examinations]. Selaci, Albert; Sjöqvist, Hanna. January 2019.
The breast consists of mammary glands, subcutaneous fat and connective tissue. The breasts also contain blood vessels and lymphatics, and both men and women have breast tissue. Various diseases, both benign and malignant, can affect the breasts. The most widely used examination method for detecting breast cancer is mammography. For further examination of the breasts, digital breast tomosynthesis (DBT) may be used. DBT is a form of limited-angle tomography that produces sectional images of the breast. Opinions on DBT are conflicting: some studies report that tomosynthesis has better sensitivity than mammography, while others find it worse or equivalent. To gain knowledge about tomosynthesis, mammography and how they differ in sensitivity, a synthesis of different studies is required. The aim of this study is to compare the sensitivity of breast examinations with mammography and with tomosynthesis. Through a systematic literature review, results from quantitative articles were quality-assessed, analysed and summarised. The work underwent an ethical self-review. The results were produced through hypothesis testing in SPSS and show that there is a significant difference in sensitivity between DBT and mammography, with DBT having higher sensitivity in terms of both mean and median.
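The kind of comparison described above, a hypothesis test on the sensitivities extracted from the reviewed studies, could be sketched as follows; the numbers are made up for illustration and are not the values from the articles in the review, and the thesis's actual analysis was run in SPSS.

from scipy import stats

# Hypothetical per-study sensitivities (proportions) for the same set of studies.
sens_dbt = [0.91, 0.88, 0.86, 0.93, 0.84, 0.90, 0.87]
sens_mammo = [0.84, 0.80, 0.85, 0.88, 0.78, 0.83, 0.81]

stat, p = stats.wilcoxon(sens_dbt, sens_mammo)   # paired, nonparametric comparison
print(f"Wilcoxon statistic = {stat}, p-value = {p:.4f}")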
|
176 |
Análise Bayesiana de modelos de mistura finita com dados censurados / Bayesian analysis of finite mixture models with censored data. Melo, Brian Alvarez Ribeiro de. 21 February 2017.
Misturas finitas são modelos paramétricos altamente flexíveis, capazes de descrever diferentes características dos dados em vários contextos, especialmente na análise de dados heterogêneos (Marin, 2005). Geralmente, nos modelos de mistura finita, todas as componentes pertencem à mesma família paramétrica e são diferenciadas apenas pelo vetor de parâmetros associado a essas componentes. Neste trabalho, propomos um novo modelo de mistura finita, capaz de acomodar observações censuradas, no qual as componentes são as densidades das distribuições Gama, Lognormal e Weibull (mistura GLW). Essas densidades são reparametrizadas, sendo reescritas em função da média e da variância, uma vez que estas quantidades são mais difundidas em diversas áreas de estudo. Assim, construímos o modelo GLW e desenvolvemos a análise de tal modelo sob a perspectiva bayesiana de inferência. Essa análise inclui a estimação, através de métodos de simulação, dos parâmetros de interesse em cenários com censura e com fração de cura, a construção de testes de hipóteses para avaliar efeitos de covariáveis e pesos da mistura, o cálculo de medidas para comparação de diferentes modelos e estimação da distribuição preditiva de novas observações. Através de um estudo de simulação, avaliamos a capacidade da mistura GLW em recuperar a distribuição original dos tempos de falha utilizando testes de hipóteses e estimativas do modelo. Os modelos desenvolvidos também foram aplicados no estudo do tempo de seguimento de pacientes com insuficiência cardíaca do Instituto do Coração da Faculdade de Medicina da Universidade de São Paulo. Nesta aplicação, os resultados mostram uma melhor adequação dos modelos de mistura em relação à utilização de apenas uma distribuição na modelagem dos tempos de seguimentos. Por fim, desenvolvemos um pacote para o ajuste dos modelos apresentados no software R. / Finite mixtures are highly flexible parametric models capable of describing different data features and are widely considered in many contexts, especially in the analysis of heterogeneous data (Marin, 2005). Generally, in finite mixture models, all the components belong to the same parametric family and are distinguished only by the associated parameter vector. In this thesis, we propose a new finite mixture model, capable of handling censored observations, in which the components are the densities of the Gamma, Lognormal and Weibull distributions (the GLW finite mixture). These densities are rewritten so that the mean and the variance are the parameters, since the interpretation of these quantities is widespread in various areas of study. In short, we construct the GLW model and develop its analysis under the Bayesian perspective of inference, considering scenarios with censoring and cure rate. This analysis includes parameter estimation, which is done through simulation methods, construction of hypothesis tests to evaluate covariate effects and to assess the values of the mixture weights, computation of model adequacy measures, which are used to compare different models, and estimation of the predictive distribution for new observations. In a simulation study, we evaluated the ability of the GLW mixture to recover the original distribution of failure times, using hypothesis tests and some model-estimated quantities as criteria for selecting the correct distribution. The models developed were applied in the study of the follow-up time of patients with heart failure from the Heart Institute of the University of Sao Paulo Medical School. In this application, the results show a better fit of the mixture models in relation to the use of only one distribution in the modeling of the follow-up times. Finally, we developed a package for fitting the presented models in the R software.
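The ingredients named in the abstract, the three-component mixture, the right-censored likelihood, and the mean-variance reparameterization, can be written out explicitly. The formulas below are a sketch under standard parameterizations (m denotes the component mean, v the component variance, and \delta_i the censoring indicator); the thesis's notation may differ, and the Weibull case has no closed form and must be inverted numerically.

f(t \mid \theta) = w_1 f_{\mathrm{Ga}}(t) + w_2 f_{\mathrm{LN}}(t) + w_3 f_{\mathrm{W}}(t), \qquad w_k \ge 0, \; w_1 + w_2 + w_3 = 1

L(\theta) = \prod_{i=1}^{n} f(t_i \mid \theta)^{\delta_i} \, [1 - F(t_i \mid \theta)]^{1 - \delta_i}, \qquad \delta_i = 1 \text{ if the failure is observed, } 0 \text{ if right-censored}

\text{Gamma (shape } \alpha \text{, rate } \beta\text{):} \quad \alpha = m^2 / v, \; \beta = m / v

\text{Lognormal } (\mu, \sigma^2)\text{:} \quad \sigma^2 = \log(1 + v / m^2), \; \mu = \log m - \sigma^2 / 2

\text{Weibull (shape } k \text{, scale } \lambda\text{):} \quad m = \lambda \, \Gamma(1 + 1/k), \; v = \lambda^2 \, [\Gamma(1 + 2/k) - \Gamma(1 + 1/k)^2] \quad \text{(solve numerically for } k, \lambda\text{)}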
|
177 |
The work-leisure relationship among working youth-centre members: implication for program planning with a test of the 'compensatory hypothesis'. January 1987.
by Ho Kam Wan. / Thesis (M.S.W.)--Chinese University of Hong Kong, 1987. / Bibliography: leaves 158-167.
|
178 |
Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications. Keeling, Kellie Bliss. 08 1900.
With advances in computer technology, organizations are able to store large amounts of data in data warehouses. There are two fundamental issues researchers must address: the dimensionality of data and the interpretation of multiple statistical tests. The first issue addressed by this research is the determination of the number of components to retain in principal components analysis. This research establishes regression, asymptotic theory, and neural network approaches for estimating mean and 95th percentile eigenvalues for implementing Horn's parallel analysis procedure for retaining components. Certain methods perform better for specific combinations of sample size and numbers of variables. The adjusted normal order statistic estimator (ANOSE), an asymptotic procedure, performs the best overall. Future research is warranted on combining methods to increase accuracy. The second issue involves interpreting multiple statistical tests. This study uses simulation to show that Parker and Rothenberg's technique using a density function with a mixture of betas to model p-values is viable for p-values from central and non-central t distributions. The simulation study shows that final estimates obtained in the proposed mixture approach reliably estimate the true proportion of the distributions associated with the null and nonnull hypotheses. Modeling the density of p-values allows for better control of the true experimentwise error rate and is used to provide insight into grouping hypothesis tests for clustering purposes. Future research will expand the simulation to include p-values generated from additional distributions. The techniques presented are applied to data from Lake Texoma where the size of the database and the number of hypotheses of interest call for nontraditional data mining techniques. The issue is to determine if information technology can be used to monitor the chlorophyll levels in the lake as chloride is removed upstream. A relationship established between chlorophyll and the energy reflectance, which can be measured by satellites, enables more comprehensive and frequent monitoring. The results have both economic and political ramifications.
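Horn's parallel analysis, whose mean and 95th percentile reference eigenvalues the study estimates by regression, asymptotic theory and neural networks, retains a component only when its observed eigenvalue exceeds the corresponding eigenvalue from random data of the same dimensions. A brief Monte Carlo sketch of the basic procedure (the plain simulation version on synthetic data, not the ANOSE or the other estimators examined) is:

import numpy as np

rng = np.random.default_rng(3)

def parallel_analysis(X, n_sims=200, percentile=95):
    """Horn's parallel analysis: compare observed correlation-matrix eigenvalues
    with the chosen percentile of eigenvalues from uncorrelated normal data."""
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sim = np.empty((n_sims, p))
    for s in range(n_sims):
        R = np.corrcoef(rng.normal(size=(n, p)), rowvar=False)
        sim[s] = np.sort(np.linalg.eigvalsh(R))[::-1]
    ref = np.percentile(sim, percentile, axis=0)
    return int(np.sum(obs > ref)), obs, ref

# Synthetic data with two genuine components plus noise.
n, p = 300, 12
scores = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X = scores @ loadings + rng.normal(scale=2.0, size=(n, p))

k, obs, ref = parallel_analysis(X)
print(f"components retained: {k}")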
|
180 |
Modelo de otimização de processos para melhoria da qualidade dos serviços em uma Instituição de ensino público. Paula, Izabel Alinne Alves de. 23 July 2013.
Quality, since the advent of globalization, has become one of the most widespread keywords in society. In the public sector it is associated with speed, reliability, precision and security; however, the general perception that individuals have of the provision of public services in Brazil is not consistent with these attributes of quality. The literature suggests several tools to evaluate and improve quality in services, but subjective mechanisms predominate, which amount to measuring customer satisfaction. Admittedly, quality is focused on the customer, but in the current Era of Quality it is assumed that quality is generated in the production process. Considering this gap, the question arose: how can processes in the service sector be optimized while keeping the focus on quality improvement, especially processes carried out in the public education sector? To answer this question, a process analysis and improvement model was adapted by concatenating the phases of the Method of Analysis and Solution of Problems (MASP) with the stages of Design of Experiments. The proposed model was applied in the Company-School Integration Coordination (CIE-E) of IFAM Campus Manaus Downtown, evaluating 397 internship records closed in 2012, where the duration of the process was identified as the main problem. Thus, an experimental study was developed, exploratory and descriptive in nature, applied, and with a qualitative and quantitative approach. Quality tools were used for data collection and presentation, and nonparametric tests for data analysis. Finally, supported by quality and statistical criteria, it was identified that the combination of a high course-hour load and no prior work experience was the ideal combination of the controllable factors affecting the internship records studied, resulting in the shortest process time, that is, it determined the optimal region. Thus, with this study, the procedure for implementing statistical techniques in the service sector for the optimization and improvement of its processes could be evaluated. / A qualidade desde o surgimento da globalização tornou-se uma das palavras-chave mais difundidas junto à sociedade. No setor público sua concepção está associada à rapidez, confiabilidade, precisão e segurança, contudo, a percepção que os indivíduos têm sobre a prestação de serviços públicos no Brasil, não condiz com estes adjetivos de qualidade. A literatura aponta diversas ferramentas capazes de avaliar e melhorar a qualidade em serviços, entretanto predominam-se mecanismos subjetivos, que se resumem a mensurar a satisfação do cliente. É certo que, a qualidade tem foco no cliente, mas na atual Era da Qualidade assume-se que ela é gerada no processo produtivo. Considerando esta lacuna, surgiu o questionamento: de que forma pode-se otimizar processos do setor de serviço, mantendo o foco na melhoria da qualidade, em especial os processos executados no setor público de ensino? Para responder tal pergunta, adaptou-se um modelo de análise e melhoria de processos, através da concatenação das fases do Método de Análise e Solução
de Problemas (MASP) com as etapas de Planejamento Experimental. Aplicou-se o modelo proposto na Coordenação de Integração Empresa-Escola (CIE-E) do IFAM Campus Manaus Centro, avaliando 397 registros de estágios finalizados no ano de 2012, onde o tempo de duração do processo foi apontado como o vilão. Deste modo, desenvolveu-se uma pesquisa experimental caracterizada como exploratória
e descritiva, de natureza aplicada e abordagem qualitativa e quantitativa. Fez-se uso das Ferramentas da Qualidade para a coleta e disposição dos dados, como também de Testes Não Paramétricos para a análise dos dados. Ao fim, respaldado em
critérios de qualidade e estatístico, identificou-se que a combinação carga horária alta e não experiência trabalhista constituía a combinação ideal dos fatores controláveis que interferiam nos registros de estágio estudados, resultando no
menor tempo do processo, ou seja, determinou-se a região ótima. Assim, com esse estudo, pode-se avaliar o procedimento de implantação de técnicas estatísticas no
setor de serviço para a otimização e melhoria de seus processos.
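The decisive analysis described above, nonparametric tests of process duration against controllable factors such as course-hour load and prior work experience, can be illustrated with a short sketch; the durations, factor levels and group sizes below are synthetic stand-ins, not the 397 CIE-E records.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Synthetic process durations (days) for the four combinations of the two factors.
groups = {
    ("high load", "no experience"): rng.gamma(4.0, 5.0, size=100),   # shortest durations (hypothetical)
    ("high load", "experience"):    rng.gamma(5.0, 5.0, size=100),
    ("low load", "no experience"):  rng.gamma(6.0, 5.0, size=100),
    ("low load", "experience"):     rng.gamma(7.0, 5.0, size=97),
}

# Kruskal-Wallis test across the factor combinations, then a one-sided
# Mann-Whitney comparison of the apparent best combination against the rest.
h, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")

best = groups[("high load", "no experience")]
rest = np.concatenate([v for k, v in groups.items() if k != ("high load", "no experience")])
u, p_u = stats.mannwhitneyu(best, rest, alternative="less")
print(f"Mann-Whitney U = {u:.1f}, one-sided p = {p_u:.4f}")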
|