31

The variable selection problem and the application of the roc curve for binary outcome variables

Matshego, James Moeng. January 2007
Thesis (M.Sc. (Applied Statistics)) --University of Pretoria, 2007. / Abstract in English. Includes bibliographical references.
32

Bayesian analysis of wandering vector models for ranking data

Chan, Kit-yin. January 1998
Thesis (M. Phil.)--University of Hong Kong, 1998. / Includes bibliographical references (leaves 98-103).
33

Judgement post-stratification for designed experiments

Du, Juan. January 2006
Thesis (Ph. D.)--Ohio State University, 2006. / Title from first page of PDF file. Includes bibliographical references (p. 143-146).
34

A Nonparametric Test for the Non-Decreasing Alternative in an Incomplete Block Design

Ndungu, Alfred Mungai. January 2011
The purpose of this paper is to present a new nonparametric test statistic for testing against ordered alternatives in a Balanced Incomplete Block Design (BIBD). This test is then compared with the Durbin test, which tests for differences between treatments in a BIBD without regard to order. For the comparison, Monte Carlo simulations were used to generate the BIBD. Random samples were simulated from the normal distribution, the exponential distribution, and the t distribution with three degrees of freedom. The number of treatments considered was three, four and five, with all the possible combinations necessary for a BIBD. Small sample sizes were 20 or less and large sample sizes were 30 or more. The powers and alpha values were then estimated after 10,000 repetitions. The results of the study show that the proposed test is more powerful than the Durbin test: regardless of the distribution, sample size or number of treatments, the new test tended to have higher power.
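The simulation procedure described in this abstract can be illustrated with a minimal sketch. It covers only the Durbin side of the comparison, since the abstract does not specify the new test statistic. Everything here is an assumption made for illustration: the function names `durbin_statistic` and `rejection_rate`, the three-treatment design with blocks of size two, the normal errors with unit variance, and the use of the chi-square approximation (whose critical value for two degrees of freedom is `-2 * ln(alpha)`).

```python
import itertools
import math
import random


def durbin_statistic(blocks, observations, t):
    """Durbin statistic for a balanced incomplete block design (no ties).

    blocks: list of tuples of treatment indices, each of size k.
    observations: list of tuples of responses, aligned with blocks.
    t: number of treatments.
    """
    k = len(blocks[0])
    rank_sums = [0.0] * t
    counts = [0] * t
    for treatments, values in zip(blocks, observations):
        # Rank the responses within the block (1 = smallest).
        order = sorted(range(k), key=lambda i: values[i])
        for rank, pos in enumerate(order, start=1):
            rank_sums[treatments[pos]] += rank
            counts[treatments[pos]] += 1
    r = counts[0]  # in a balanced design every treatment appears r times
    centre = r * (k + 1) / 2  # expected rank sum under the null hypothesis
    scale = 12 * (t - 1) / (r * t * (k - 1) * (k + 1))
    return scale * sum((s - centre) ** 2 for s in rank_sums)


def rejection_rate(shifts, copies=10, n_sim=2000, alpha=0.05, seed=0):
    """Monte Carlo estimate of how often the Durbin test rejects.

    shifts: location shift for each of the three treatments (hypothetical
    parameterisation; all zeros gives the estimated alpha level).
    copies: number of repeats of the base design {(0,1), (0,2), (1,2)}.
    """
    if len(shifts) != 3:
        raise ValueError("this sketch only handles three treatments")
    rng = random.Random(seed)
    blocks = list(itertools.combinations(range(3), 2)) * copies
    # Chi-square critical value with 2 degrees of freedom (closed form).
    critical = -2.0 * math.log(alpha)
    rejections = 0
    for _ in range(n_sim):
        data = [
            tuple(shifts[j] + rng.gauss(0.0, 1.0) for j in block)
            for block in blocks
        ]
        if durbin_statistic(blocks, data, 3) > critical:
            rejections += 1
    return rejections / n_sim
```

Calling `rejection_rate((0.0, 0.0, 0.0))` estimates the alpha level, while a non-decreasing shift such as `rejection_rate((0.0, 0.5, 1.0))` estimates power against an ordered alternative. The abstract reports 10,000 repetitions; the default here is smaller only to keep the sketch fast.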
35

The rank analysis of triple comparisons

Pendergrass, Robert Nixon. 12 March 2013
General extensions of the probability model for paired comparisons, which was developed by R. A. Bradley and M. E. Terry, are considered. Four generalizations to triple comparisons are discussed. One of these models is used to develop methods of analysis of data obtained from the ranks of items compared in groups of size three. / Ph. D.
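For orientation, the Bradley-Terry paired-comparison model that this abstract takes as its starting point can be written as follows. The notation (worth parameters \pi_i and the sum-to-one normalisation) is the usual convention and is an assumption here; the abstract does not state the form of the four triple-comparison generalizations, so none is shown.

```latex
P(\text{item } i \text{ is preferred to item } j) = \frac{\pi_i}{\pi_i + \pi_j},
\qquad \pi_i > 0, \qquad \sum_{i=1}^{t} \pi_i = 1 .
```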
36

The curve through the expected values of order statistics with special reference to problems in nonparametric tests of hypotheses

Chow, Bryant. January 1965
The expected value of the s-th largest of n ranked variates from a population with probability density f(x) occurs often in the statistical literature, especially in the theory of nonparametric statistics. A new expression for this value will be obtained for any underlying density f(x), but emphasis will be placed on normal scores. A finite series representation, the individual terms of which are easy to calculate, will be obtained for the sum of squares of normal scores. The derivation of this series demonstrates a technique which can also be used to obtain the expected value of Fisher's measure of correlation as well as the expected value of the Fisher-Yates test statistic under an alternative hypothesis. / Ph. D.
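As background, the textbook integral form of this expected value is given below. This is the standard expression, not the new one derived in the thesis, and the symbols F (the distribution function of f) and X_{(s)} (the s-th largest variate) are notation assumed here. The s-th largest of n variates has s - 1 values above it and n - s below it, which gives:

```latex
E\left[X_{(s)}\right]
  = \frac{n!}{(s-1)!\,(n-s)!}
    \int_{-\infty}^{\infty} x \,[F(x)]^{\,n-s}\,[1 - F(x)]^{\,s-1}\, f(x)\, dx ,
\qquad F(x) = \int_{-\infty}^{x} f(u)\, du .
```

Normal scores are the special case in which f is the standard normal density.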
37

Stability analysis of feature selection approaches with low quality data

Unknown Date
One of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in terms of their robustness to class noise. It presents a noise-based stability analysis that measures the degree of agreement between a feature ranking technique's output on a clean dataset and its outputs on the same dataset corrupted with different combinations of noise level and noise distribution. It then considers the classification performance of models built with a subset of the original features obtained after applying feature ranking techniques to noisy data. It proposes the focused ensemble feature ranking as a noise-tolerant approach to feature selection and compares focused ensembles with general ensembles in terms of the ability of the selected features to withstand the impact of class noise when used to build classification models. Finally, it explores three approaches for addressing the combined problem of high dimensionality and class imbalance. Collectively, this research shows the importance of considering class noise when performing feature selection. / by Wilker Altidor. / Thesis (Ph.D.)--Florida Atlantic University, 2011. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2011. Mode of access: World Wide Web.
38

Feature selection techniques and applications in bioinformatics

Unknown Date
Possibly the largest problem when working in bioinformatics is the large amount of data to sift through to find useful information. This thesis shows that feature selection (a method of removing irrelevant and redundant information from the dataset) is a useful and even necessary technique for these large datasets. This thesis also presents a new method for comparing classes to each other through their features. It also provides a thorough analysis of various feature selection techniques and classifiers in different scenarios from bioinformatics. Overall, this thesis shows the importance of feature selection in bioinformatics. / by David Dittman. / Thesis (M.S.C.S.)--Florida Atlantic University, 2011. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2011. Mode of access: World Wide Web.
39

Uncertain data management. / CUHK electronic theses & dissertations collection

January 2011
In this thesis, we explore the issues of uncertain data management in several different aspects. First, we propose a novel linear time algorithm to compute the positional probability, the computation of which is a primitive operator for most of the ranking definitions. Our algorithm is based on the conditional probability formulation of positional probability and the system of linear equations. Based on the formulation of conditional probability, we also prove a tight upper bound of the top-k probability of tuples, which is then used to stop the top-k computation earlier. Second, we study top-k probabilistic ranking queries with joins when scores and probabilities are stored in different relations. We focus on reducing the join cost in probabilistic top-k ranking. We investigate two probabilistic score functions, namely, expected rank value and probability of highest ranking. We give upper/lower bounds of such probabilistic score functions in random access and sequential access, and propose new I/O efficient algorithms to find top-k objects. Third, we extend the possible worlds semantics to probabilistic XML ranking query, which is to rank top-k probabilities of the answers of a twig query in probabilistic XML data. The new challenge is how to compute top-k probabilities of answers of a twig query in probabilistic XML in the presence of containment (ancestor/descendant) relationships. We focus on node queries first, and propose a new dynamic programming algorithm which can compute top-k probabilities for the answers of node queries based on the previously computed results in probabilistic XML data. We further propose optimization techniques to share the computational cost. We also show techniques to support path queries and tree queries. Fourth, we study how to rank documents using a set of keywords, given a context that is associated with the documents. 
We model the problem using a graph with two different kinds of nodes (document nodes and multi-attribute nodes), where the edges between document nodes and multi-attribute nodes exist with some probability. We discuss its score function, cost function, and ranking with uncertainty. We also propose new algorithms to rank documents that are most related to the user-given keywords by integrating the context information. / Uncertain data management has received a lot of attention recently because data obtained can be incomplete or uncertain in many real applications. Ranking of uncertain data has become an important research issue, and its basis in the possible worlds semantics makes it different from the ranking of deterministic data. With traditional deterministic data, we can compute a score for each object, and the objects are then ranked based on the computed scores. However, in the scenario of uncertain data, each object has a probability of being the true answer (the existence probability), besides the computed score. A probabilistic top-k ranking query ranks objects by the interplay of score and probability based on the possible worlds semantics. Many definitions have been proposed in the literature based on the possible worlds semantics. / Chang, Lijun. / Advisers: Hong Cheng; Jeffrey Xu Yu. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 131-139). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
40

Advances in ranking and selection: variance estimation and constraints

Healey, Christopher M. 16 July 2010
In this thesis, we first show that the performance of ranking and selection (R&S) procedures in steady-state simulations depends highly on the quality of the variance estimates that are used. We study the performance of R&S procedures using three variance estimators (overlapping area, overlapping Cramér-von Mises, and overlapping modified jackknifed Durbin-Watson estimators) that show better long-run performance than other estimators previously used in conjunction with R&S procedures for steady-state simulations. We devote additional study to the development of the new overlapping modified jackknifed Durbin-Watson estimator and demonstrate some of its useful properties. Next, we consider the problem of finding the best simulated system under a primary performance measure, while also satisfying stochastic constraints on secondary performance measures, known as constrained ranking and selection. We first present a new framework that allows certain systems to become dormant, halting sampling for those systems as the procedure continues. We also develop general procedures for constrained R&S that guarantee a nominal probability of correct selection, under any number of constraints and correlation across systems. In addition, we address new topics critical to the efficiency of these procedures, namely the allocation of error between feasibility check and selection, the use of common random numbers, and the cost of switching between simulated systems.
