  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
341

Estimating nonlinear functionals of a random field

Boeckenhauer, Rachel Kaye January 2001 (has links)
Environmental data are gathered with the goal of estimating some quantity of interest. In particular, in the case of groundwater or soil contamination, it is desirable to estimate the total amount of contaminant present within a region in order to remediate the contamination more effectively. This problem has generally gone unaddressed and is of interest to environmental scientists. A method is introduced here to estimate the integral of a lognormal process over a region using Monte Carlo simulation of the process conditional on the observed data. The performance of the method is evaluated with an application to groundwater data, and results are compared using uniform sampling and importance sampling over the region.
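As a rough illustration of the conditional Monte Carlo approach this abstract describes, the sketch below (not taken from the thesis) simulates a Gaussian log-field conditional on synthetic well observations, exponentiates it, and averages the integral over a regular grid; the exponential covariance, grid resolution, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def exp_cov(x1, x2, sigma2=1.0, ell=10.0):
    """Exponential covariance between two sets of 2-D locations (assumed form)."""
    d = np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / ell)

rng = np.random.default_rng(0)

# Observed log-concentrations at scattered wells (synthetic stand-in data).
obs_xy = rng.uniform(0, 100, size=(15, 2))
obs_z = rng.normal(0.0, 1.0, size=15)            # log-scale observations

# Prediction grid covering the region of interest.
gx, gy = np.meshgrid(np.linspace(0, 100, 25), np.linspace(0, 100, 25))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
cell_area = (100 / 24) ** 2                       # area represented by each node

# Conditional distribution of the log-field on the grid given the data.
C_oo = exp_cov(obs_xy, obs_xy) + 1e-8 * np.eye(len(obs_xy))
C_go = exp_cov(grid_xy, obs_xy)
C_gg = exp_cov(grid_xy, grid_xy)
A = np.linalg.solve(C_oo, C_go.T).T               # kriging weights
cond_mean = A @ obs_z
cond_cov = C_gg - A @ C_go.T
L = np.linalg.cholesky(cond_cov + 1e-8 * np.eye(len(grid_xy)))

# Monte Carlo: simulate the conditional log-field, exponentiate, integrate.
totals = []
for _ in range(500):
    z = cond_mean + L @ rng.standard_normal(len(grid_xy))
    totals.append(cell_area * np.exp(z).sum())    # integral of the lognormal field

print(f"estimated total contaminant: {np.mean(totals):.1f} "
      f"(MC s.e. {np.std(totals) / np.sqrt(len(totals)):.1f})")
```

Importance sampling over the region, as compared in the thesis, would replace the uniform grid weighting above with sampling concentrated where the conditional field is large.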
342

Gaussian mixture regression and classification

Sung, Hsi Guang January 2004 (has links)
The sparsity of high-dimensional data space renders standard nonparametric methods ineffective for multivariate data. A new procedure, Gaussian Mixture Regression (GMR), is developed for multivariate nonlinear regression modeling. GMR has the tight structure of a parametric model, yet retains the flexibility of a nonparametric method. The key idea of GMR is to construct a sequence of Gaussian mixture models for the joint density of the data and then derive conditional density and regression functions from each model. Assuming the data are a random sample from the joint pdf f_{X,Y}, we fit a Gaussian kernel density estimate of f_{X,Y} and then implement a multivariate extension of the Iterative Pairwise Replacement Algorithm (IPRA) to simplify the initial kernel density. IPRA generates a sequence of Gaussian mixture density models indexed by the number of mixture components K. The corresponding regression functions form a sequence of regression models spanning a spectrum of flexibility, ranging from approximately the classical linear model (K = 1) to the nonparametric kernel regression estimator (K = n). We use mean squared error and prediction error for selecting K. For binary responses, we extend GMR to fit nonparametric logistic regression models. Applying IPRA to each class density, we obtain two families of mixture density models. The logistic function can then be estimated by the ratio between pairs of members from each family. The result is a family of logistic models indexed by the number of mixtures in each density model. We call this procedure Gaussian Mixture Classification (GMC). For a given GMR or GMC model, forward and backward projection algorithms are implemented to locate the optimal subspaces that minimize information loss. They serve as model-based dimension reduction techniques for GMR and GMC. In practice, GMR and GMC offer data analysts a systematic way to determine the appropriate level of model flexibility by choosing the number of components for modeling the underlying pdf. GMC can serve as an alternative or a complement to Mixture Discriminant Analysis (MDA). The uses of GMR and GMC are demonstrated on simulated and real data.
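The conditional-mean construction at the heart of GMR can be sketched with an off-the-shelf Gaussian mixture fit standing in for the IPRA-simplified sequence (IPRA itself is not reproduced here); the component count, toy data, and function names below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmr_predict(gmm, x, x_dim=1):
    """E[Y | X = x] under a joint Gaussian mixture fitted on (X, Y) data."""
    x = np.atleast_2d(x)
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    preds = np.zeros(len(x))
    for i, xi in enumerate(x):
        resp, cond_means = [], []
        for k in range(gmm.n_components):
            mx, my = means[k, :x_dim], means[k, x_dim:]
            Sxx = covs[k][:x_dim, :x_dim]
            Syx = covs[k][x_dim:, :x_dim]
            diff = xi - mx
            # Marginal density of X under component k -> mixing weight at xi.
            dens = np.exp(-0.5 * diff @ np.linalg.solve(Sxx, diff)) / \
                   np.sqrt((2 * np.pi) ** x_dim * np.linalg.det(Sxx))
            resp.append(weights[k] * dens)
            # Conditional mean of Y given X = xi under component k.
            cond_means.append(my + Syx @ np.linalg.solve(Sxx, diff))
        resp = np.array(resp) / np.sum(resp)
        preds[i] = np.sum(resp * np.array(cond_means).ravel())
    return preds

# Toy data: nonlinear regression with one predictor.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, 400)
Y = np.sin(X) + 0.2 * rng.standard_normal(400)
gmm = GaussianMixture(n_components=6, covariance_type="full", random_state=0)
gmm.fit(np.column_stack([X, Y]))
print(gmr_predict(gmm, [[-2.0], [0.0], [2.0]]))
```

Varying `n_components` plays the role of the index K in the abstract: one component gives a globally linear fit, many components approach a kernel-style smoother.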
343

An empirical study of feature selection in binary classification with DNA microarray data

Lecocke, Michael Louis January 2005 (has links)
Motivation. Binary classification is a common problem in many types of research, including clinical applications of gene expression microarrays. This research comprises a large-scale empirical study that involves a rigorous and systematic comparison of classifiers, in terms of supervised learning methods and both univariate and multivariate feature selection approaches. Other principal areas of investigation involve the use of cross-validation (CV) and how to guard against the effects of optimism and selection bias when assessing candidate classifiers via CV. This is addressed by ensuring that feature selection is performed during training of the classification rule at each stage of the CV process ("external CV"), which to date has not been the traditional approach to cross-validation. Results. A large-scale empirical comparison study is presented, in which a 10-fold CV procedure is applied internally and externally to a univariate as well as two genetic-algorithm-based (GA-based) feature subset selection (FSS) processes. These procedures are used in conjunction with six supervised learning algorithms across six published two-class clinical microarray datasets. It was found that external CV generally provided more realistic and honest misclassification error rates than internal CV. Also, although the more sophisticated multivariate FSS approaches were able to select gene subsets that would go undetected even among the top 100 univariately ranked genes, neither of the two GA-based methods led to significantly better 10-fold internal or external CV error rates. Considering all the selection bias estimates together across all subset sizes, learning algorithms, and datasets, the average bias estimates from each of the GA-based methods were roughly 2.5 times that of the univariate-based method. Ultimately, this research has put to the test the more traditional implementations of the statistical learning aspects of cross-validation and feature selection, and it provides a solid foundation on which these issues can and should be further investigated when performing limited-sample classification studies using high-dimensional gene expression data.
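The internal-versus-external CV distinction is easy to demonstrate; the sketch below uses univariate F-score selection and a linear SVM on synthetic data as stand-ins for the thesis's selection methods, learning algorithms, and microarray datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic "microarray-like" data: few samples, many mostly-noise features.
X, y = make_classification(n_samples=60, n_features=2000, n_informative=10,
                           random_state=0)

# Internal CV (biased): genes are selected once, using ALL samples,
# before cross-validation ever sees the data.
X_sel = SelectKBest(f_classif, k=50).fit_transform(X, y)
internal = cross_val_score(LinearSVC(dual=False), X_sel, y, cv=10)

# External CV (honest): selection is refit inside every training fold,
# so the held-out samples never influence which genes are chosen.
pipe = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC(dual=False))
external = cross_val_score(pipe, X, y, cv=10)

print(f"internal CV error: {1 - internal.mean():.3f}")
print(f"external CV error: {1 - external.mean():.3f}")
```

The internal estimate is typically optimistic because the held-out samples have already influenced which features were kept, which is the selection bias the study quantifies.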
344

Nonparametric estimation of bivariate mean residual life function

Ghebremichael, Musie S. January 2005 (has links)
In survival analysis, the additional lifetime that an object survives past a time t is called the residual life function of the object. Mathematically speaking, if the lifetime of the object is described by a random variable T, then the random variable R(t) = [T - t | T > t] is called the residual life random variable. The quantity e(t) = E(R(t)) = E[T - t | T > t] is called the mean residual lifetime (mrl) function, or the life expectancy at age t. There are numerous situations where the bivariate mrl function is important. Times to death or times to initial contraction of a disease may be of interest for litter-mate pairs of rats or for twin studies in humans. The time to a deterioration level or the time to reaction to a treatment may be of interest in pairs of lungs, kidneys, breasts, eyes, or ears of humans. In reliability, the distribution of the lifelengths of a particular pair of components in a system may be of interest. Because of the dependence among the event times, we cannot obtain reliable results by applying the univariate mrl function to each event time separately in order to study the aging process. The bivariate mrl function is useful in analyzing the joint distribution of two event times when these times are dependent. In recent years, although considerable attention has been paid to the univariate mrl function, relatively little research has been devoted to the analysis of the bivariate mrl function. The specific contribution of this dissertation consists in proposing, and examining the properties of, nonparametric estimators of the bivariate mean residual life function when a certain order among such functions exists. That is, we consider the problem of nonparametric estimation of a bivariate mrl function when it is bounded from above by another known or unknown mrl function. The estimators under such an order constraint are shown to perform better than the empirical mrl function in terms of mean squared error. Moreover, they are shown to be projections, onto an appropriate space, of the empirical mean residual life function. Under suitable technical conditions, the asymptotic theory of these estimators is derived. Finally, the procedures are applied to a data set on bivariate survival. More specifically, we have used the Diabetic Retinopathy Study (DRS) data to illustrate our estimators. In this data set, the survival times of both left and right eyes are given for two groups of patients: juvenile and adult diabetics. Thus, it seems natural to assume that the mrl of the juvenile diabetics is longer than the mrl of the adult diabetics. Under this assumption, we calculated the estimators of the mrl function for each group. We have also calculated the empirical mrl functions of the two groups and compared them with the estimators of the mrl function obtained under the above assumption.
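A minimal empirical version of the (uncensored) mean residual life function, and a naive bivariate analogue, can be written directly from the definition e(t) = E[T - t | T > t]; the frailty-based paired lifetimes below are an assumption for illustration, and the sketch ignores censoring and the order constraints studied in the dissertation.

```python
import numpy as np

def mrl(times, t_grid):
    """Empirical univariate mean residual life: e(t) = mean(T - t | T > t)."""
    times = np.asarray(times, dtype=float)
    return np.array([(times[times > t] - t).mean() if (times > t).any() else 0.0
                     for t in t_grid])

def bivariate_mrl(t1, t2, s, t):
    """Empirical bivariate analogue: E[(T1 - s, T2 - t) | T1 > s, T2 > t]."""
    t1, t2 = np.asarray(t1, float), np.asarray(t2, float)
    alive = (t1 > s) & (t2 > t)
    if not alive.any():
        return np.array([0.0, 0.0])
    return np.array([(t1[alive] - s).mean(), (t2[alive] - t).mean()])

rng = np.random.default_rng(2)
# Correlated paired lifetimes (e.g., left/right eyes) via a shared frailty term.
frailty = rng.gamma(shape=2.0, scale=1.0, size=500)
left = rng.exponential(1.0 / frailty)
right = rng.exponential(1.0 / frailty)

print(mrl(left, [0.5, 1.0, 2.0]))
print(bivariate_mrl(left, right, 0.5, 0.5))
```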
345

Bayesian inference for ordinal data

Zhou, Xian January 2006 (has links)
Albert & Chib proposed a Bayesian ordinal probit regression model that uses the Gibbs sampler to model ordinal data. Their method defines a relationship between latent variables and ordinal outcomes using cutpoint parameters. However, the convergence of this Gibbs sampler is slow when the sample size is large because the cutpoint parameters are not efficiently sampled. Cowles proposed a Gibbs/Metropolis-Hastings (MH) sampler that updates the cutpoint parameters more efficiently. In the context of longitudinal ordinal data, this algorithm potentially requires computing the cumulative probability of a multivariate normal distribution to calculate the acceptance probability for the MH sampler. We propose a probit model in which the latent variables follow a mixture of normal distributions. This mixture structure can successfully characterize the ordinality of the data while holding the cutpoint parameters constant. Gibbs sampling along with reversible jump MCMC is carried out to estimate the size of the mixture. We adopt this idea in modeling ordinal longitudinal data, where an order-one autoregressive (AR(1)) error model is proposed to characterize the underlying correlation structure among the repeated measurements. We also propose a Bayesian probabilistic model for estimating cluster membership using a mixture of Gaussian distributions to tackle the problem of clustering ordinal data. Results are compared with those obtained from the K-means method. We further extend the multinomial probit (MNP) model and develop a joint MNP and ordinal probit model to model the cell probabilities for multiple categorical outcomes with ordinal variables nested within each categorical outcome. A hierarchical prior is imposed on the location parameters of the normal kernels in the mixture model associated with the ordinal outcomes. Our model has wide applications in fields such as clinical trials, marketing research, and social science.
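For context, the latent-variable Gibbs construction underlying the Albert & Chib approach is shown below in its simplest binary-probit form, with a flat prior and no cutpoints, mixtures, or longitudinal structure (all of which the thesis addresses); the data and prior choice are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=2000, rng=None):
    """Albert & Chib-style Gibbs sampler for the binary probit special case:
    y_i = 1{z_i > 0}, z_i ~ N(x_i' beta, 1), with a flat prior on beta."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        mu = X @ beta
        # Latent z_i ~ N(mu_i, 1) truncated to (0, inf) if y_i = 1,
        # and to (-inf, 0] if y_i = 0 (bounds are standardized for truncnorm).
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}) under the flat prior.
        beta = XtX_inv @ X.T @ z + chol @ rng.standard_normal(p)
        draws[it] = beta
    return draws

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(300), rng.standard_normal(300)])
true_beta = np.array([-0.3, 1.2])
y = (X @ true_beta + rng.standard_normal(300) > 0).astype(int)
draws = probit_gibbs(X, y, rng=rng)
print(draws[500:].mean(axis=0))   # posterior means after burn-in
```

The ordinal case adds cutpoint parameters between the latent scale and the observed categories, which is exactly the step whose slow mixing motivates the mixture formulation in this thesis.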
346

Practical methods for data mining with massive data sets

Salch, John David January 1998 (has links)
The increasing size of data sets has necessitated advancement in exploratory techniques. Methods that are practical for moderate to small data sets become infeasible when applied to massive data sets. Advanced techniques such as binned kernel density estimation, tours, and mode-based projection pursuit will be explored. Mean-centered binning will be introduced as an improved method for binned density estimation. The density grand tour will be demonstrated as a means of exploring massive high-dimensional data sets. Projection pursuit by clustering components will be described as a means to find interesting lower-dimensional subspaces of data sets.
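Ordinary binned kernel density estimation, the baseline that mean-centered binning refines, can be sketched as nearest-node binning followed by a convolution on the grid, so that the cost depends on the grid size rather than the sample size; the grid, bandwidth, and data below are assumptions for illustration, and the thesis's mean-centered variant is not reproduced.

```python
import numpy as np

def binned_kde(x, grid, bandwidth):
    """Binned kernel density estimate on an equally spaced grid: assign each
    point to its nearest grid node, then convolve the bin counts with a
    Gaussian kernel evaluated at the grid offsets."""
    x = np.asarray(x, float)
    delta = grid[1] - grid[0]
    # Simple (nearest-node) binning; the thesis refines this binning step.
    idx = np.clip(np.round((x - grid[0]) / delta).astype(int), 0, len(grid) - 1)
    counts = np.bincount(idx, minlength=len(grid))
    # Gaussian kernel weights at every possible grid offset.
    offsets = np.arange(-(len(grid) - 1), len(grid)) * delta
    kern = np.exp(-0.5 * (offsets / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return np.convolve(counts, kern, mode="valid") / len(x)

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 50000), rng.normal(3, 0.5, 50000)])
grid = np.linspace(-6, 6, 401)
dens = binned_kde(x, grid, bandwidth=0.3)
print(dens.sum() * (grid[1] - grid[0]))   # should be close to 1
```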
347

Modeling carcinogenesis in lung cancer: Taking genetic factors and smoking factor into account

Deng, Li January 2006 (has links)
The goal of my thesis is to assess the impacts of cigarette smoking and genetic susceptibility on the onset of lung cancer and to compute the age-specific probability of developing lung cancer given risk factor levels. Improved prediction of the chance of having lung cancer at a given age will enhance physicians' ability to design a sensible screening strategy for early tumor detection in a high-risk population. This is the only way to reduce the mortality rate, since no effective treatment or cure is available for advanced lung cancer at this time. The evaluation of the effects of these two risk factors proceeds through parameter estimation in the framework of the two-stage clonal expansion (TSCE) model applied to case-control study data. The TSCE model describes carcinogenesis as transitions from normal cells to slightly abnormal cells and then to cancerous cells. Our data analysis indicates that smoking enhances the proliferation rate, while both smoking and genetic susceptibility affect the initiation and malignant transformation rates. The data suggest that the mechanism of lung cancer development may differ between non-smokers and smokers. Besides predicting survival rates, I rigorously prove the non-identifiability theorem for the TSCE model in the piecewise constant case and derive a new algorithm for calculating the survival function of a 3-stage, 2-path stochastic model. This 3-stage, 2-path model has two new features: it consists of two stages instead of one for abnormal cells, where one stage is more advanced than the other, and it includes two paths connecting normal cells to cancerous cells. A test of the new model on Texas cancer data shows a very good fit. Such efforts in developing models that incorporate new findings will lead to a better understanding of the mechanism of carcinogenesis and, eventually, to the development of drugs to treat cancer.
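A Gillespie-style simulation conveys the structure of the TSCE model (initiation of intermediate cells, clonal expansion, malignant conversion); the rate values and function names below are purely illustrative assumptions and are not the estimates or algorithms obtained in the thesis.

```python
import numpy as np

def tsce_onset_time(nu_X=0.2, alpha=0.15, beta=0.10, mu=5e-4,
                    t_max=80.0, rng=None):
    """Stochastic simulation of a two-stage clonal expansion (TSCE) model:
    intermediate cells are initiated at rate nu_X, divide at rate alpha,
    die at rate beta, and convert to malignant cells at rate mu per cell.
    Returns the age at the first malignant cell, or inf if none by t_max."""
    rng = rng or np.random.default_rng()
    t, n_int = 0.0, 0
    while t < t_max:
        rates = np.array([nu_X, alpha * n_int, beta * n_int, mu * n_int])
        total = rates.sum()
        t += rng.exponential(1.0 / total)
        if t >= t_max:
            break
        event = rng.choice(4, p=rates / total)
        if event in (0, 1):          # initiation or division of an intermediate cell
            n_int += 1
        elif event == 2:             # death of an intermediate cell
            n_int -= 1
        else:                        # malignant conversion: onset at time t
            return t
    return np.inf

rng = np.random.default_rng(5)
onsets = np.array([tsce_onset_time(rng=rng) for _ in range(500)])
ages = np.arange(0, 81, 10)
survival = [(onsets > a).mean() for a in ages]   # P(no malignant cell by age a)
print(list(zip(ages, np.round(survival, 3))))
```

In this framework, a smoking effect on proliferation corresponds to raising alpha relative to beta, while effects on initiation and malignant transformation correspond to nu_X and mu.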
348

Modeling auxiliary information in clinical trials

Han, Shu January 2005 (has links)
During a clinical trial, early endpoints may be available for some patients for whom the primary endpoint has not yet been observed. To model this situation, we develop a parametric model and a nonparametric model that utilize auxiliary endpoint data to predict the missing primary endpoint data. These predicted primary endpoint data assist researchers in determining whether the conclusions of a clinical trial can be obtained and announced earlier than otherwise, and such modeling may also enhance the precision of comparisons of the primary endpoint across treatment arms. The parametric model is developed using a Bayesian paradigm under the assumption that the data are normally distributed. The nonparametric model is developed using kernel density estimation. In both cases we base the conditional predictive distribution of the missing primary endpoint data on the auxiliary endpoint data for patients with missing data and on the pairs of observations for patients who have achieved both endpoints. The effects of bandwidth on the performance of the nonparametric model are evaluated. We consider a two-treatment clinical trial in which the primary objective is to compare the two treatments on the basis of the primary endpoint. We compare the performance of our two proposed models with that of two conventional methods, Last Observation Carried Forward (LOCF) and Ignoring Missing Values (IMV). Our simulation results demonstrate that both the parametric and the nonparametric models have advantages over the conventional methods. The parametric model performs slightly better than the nonparametric model when the auxiliary and primary endpoint data are jointly normal, while the nonparametric model is better when these distributions deviate sufficiently from normality, so the nonparametric model is robust in this sense.
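One simple nonparametric way to draw a missing primary endpoint given its auxiliary endpoint, in the spirit of the kernel-based model, is kernel-weighted smoothed resampling of the complete pairs; the data, bandwidth, and function names below are illustrative assumptions rather than the thesis's exact procedure.

```python
import numpy as np

def predict_primary(aux_complete, prim_complete, aux_missing, bandwidth, rng, n_draw=1):
    """Draw the missing primary endpoint given an auxiliary value: weight the
    complete (aux, primary) pairs by a Gaussian kernel in the auxiliary value,
    then resample primary values with those weights and add kernel noise
    (a smoothed draw from the estimated conditional distribution)."""
    preds = np.empty((len(aux_missing), n_draw))
    for i, a in enumerate(aux_missing):
        w = np.exp(-0.5 * ((aux_complete - a) / bandwidth) ** 2)
        w /= w.sum()
        idx = rng.choice(len(prim_complete), size=n_draw, p=w)
        preds[i] = prim_complete[idx] + bandwidth * rng.standard_normal(n_draw)
    return preds

rng = np.random.default_rng(6)
# Complete pairs: auxiliary (early) and primary (late) endpoints, correlated.
aux = rng.normal(0, 1, 200)
prim = 0.8 * aux + rng.normal(0, 0.6, 200)
# Patients with only the auxiliary endpoint observed so far.
aux_only = rng.normal(0, 1, 5)
print(predict_primary(aux, prim, aux_only, bandwidth=0.3, rng=rng, n_draw=3))
```

Averaging a treatment comparison over many such draws is one way the predicted primary endpoints could feed an interim analysis.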
349

Inverse decision theory with medical applications

Davies, Kalatu R. January 2005 (has links)
Medical decision makers would like to use decision theory to determine optimal treatment strategies for patients, but it can be very difficult to specify loss functions in the medical setting, especially when trying to assign monetary value to health outcomes. These issues led to the development of an alternative approach, called Inverse Decision Theory (IDT), in which, given a probability model and a specific decision rule, we determine the set of losses for which that decision rule is optimal. This thesis presents the evolution of the IDT method and its applications to medical treatment decision rules. There are two ways in which we can use the IDT method. Under the first approach, we operate under the assumption that the decision rule of interest is optimal and use our prior information to make inferences about the losses. The second approach uses the prior information to derive an optimality region and to determine whether the losses in this region are reasonable in light of that prior information. We illustrate the use of IDT by applying it to the current standard of care (SOC) for the detection and treatment of cervical neoplasias. First, we model the diagnostic and treatment process as a Bayesian sequential decision procedure. Then, we derive the Bayes risk for every decision rule and compare the Bayes risk of the current SOC rule with that of all other rules, forming linear inequality constraints that define a region of losses under which the current SOC is optimal. Although the current standard of care has been in use for many years, we find another decision rule to be optimal. We therefore question whether the current SOC is the optimal decision rule and will continue to examine these implications and the practicality of implementing the new rule. The IDT method provides a mathematical technique for dealing with the challenges of formally quantifying patient experiences and outcomes. We believe that it will be applicable to many other disease conditions and become a valuable tool for determining optimal medical treatment standards of care.
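The IDT idea of comparing Bayes risks across all decision rules to find the losses under which a given rule is optimal can be illustrated with a toy binary-test, treat-or-not problem; the prevalence, test accuracy, and loss values below are assumptions for illustration, not the cervical neoplasia model of the thesis.

```python
from itertools import product

# Illustrative probability model: disease prevalence and test accuracy.
prev, sens, spec = 0.10, 0.90, 0.85
# Joint probabilities P(test result, disease state).
p = {(1, 1): prev * sens,              (0, 1): prev * (1 - sens),
     (1, 0): (1 - prev) * (1 - spec),  (0, 0): (1 - prev) * spec}

def bayes_risk(rule, loss_fp, loss_fn):
    """Expected loss of a rule mapping test result -> action (1 = treat),
    with loss_fp for treating the healthy and loss_fn for not treating the ill."""
    risk = 0.0
    for (test, disease), prob in p.items():
        action = rule[test]
        if action == 1 and disease == 0:
            risk += prob * loss_fp
        if action == 0 and disease == 1:
            risk += prob * loss_fn
    return risk

rules = [dict(zip((0, 1), acts)) for acts in product((0, 1), repeat=2)]
standard = {0: 0, 1: 1}                      # "treat iff the test is positive"

# IDT-style question: for which loss pairs is the standard rule Bayes optimal?
for loss_fp, loss_fn in [(1, 1.2), (1, 10), (1, 100)]:
    risks = {tuple(r.items()): bayes_risk(r, loss_fp, loss_fn) for r in rules}
    best = min(risks, key=risks.get)
    optimal = best == tuple(standard.items())
    print(f"L_FP={loss_fp}, L_FN={loss_fn}: standard rule optimal? {optimal}")
```

Collecting the loss pairs for which the answer is "yes" traces out the kind of linear-inequality region that the thesis derives for the actual standard of care.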
350

An unconditional test for the single-sample binomial

Brott, Evan John January 2004 (has links)
An unconditional test is presented for a single-sample Binomial experiment with random sample size. The test is shown to be uniformly more powerful than the standard Binomial test and is shown through several simulations to produce more accurate p-values as well. The primary downside of the test is that it can be anticonservative (that is, it can produce too small a p-value), although it may still be preferable to the ultra-conservative standard test, which treats the sample size as fixed.
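The conservatism of the standard fixed-n exact Binomial test mentioned above is easy to verify numerically: its actual type I error rate sits well below the nominal level because the discrete p-value cannot hit that level exactly. The check below uses illustrative values of n and the null proportion; the thesis's unconditional test for random sample size is not reproduced here.

```python
from scipy.stats import binom, binomtest

p0, alpha = 0.3, 0.05
for n in (10, 20, 50):
    # Actual rejection probability under the null at nominal alpha = 0.05.
    reject_prob = sum(binom.pmf(k, n, p0)
                      for k in range(n + 1)
                      if binomtest(k, n, p0).pvalue <= alpha)
    print(f"n={n}: actual size of the exact binomial test = {reject_prob:.4f}")
```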
