121

Quantifying local creation and regional transport using a hierarchical space-time model of ozone as a function of observed NOx, a latent space-time VOC process, emissions, and meteorology

Nail, Amy Jeanette 20 August 2007 (has links)
We explore the ability of a space-time model to decompose the 8-hour ozone concentration on a given day at a given site into the parts attributable to local emissions and regional transport, and ultimately to assess the efficacy of past and future emission control programs. We model ozone as created ozone plus transported ozone plus an error term that has a seasonally varying spatial covariance. The created component uses atmospheric chemistry results to express ozone created on a given day at a given site as a function of the observed NOx concentration, the latent VOC concentration, and temperature. The ozone transported to a given day at a given site is expressed as a weighted average of the ozone observed at all sites on the previous day, where the weights are a function of wind speed and direction that appropriately distribute weight across redundant information. The latent VOC process model has a mean trend that includes emissions from various source types, temperature, a workday indicator variable, and an error term that has a seasonally varying spatial covariance. We fit the model using likelihood methods, and we compare our predictions to observations from a withheld dataset and to the predictions of CMAQ, the deterministic model used by the EPA to assess emission control programs. We find that the model predictions based on the mean trend and the random deviations from this mean outperform CMAQ predictions according to multiple criteria, but predictions based on the mean trend alone underperform CMAQ predictions.
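To make the additive structure concrete, here is a minimal sketch of the decomposition described above. The linear chemistry term and the alignment/reach weighting scheme are hypothetical placeholders for illustration only, not the dissertation's fitted specification.

```python
# Hedged sketch: ozone today = created + transported (+ spatial error, not shown).
# The functional forms below are illustrative assumptions, not the fitted model.
import numpy as np

def transport_weights(dist_km, bearing_deg, wind_speed_kmh, wind_dir_deg):
    """Toy wind-based weights over yesterday's monitoring sites."""
    align = np.clip(np.cos(np.radians(bearing_deg - wind_dir_deg)), 0.0, None)  # upwind alignment
    reach = np.exp(-np.abs(dist_km - 24.0 * wind_speed_kmh) / 50.0)             # plausible 1-day travel
    w = align * reach
    return w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))

def predicted_ozone(o3_prev_day, dist_km, bearing_deg, wind_speed_kmh, wind_dir_deg,
                    nox, voc_latent, temp, beta):
    created = beta[0] * nox + beta[1] * voc_latent + beta[2] * temp   # placeholder chemistry term
    w = transport_weights(dist_km, bearing_deg, wind_speed_kmh, wind_dir_deg)
    transported = w @ o3_prev_day                                     # weighted prior-day ozone
    return created + transported
```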
122

Robust Estimation via Measurement Error Modeling

Wang, Qiong 16 August 2005 (has links)
We introduce a new method for robustifying inference that can be applied in any situation where a parametric likelihood is available. The key feature is that data from the postulated parametric models are assumed to be measured with error, where the measurement error distribution is chosen to produce the occasional gross errors found in data. We show that the tails of the error-contamination model control the properties (boundedness, redescendingness) of the resulting influence functions, with heavier tails in the error-contamination model producing more robust estimators. In the application to location-scale models with independent and identically distributed data, the resulting analytically intractable likelihoods are approximated via Monte Carlo integration. In the application to time series models, we propose a Bayesian approach to the robust estimation of time series parameters. We use Markov Chain Monte Carlo (MCMC) to estimate the parameters of interest and also the gross errors. The latter are used as outlier diagnostics.
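For intuition, a minimal numerical sketch of the contaminated-likelihood idea under assumed forms: the postulated model is taken to be N(mu, sigma^2) and the gross-error distribution a heavy-tailed t, with the intractable marginal likelihood approximated by Monte Carlo integration as the abstract describes.

```python
# Sketch under assumed forms: observed y = x + u, with x ~ N(mu, sigma^2) from the
# postulated model and u drawn from a heavy-tailed t to generate occasional gross errors.
# The marginal density f(y) = E_u[ phi(y - u; mu, sigma) ] is approximated by Monte Carlo.
import numpy as np

def mc_loglik(y, mu, sigma, df=3, n_mc=2000, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.standard_t(df, size=n_mc)                          # gross-error draws
    z = (np.asarray(y)[:, None] - u[None, :] - mu) / sigma
    dens = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * sigma)  # normal pdf at y - u
    return np.log(dens.mean(axis=1)).sum()                     # log of the MC-averaged density
```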
123

Orthology-Based Multilevel Modeling of Differentially Expressed Mouse and Human Gene Pairs

Ogorek, Benjamin Alexander 21 August 2008 (has links)
There is great interest in finding human genes expressed through pharmaceutical intervention, thus opening a genomic window into the benefit and side-effect profiles of a drug. Human insight gained from FDA-required animal experiments has historically been limited, but in the case of gene expression measurements, proposed biological orthologies between mouse and human genes provide a foothold for animal-to-human extrapolation. We have investigated a five-component, multilevel, bivariate normal mixture model that incorporates mouse as well as human gene expression data. The goal is twofold: to increase human differential gene-finding power, and to find a subclass of gene pairs for which there is a direct, exploitable relationship between animal and human genes. In simulation studies, the dual-species model showed impressive gains in differential gene-finding power over a related marginal model using only human data. Bias in parameter estimation was problematic, however, and occasionally led to failures in control of the false discovery rate. Although it was considerably more difficult to find species-extrapolative gene pairs than differentially expressed human genes, simulation experiments suggested that it is possible, especially when traditional FDR controls are relaxed and under hypothetical parameter configurations.
124

Contagion in Financial Markets: Two Statistical Approaches

Rao, Harshavardhana 17 August 2004 (has links)
Financial markets in different countries undergo crises at one time or another. These crises can have different causes, but they can affect other markets through trade relations and capital mobility. Some crises affect markets in other countries more than market fundamentals would dictate. We model this phenomenon, known as contagion, using two approaches, a one-factor model and volatility spillover, and compare them.
125

Variations on the Accelerated Failure Time Model: Mixture Distributions, Cure Rates, and Different Censoring Scenarios

Krachey, Elizabeth Catherine 06 October 2009 (has links)
The accelerated failure time (AFT) model is a popular model for time-to-event data. It provides a useful alternative when the proportional hazards assumption is in question, and it offers an intuitive linear regression interpretation in which the logarithm of the survival time is regressed on the covariates. We have explored several deviations from the standard AFT model. Standard survival analysis assumes that, in the case of perfect follow-up, every patient will eventually experience the event of interest. However, in some clinical trials a number of patients may never experience such an event and, in essence, are considered cured of the disease. In such a scenario, the Kaplan-Meier survival curve will level off at a nonzero proportion. Hence there is a window of time in which most or all of the events occur, while heavy censoring occurs in the tail. The two-component mixture cure model provides a means of adjusting the AFT model to account for this cured fraction. Chapters 1 and 2 propose parametric and semiparametric estimation procedures for this cure rate AFT model. Survival analysis methods for interval-censored data have been much slower to develop than those for right-censored data, in part because interval-censored data have a more complex censoring mechanism and because the counting process theory developed for right-censored data does not extend to interval-censored data. Because of the analytical difficulty associated with interval-censored data, recent estimation strategies have focused on implementation rather than on the large-sample theoretical justification of the semiparametric AFT model. Chapter 3 proposes a semiparametric Bayesian estimation procedure for the AFT model under interval censoring.
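For reference, the generic two-component mixture cure formulation with an AFT model for the uncured subjects takes the form below; the notation is assumed here, and the exact parameterization used in Chapters 1 and 2 may differ.

```latex
% Generic mixture cure + AFT structure (notation assumed, not necessarily the thesis's):
% pi(x) is the probability of being cured; S_u is the survival function of the uncured.
S_{\mathrm{pop}}(t \mid x) \;=\; \pi(x) \;+\; \{1 - \pi(x)\}\, S_u(t \mid x),
\qquad
\log T_u \;=\; x^\top \beta + \sigma\,\varepsilon .
```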
126

Age-Dependent Tag Return Models for Estimating Fishing Mortality, Natural Mortality and Selectivity

Jiang, Honghua 09 September 2005 (has links)
In Chapter 1 we extend the instantaneous rates formulation of fisheries tag return models to allow fishing mortality rates to depend on age. This is important in many applications where tagged fish vary over a large range of ages (and sizes). We focus on a model that assumes selectivity by age is constant over years and that selectivity is fixed at 1 above a certain age. We show that it is possible to allow natural mortality, M, to vary by age and year. We allow for incomplete mixing of tagged fish and for fisheries to be pulse, continuous, or continuous over part of the year. We focus on the case where all or most age classes are tagged each year. We investigate model identifiability and how well parameters can be estimated using analytic and simulation methods. Results show that some models in which the tag reporting rate is estimated are singular or near-singular. The age-length key method commonly used for age specification may produce substantial errors in converting size to age, especially for older fish. To reduce such errors, in Chapter 2 we propose two alternative sampling designs to the standard design of tagging all age classes: one in which only age 1 fish are tagged, and another in which both age 1 and age 2 fish are tagged. Catch-and-release fisheries have become very important to the management of overexploited recreational fish stocks. Tag return studies in which the tag is removed regardless of fish disposition have been used to assess the effectiveness of restoration efforts for these catch-and-release fisheries. In Chapter 3, we extend the instantaneous rates formulation of tag return models introduced in Chapter 1 to catch-and-release tagging studies. We illustrate the methods using multiple age class tag return data on striped bass (Morone saxatilis) from the Maryland Department of Natural Resources (MDNR). We found evidence that M is age dependent and that M has increased since 1999, possibly due to an outbreak of disease (mycobacteriosis) in striped bass in Chesapeake Bay.
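As orientation, a minimal sketch of the standard instantaneous-rates expected tag-return structure that such models build on, with age-specific selectivity entering through fishing mortality. The notation and the constant M here are assumptions for illustration; the thesis's extensions (incomplete mixing, pulse versus continuous fisheries, age- and year-varying M, catch-and-release) are not shown.

```python
# Hedged sketch: expected probability that a tag from a cohort tagged at age a0 in year i
# is returned in year j, under assumed notation. F[age, year] = sel[age] * f[year],
# M is held constant for simplicity, and lam is the tag reporting rate.
import numpy as np

def expected_return_probs(i, a0, n_years, f, sel, M, lam):
    probs = np.zeros(n_years)
    survive = 1.0                                   # prob. of surviving to the start of year j
    for j in range(i, n_years):
        age = min(a0 + (j - i), len(sel) - 1)       # selectivity held at its last value above that age
        F = sel[age] * f[j]
        Z = F + M                                   # total instantaneous mortality
        probs[j] = lam * survive * (F / Z) * (1.0 - np.exp(-Z))   # exploitation in year j
        survive *= np.exp(-Z)
    return probs
```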
127

Bayesian Analysis of Circular Data Using Wrapped Distributions

Ravindran, Palanikumar 29 October 2002 (has links)
Circular data arise in a number of different areas, such as the geological, meteorological, biological, and industrial sciences. We cannot use standard statistical techniques to model circular data, due to the circular geometry of the sample space. One of the common methods used to analyze such data is the wrapping approach, in which we assume that the probability distribution for circular data is obtained by wrapping a probability distribution from the real line onto the circle. This approach creates a vast class of probability distributions that are flexible enough to account for different features of circular data. However, likelihood-based inference for such distributions can be very complicated and computationally intensive. The EM algorithm used to compute the MLE is feasible but computationally unsatisfactory. Instead, we use Markov Chain Monte Carlo (MCMC) methods with a data augmentation step to overcome such computational difficulties. Given a probability distribution on the circle, we assume that it arose from a distribution on the real line that was then wrapped onto the circle. If we can "unwrap" the distribution off the circle and obtain a distribution on the real line, then standard statistical techniques for data on the real line can be used. Our proposed methods are flexible and computationally efficient for fitting a wide class of wrapped distributions. Furthermore, we can easily compute the usual summary statistics. We present extensive simulation studies to validate the performance of our method. We apply our method to several real data sets and compare our results to parameter estimates available in the literature. We find that the Wrapped Double Exponential family produces robust parameter estimates with good frequentist coverage probability. We extend our method to the regression model. As an example, we analyze the association between ozone data and wind direction. A major contribution of this dissertation is a technique for interpreting the circular regression coefficients in terms of the linear regression model setup. Regression diagnostics can be developed after augmenting the circular data with wrapping numbers (see Section 3.5). We extend our method to fit time-correlated data, and we can easily compute other statistics such as circular autocorrelation functions and their standard errors. We use the Wrapped Normal model to analyze hourly wind directions, an example of circular time series data.
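A minimal illustration of the wrapping construction itself (not the dissertation's MCMC sampler): draws on the real line are reduced modulo 2*pi onto the circle, and the wrapped normal density is a truncated sum of shifted normal densities.

```python
# Sketch of the wrapping approach: simulate by wrapping linear draws onto [0, 2*pi),
# and evaluate the wrapped normal density as a (truncated) sum over integer wrapping numbers.
import numpy as np

def rwrapped_normal(n, mu, sigma, seed=0):
    rng = np.random.default_rng(seed)
    return np.mod(rng.normal(mu, sigma, size=n), 2 * np.pi)   # wrap the real line onto the circle

def dwrapped_normal(theta, mu, sigma, k_max=10):
    k = np.arange(-k_max, k_max + 1)
    z = (np.asarray(theta)[:, None] + 2 * np.pi * k[None, :] - mu) / sigma
    return (np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * sigma)).sum(axis=1)
```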
128

Family-based methods which rely on association for the mapping of genes in human populations

Monks, Stephanie Ann 14 May 1999 (has links)
Family-based tests that employ association between alleles at a marker locus and a trait locus have proven useful for the localization of genes in human genomes. Many existing tests have been derived as extensions of the transmission/disequilibrium test (TDT), which was originally introduced as a test of linkage in the presence of association for a susceptibility locus. One of these tests, the sib-TDT or S-TDT, makes use of genetic information from sibships containing at least one diseased and one nondiseased child and is defined for a biallelic marker. We propose an extension of the S-TDT to a multiallelic marker and provide evidence that the chi-square distribution can be used to measure statistical significance. The test is compared to three contemporary extensions for a multiallelic marker. We also present a test for a multiallelic marker that combines data from families with and without parental genetic information.

Next, tests of linkage and association for a quantitative trait are developed that utilize families without restrictions on the number of children per family that can be used in the analysis. Tests are introduced that can be used on family data with parent and child genotypes, only child genotypes, or a combination of these types of families. Equations are derived that allow one to determine the sample size needed to achieve desired power. Through simulation, we demonstrate that existing tests have an elevated false-positive rate when the size restrictions are not followed, and that a good deal of information is lost by adhering to the size restrictions. Permutation procedures are introduced that are recommended for small samples but can also be used for extensions of the tests to multiallelic markers and to the simultaneous use of more than one marker.

Finally, resampling procedures for existing tests are explored. The resampling procedures reduce families to contain the number of children allowable for a valid test of linkage and association. We show that our tests are equivalent to the use of within-cluster resampling for the existing tests, but that differences exist when resampling is performed on the basis of phenotypically extreme individuals.
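For context, a minimal sketch of the original biallelic TDT that these tests extend: b and c count transmissions of the two alleles from heterozygous parents to affected children, and (b - c)^2 / (b + c) is referred to a chi-square distribution with one degree of freedom. The counts in the usage line are hypothetical.

```python
# Sketch of the classical biallelic TDT (a McNemar-type test) that the proposed
# multiallelic and quantitative-trait extensions generalize.
from scipy.stats import chi2

def tdt(b, c):
    """b, c: transmissions of allele A1 vs. A2 from heterozygous parents to affected children."""
    stat = (b - c) ** 2 / (b + c)
    return stat, chi2.sf(stat, df=1)

stat, p = tdt(b=48, c=30)   # hypothetical transmission counts
```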
129

STATISTICAL ANALYSIS OF GENETIC ASSOCIATIONS

Zaykin, Dmitri V. 30 September 1999 (has links)
Advisor: Bruce S. Weir. There is an increasing need for a statistical treatment of genetic data, prompted by recent advances in molecular genetics and molecular technology. The study of associations between genes is one of the most important aspects in applications of population genetics theory and statistical methodology to genetic data. Developments of these methods are important for conservation biology, experimental population genetics, forensic science, and the mapping of human disease genes. Over the next several years, genotypic data will be collected to attempt to locate the positions of multiple genes affecting disease phenotype. Adequate statistical methodology is required to analyze these data. Special attention should be paid to multiple testing issues resulting from searching through many genetic markers and the high risk of false associations. In this research we develop theory and methods needed to treat some of these problems. We introduce exact conditional tests for analyzing associations within and between genes in samples of multilocus genotypes, and efficient algorithms to perform them. These tests are formulated for the general case of multiple alleles at arbitrary numbers of loci and lead to multiple testing adjustments based on the closed testing principle, thus providing strong protection of the family-wise error rate. We discuss an application of the closed testing method to testing for Hardy-Weinberg equilibrium, and computationally efficient shortcuts arising from methods for combining p-values that allow us to deal with large numbers of loci. We also discuss efficient Bayesian tests for heterozygote excess and deficiency, as a special case of testing for Hardy-Weinberg equilibrium, and the frequentist properties of a p-value-type quantity resulting from them. We further develop new methods for validation of experiments and for combining and adjusting independent and correlated p-values, and apply them to simulated as well as actual gene expression data sets. These methods prove to be especially useful in situations with large numbers of statistical tests, such as whole-genome screens for associations of genetic markers with disease phenotypes and analyses of gene expression data obtained from DNA microarrays.
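As one concrete ingredient of the p-value combination machinery mentioned above, here is a sketch of Fisher's method for combining k independent p-values; the abstract's own shortcuts and its adjustments for correlated p-values are not reproduced here.

```python
# Fisher's method: under the global null with independent p-values,
# -2 * sum(log p_i) is chi-square distributed with 2k degrees of freedom.
import numpy as np
from scipy.stats import chi2

def fisher_combine(pvals):
    pvals = np.asarray(pvals, dtype=float)
    stat = -2.0 * np.log(pvals).sum()
    return stat, chi2.sf(stat, df=2 * len(pvals))

stat, p = fisher_combine([0.01, 0.20, 0.03, 0.45])   # hypothetical per-locus p-values
```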
130

GENERAL ZERO-INFLATED MODELS AND THEIR APPLICATIONS

Gan, Nianci 31 March 2000 (has links)
Count data with excess zeros are commonly seen in experiments for improving electronics manufacturing quality, in medical research on HIV patients with high-risk behaviors, and in agricultural studies of the number of insects per leaf. Yip (1988) and Lambert (1992) proposed the zero-inflated Poisson distribution, and Heilbron (1989) used zero-altered Poisson and negative binomial distributions to model this type of data. Li, Lu, Park, Kim, Brinkley and Peterson (1999) derived a multivariate version of the zero-inflated Poisson distribution and applied it to detect equipment problems in electronics manufacturing processes.

Zero-inflated distributions assume that with probability 1 - p the only possible observation is 0, and with probability p a random variable describing defect counts in the imperfect state is observed. For example, when manufacturing equipment is properly aligned (perfect state), there may be no defects. Otherwise, defects may occur according to a distribution of the imperfect state. The defect counts in the imperfect state could follow Poisson, negative binomial, or other distributions, but most current research uses the Poisson distribution. Although the maximum likelihood (ML) method is widely used to estimate parameters in zero-inflated distributions, there has been no theoretical study of the properties of the ML estimates. In Chapter 1, we propose a general framework for generalized zero-inflated models (ZIM), which assume only that the distribution of the imperfect state has support on the nonnegative integers and satisfies appropriate regularity conditions. We study the properties of the ML estimates of ZIM parameters, including their existence, uniqueness, strong consistency, and asymptotic normality under regularity conditions. Focusing on the univariate ZIM, we give detailed, rigorous proofs of the lemmas and theorems stated in the thesis. We then study covariate effects in univariate and multivariate zero-inflated regression models. Because the zero-inflated model involves both the Bernoulli parameter p and the imperfect-state parameter lambda, building the models separately does not use the information efficiently, and the resulting model is more complicated than needed. This problem gets worse in the multivariate ZIM, where the number of model terms increases drastically. Our procedure selects a limited number of important model terms to maximize the ZIM likelihood functions.

In Chapter 2, we review current research on zero-inflated Poisson models. Some new results on multivariate Poisson and multivariate zero-inflated Poisson distributions are given. By generalizing the results in Lambert (1992) and Li et al. (1999), we propose a multivariate zero-inflated Poisson regression model. An example from Nortel process development research is used to illustrate the model selection procedure for zero-inflated regression models and computational details.
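A minimal sketch of the univariate zero-inflated Poisson likelihood described above, with p the probability of the imperfect state and lam its Poisson mean; this is the textbook ZIP form, not the generalized ZIM framework of Chapter 1.

```python
# Zero-inflated Poisson log-likelihood: a zero can be structural (prob. 1 - p) or a
# Poisson zero from the imperfect state (prob. p * exp(-lam)); positive counts can
# only come from the imperfect state.
import numpy as np
from scipy.special import gammaln

def zip_loglik(y, p, lam):
    y = np.asarray(y)
    pois_logpmf = -lam + y * np.log(lam) - gammaln(y + 1)
    ll_zero = np.log((1 - p) + p * np.exp(-lam))
    ll_pos = np.log(p) + pois_logpmf
    return np.where(y == 0, ll_zero, ll_pos).sum()
```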
