301

The Growth Curve Model for High Dimensional Data and its Application in Genomics

Jana, Sayantee 04 1900 (has links)
Recent advances in technology have allowed researchers to collect high-dimensional biological data simultaneously. In genomic studies, for instance, measurements from tens of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome, resulting in massive amounts of data. In such experiments, researchers are faced with two types of high-dimensionality. The first is global high-dimensionality, which is common to all genomic experiments. Global high-dimensionality arises because inference is being done on tens of thousands of genes, resulting in multiplicity. This challenge is often dealt with using statistical methods for multiple comparisons, such as the Bonferroni correction or the false discovery rate (FDR). We refer to the second type of high-dimensionality as gene-specific high-dimensionality, which arises in time course microarray experiments because the sample size is often smaller than the number of time points ($n < p$).

In this thesis, we use the growth curve model (GCM), which is a generalized multivariate analysis of variance (GMANOVA) model, and propose a moderated test statistic for testing a special case of the general linear hypothesis, which is especially useful for identifying genes that are expressed. We use the trace test for the GCM and modify it so that it can be used in high-dimensional situations. We consider two types of moderation: the Moore-Penrose generalized inverse and Stein's shrinkage estimator of $S$. We performed extensive simulations to show the performance of the moderated test and compared the results with the original trace test. We calculated the empirical level and power of the test under many scenarios. Although the focus is on hypothesis testing, we also provide a moderated maximum likelihood estimator for the parameter matrix and assess its performance by investigating the bias and mean squared error of the estimator, comparing the results with those of the maximum likelihood estimator. Since the parameters are matrices, we consider distance measures in both power and level comparisons as well as when investigating bias and mean squared error. We also illustrate our approach using time course microarray data taken from a study on lung cancer. We were able to filter out 1,053 genes as non-noise genes from a pool of 22,277 genes, which is approximately 5% of the total number of genes. This is consistent with results from most biological experiments, where around 5% of genes are found to be differentially expressed. / Master of Science (MSc)
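The moderation step can be illustrated with a small sketch. The snippet below is not the thesis's exact statistic; it only shows the two forms of moderation named above (replacing the inverse of a singular sample covariance matrix with its Moore-Penrose pseudoinverse, or with the inverse of a Stein-type shrinkage estimator) using simulated data and a hypothetical shrinkage weight.

```python
import numpy as np

# Hypothetical dimensions: p time points measured on n subjects, with n < p,
# so the p x p sample covariance matrix S is singular and cannot be inverted.
rng = np.random.default_rng(0)
n, p = 10, 25
Y = rng.normal(size=(n, p))                  # one gene's expression: subjects x time points

Ybar = Y.mean(axis=0)
S = (Y - Ybar).T @ (Y - Ybar) / (n - 1)      # rank <= n - 1 < p, hence singular

# Moderation option 1: Moore-Penrose generalized inverse of S.
S_pinv = np.linalg.pinv(S)

# Moderation option 2: Stein-type shrinkage toward a scaled identity target
# (lambda_ is a hypothetical shrinkage weight; the thesis derives its own choice).
lambda_ = 0.3
S_shrink = (1 - lambda_) * S + lambda_ * (np.trace(S) / p) * np.eye(p)
S_shrink_inv = np.linalg.inv(S_shrink)       # now well conditioned and invertible

# Either inverse can then stand in for S^{-1} inside a trace-type quadratic form,
# e.g. tr(M @ S_inv) for some hypothesis-dependent matrix M.
```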
302

Some Advanced Semiparametric Single-index Modeling for Spatially-Temporally Correlated Data

Mahmoud, Hamdy F. F. 09 October 2014 (has links)
Semiparametric modeling is a hybrid of parametric and nonparametric modeling, in which some functional forms are known and others are unknown. In this dissertation, we have made several contributions to semiparametric modeling based on the single index model, related to the following three topics: the first is to propose a model for detecting change points simultaneously with estimating the unknown function; the second is to develop two models for spatially correlated data; and the third is to further develop two models for spatially-temporally correlated data.

To address the first topic, we propose a unified approach that simultaneously estimates the nonlinear relationship and the change points. We propose a single index change point model as our unified approach, adjusting for several other covariates. We nonparametrically estimate the unknown function using kernel smoothing and also provide a permutation-based testing procedure to detect multiple change points. We show the asymptotic properties of the permutation-based testing procedure. The advantage of our approach is demonstrated using mortality data from Seoul, Korea, from January 2000 to December 2007.

On the second topic, we propose two semiparametric single index models for spatially correlated data. One additively separates the nonparametric function and the spatially correlated random effects, while the other does not separate them. We estimate these two models using two algorithms based on the Markov chain expectation maximization algorithm. Our approaches are compared using simulations, suggesting that the semiparametric single index nonadditive model provides more accurate estimates of spatial correlation. The advantage of our approach is demonstrated using mortality data from six cities in Korea from January 2000 to December 2007.

The third topic involves proposing two semiparametric single index models for spatially and temporally correlated data. Our first model has a nonparametric function that separates from the spatially and temporally correlated random effects; we refer to it as the semiparametric spatio-temporal separable single index model (SSTS-SIM). The second model does not separate the nonparametric function from the spatially correlated random effects but does separate the time random effects; we refer to it as the semiparametric nonseparable single index model (SSTN-SIM). Two algorithms based on the Markov chain expectation maximization algorithm are introduced to simultaneously estimate the parameters, spatial effects, and time effects. The proposed models are then applied to the mortality data of six major cities in Korea. Our results suggest that SSTN-SIM is more flexible than SSTS-SIM because it can estimate various nonparametric functions, whereas SSTS-SIM enforces similar nonparametric curves. SSTN-SIM also provides better estimation and prediction. / Ph. D.
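As a point of reference for the single index framework used throughout, the sketch below shows the basic building block: kernel (Nadaraya-Watson) estimation of the unknown link function g in E[y | x] = g(x'beta) for a given index direction. It is a generic illustration with simulated data and hypothetical names, not the dissertation's estimation algorithm (which also handles change points and correlated random effects).

```python
import numpy as np

# Single index building block: E[y | x] = g(x'beta). Given a candidate index
# direction beta, the unknown link g is estimated by Nadaraya-Watson kernel
# smoothing on the scalar index u = x'beta. Names (h, beta_hat, grid) are
# illustrative, not the dissertation's notation.
rng = np.random.default_rng(1)
n, d = 300, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5, 0.25])
y = np.sin(X @ beta_true) + rng.normal(scale=0.2, size=n)   # unknown nonlinear link + noise

def nw_link(u_grid, u, y, h=0.3):
    """Nadaraya-Watson estimate of the link g at the points u_grid (bandwidth h)."""
    w = np.exp(-0.5 * ((u_grid[:, None] - u[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

beta_hat = beta_true / np.linalg.norm(beta_true)   # unit norm for identifiability
u_hat = X @ beta_hat
grid = np.linspace(u_hat.min(), u_hat.max(), 50)
g_hat = nw_link(grid, u_hat, y)                    # estimated link on a grid of index values
```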
303

Treatment heterogeneity and potential outcomes in linear mixed effects models

Richardson, Troy E. January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Gary L. Gadbury / Studies commonly focus on estimating a mean treatment effect in a population. However, in some applications the variability of treatment effects across individual units may help to characterize the overall effect of a treatment across the population. Consider a set of treatments, {T, C}, where T denotes some treatment that might be applied to an experimental unit and C denotes a control. For each of N experimental units, the pair {$r_{Ti}$, $r_{Ci}$}, i = 1, 2, …, N, represents the potential response of the $i$-th experimental unit if treatment were applied and if control were applied, respectively. The causal effect of T compared to C is the difference between the two potential responses, $r_{Ti} - r_{Ci}$. Much work has been done to elucidate the statistical properties of a causal effect, given a set of particular assumptions. Gadbury and others have reported on this for some simple designs, focusing primarily on finite population randomization based inference. When designs become more complicated, the randomization based approach becomes increasingly difficult. Since linear mixed effects models are particularly useful for modeling data from complex designs, their role in modeling treatment heterogeneity is investigated. It is shown that an individual treatment effect can be conceptualized as a linear combination of fixed treatment effects and random effects. The random effects are assumed to have variance components specified in a mixed effects "potential outcomes" model in which both potential outcomes, $r_T$ and $r_C$, are variables. The variance of the individual causal effect is used to quantify treatment heterogeneity. After treatment assignment, however, only one of the two potential outcomes is observable for a unit. It is then shown that the variance component for treatment heterogeneity becomes non-estimable in an analysis of observed data. Furthermore, estimable variance components in the observed data model are demonstrated to arise from linear combinations of the non-estimable variance components in the potential outcomes model. Mixed effects models are considered in the context of a particular design in an effort to illuminate the loss of information incurred when moving from a potential outcomes framework to an observed data analysis.
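The non-estimability result has a compact expression in standard potential-outcomes algebra (this is the textbook identity, not the dissertation's specific mixed-model parameterization):

```latex
D_i = r_{Ti} - r_{Ci}, \qquad
\operatorname{Var}(D_i) = \sigma_T^2 + \sigma_C^2 - 2\,\sigma_{TC}
```

The arm-specific variances $\sigma_T^2$ and $\sigma_C^2$ can be estimated from the treated and control units separately, but $\sigma_{TC} = \mathrm{Cov}(r_{Ti}, r_{Ci})$ involves two quantities never observed on the same unit, so $\operatorname{Var}(D_i)$, the treatment heterogeneity of interest, is not identified from observed data alone.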
304

The financial crisis and household savings in South Africa : An econometric analysis / Itumeleng Pleasure Mongale

Mongale, Itumeleng Pleasure January 2012 (has links)
The "global" financial crisis (GFC) emerged during 2008 and it was mainly triggered by the sub-prime mortgage crisis (SMC) in the United States of America. The main aims of this thesis is to conduct an econometric analysis of the financial crisis and household savings in South Africa and also to provide a rationale that will facilitate a policy attention on Domestic Resource Mobilisation (DRM) through household savings. The study uses quarterly time series data for the period 199401 to 201102 obtained on-line from the South African Reserve Bank (SARB). The research is based on the Keynesian saving function, which is a complement of the consumption function. The model will be estimated by using a cointegrating vector autoregressive (CVAR) framework, which allows for endogeneity of the regressors. To check robustness on the cointegration results, the study employs the second empirical technique based on Generalized Impulse Response Function (GIRF) analysis and Variance Decomposition. The regression equation of household savings is expressed as a function of household disposable income, household debt to disposable income, real GOP, interest rate, inflation rate and foreign savings. The variables are tested for the presence of a unit root by the application of the Augmented Dickey-Fuller (AOF), Phillips-Perron (PP) Kwiatkowski, Phillips, Schmidt and Shin (KPSS) tests. The findings of the study are that all variables have unit roots. The cointegration model emphasises the presence of a long run equilibrium relationship between dependent and independent variables. The CVAR reveals the short run of the dynamic household savings model. Taking this into consideration, the study concludes that household debt has a huge influence on the level of household savings. The econometric analysis also revealed that household savings in South Africa actually improved during the period associated with the GFC. It could be postulated that South African households responded to their deteriorating financial situations by reducing their average spending and increasing their savings. Variance decomposition analysis revealed that 'own shocks' constitute the predominant source of variations in household saving therefore household savings can be explained by the disturbances in macroeconomic variables in the study. The study recommends the promotion of household savings and economic growth in order to reduce the dependence of South Africa on foreign savings. DRM is therefore enhanced by a higher level of household savings, which can facilitate higher levels of investment and economic growth. / Thesis (PhD (Economics) North-West University, Mafikeng Campus, 2012
305

A case study in handling over-dispersion in nematode count data

Kreider, Scott Edwin Douglas January 1900 (has links)
Master of Science / Department of Statistics / Leigh W. Murray / Traditionally the Poisson process is used to model count response variables. However, a problem arises when the particular response variable contains an inordinate number of both zeros and large observations, relative to the mean of a typical Poisson process. In such cases the variance of the data is greater than the mean, and since the Poisson distribution has its variance equal to its mean, the data are over-dispersed with respect to the Poisson distribution. This case study looks at several common and uncommon ways to account for this over-dispersion in a specific set of nematode count data using various procedures in SAS 9.2. These methods include, but are not limited to, a basic linear regression model, a generalized linear (log-linear) model, a zero-inflated Poisson model, a generalized Poisson model, and a Poisson hurdle model. Based on the AIC statistics, the generalized log-linear models with the Pearson-scale and deviance-scale corrections perform the best. However, based on residual plots, none of the models appears to fit the data adequately. Further work with non-parametric methods or the negative binomial distribution may yield more ideal results.
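For readers who want to reproduce the dispersion diagnosis outside SAS, the sketch below shows the same idea in Python with statsmodels: fit a log-linear Poisson model, check the Pearson chi-square dispersion, and refit with a Pearson-scale correction. The data are simulated over-dispersed counts, not the nematode data.

```python
import numpy as np
import statsmodels.api as sm

# Simulated over-dispersed counts (negative binomial draws with mean mu).
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 1, n)
mu = np.exp(0.5 + 1.2 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))

X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Dispersion diagnostic: Pearson chi-square / residual df well above 1 signals over-dispersion.
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid
print(f"estimated dispersion: {dispersion:.2f}")

# Pearson-scale ("quasi-Poisson") correction: same point estimates,
# standard errors inflated by sqrt(dispersion).
quasi_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print(quasi_fit.summary())
```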
306

An empirical comparison of extreme value modelling procedures for the estimation of high quantiles

Engberg, Alexander January 2016 (has links)
The peaks over threshold (POT) method provides an attractive framework for estimating the risk of extreme events such as severe storms or large insurance claims. However, the conventional POT procedure, where the threshold excesses are modelled by a generalized Pareto distribution, suffers from small samples and subjective threshold selection. In recent years, two alternative approaches have been proposed in the form of mixture models that estimate the threshold and a folding procedure that generates larger tail samples. In this paper the empirical performances of the conventional POT procedure, the folding procedure and a mixture model are compared by modelling data sets on fire insurance claims and hurricane damage costs. The results show that the folding procedure gives smaller standard errors of the parameter estimates and in some cases more stable quantile estimates than the conventional POT procedure. The mixture model estimates are dependent on the starting values in the numerical maximum likelihood estimation, and are therefore difficult to compare with those from the other procedures. The conclusion is that none of the procedures is overall better than the others but that there are situations where one method may be preferred.
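As background for the comparison, the conventional POT baseline can be sketched in a few lines: pick a threshold, fit a generalized Pareto distribution to the excesses, and invert the fitted tail for a high quantile. The snippet below uses simulated heavy-tailed losses and an arbitrary 95% threshold; it is not the paper's data or its threshold-selection analysis.

```python
import numpy as np
from scipy import stats

# Simulated heavy-tailed losses standing in for insurance claims or damage costs.
rng = np.random.default_rng(4)
losses = rng.pareto(a=2.5, size=5000) * 10.0

u = np.quantile(losses, 0.95)                 # subjective threshold choice
excesses = losses[losses > u] - u

# Fit a GPD to the excesses (location fixed at 0, as is standard for POT).
xi, _, sigma = stats.genpareto.fit(excesses, floc=0)

# POT estimator of the q-quantile of the original loss distribution:
# x_q = u + (sigma/xi) * [ ((n/N_u) * (1 - q))^(-xi) - 1 ]
n, n_u, q = len(losses), len(excesses), 0.999
x_q = u + (sigma / xi) * (((n / n_u) * (1 - q)) ** (-xi) - 1)
print(f"estimated {q:.1%} quantile: {x_q:.1f}")
```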
307

GENERALIZATIONS OF AN INVERSE FREE KRYLOV SUBSPACE METHOD FOR THE SYMMETRIC GENERALIZED EIGENVALUE PROBLEM

Quillen, Patrick D. 01 January 2005 (has links)
Symmetric generalized eigenvalue problems arise in many physical applications and frequently only a few of the eigenpairs are of interest. Typically, the problems are large and sparse, and therefore traditional methods such as the QZ algorithm may not be considered. Moreover, it may be impractical to apply shift-and-invert Lanczos, a favored method for problems of this type, due to difficulties in applying the inverse of the shifted matrix. With these difficulties in mind, Golub and Ye developed an inverse free Krylov subspace algorithm for the symmetric generalized eigenvalue problem. This method does not rely on shift-and-invert transformations for convergence acceleration, but rather a preconditioner is used. The algorithm suffers, however, in the presence of multiple or clustered eigenvalues. Also, it is only applicable to the location of extreme eigenvalues. In this work, we extend the method of Golub and Ye by developing a block generalization of their algorithm which enjoys considerably faster convergence than the usual method in the presence of multiplicities and clusters. Preconditioning techniques for the problems are discussed at length, and some insight is given into how these preconditioners accelerate the method. Finally we discuss a transformation which can be applied so that the algorithm extracts interior eigenvalues. A preconditioner based on a QR factorization with respect to the B⁻¹ inner product is developed and applied in locating interior eigenvalues.
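The basic (single-vector, unpreconditioned) Golub-Ye iteration that this work generalizes can be sketched as follows: at each step, build a small Krylov subspace with the shifted operator A - ρB (no inverse required) and extract a new approximation by Rayleigh-Ritz projection. The block and preconditioned variants developed in the dissertation, and the interior-eigenvalue transformation, are not shown; the dimensions and parameters below are illustrative.

```python
import numpy as np
from scipy.linalg import eigh, qr

def inverse_free_krylov_smallest(A, B, m=4, iters=50, seed=0):
    """Simplified sketch of the Golub-Ye inverse-free Krylov iteration for the
    smallest eigenvalue of A x = lambda B x (A, B symmetric, B positive definite).
    Unpreconditioned, single-vector version only."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.normal(size=n)
    x /= np.sqrt(x @ B @ x)
    for _ in range(iters):
        rho = (x @ A @ x) / (x @ B @ x)        # current Rayleigh quotient
        # Krylov basis of (A - rho*B) applied to x, orthonormalized by QR.
        V = [x]
        for _ in range(m - 1):
            V.append((A - rho * B) @ V[-1])
        Z, _ = qr(np.column_stack(V), mode="economic")
        # Rayleigh-Ritz projection onto the small subspace.
        Am, Bm = Z.T @ A @ Z, Z.T @ B @ Z
        w, U = eigh(Am, Bm)                    # small dense generalized eigenproblem
        x = Z @ U[:, 0]                        # Ritz vector for the smallest Ritz value
        x /= np.sqrt(x @ B @ x)
    return (x @ A @ x) / (x @ B @ x), x

# Example on a small symmetric definite pencil.
rng = np.random.default_rng(1)
M = rng.normal(size=(50, 50))
A = M + M.T
B = np.eye(50) + 0.1 * (M @ M.T) / 50
lam, x = inverse_free_krylov_smallest(A, B)
print("smallest eigenvalue estimate:", lam)
print("dense reference:", eigh(A, B, eigvals_only=True)[0])
```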
308

EMPIRICAL PROCESSES AND ROC CURVES WITH AN APPLICATION TO LINEAR COMBINATIONS OF DIAGNOSTIC TESTS

Chirila, Costel 01 January 2008 (has links)
The Receiver Operating Characteristic (ROC) curve is the plot of sensitivity vs. 1 - specificity of a quantitative diagnostic test, over a wide range of cut-off points c. The empirical ROC curve is probably the most used nonparametric estimator of the ROC curve. The asymptotic properties of this estimator were first developed by Hsieh and Turnbull (1996) based on strong approximations for quantile processes. Jensen et al. (2000) provided a general method to obtain regional confidence bands for the empirical ROC curve, based on its asymptotic distribution. Since most biomarkers do not have high enough sensitivity and specificity to qualify as a good diagnostic test, a combination of biomarkers may result in a better diagnostic test than each one taken alone. Su and Liu (1993) proved that, if the panel of biomarkers is multivariate normally distributed for both the diseased and non-diseased populations, then the linear combination using Fisher's linear discriminant coefficients maximizes the area under the ROC curve of the newly formed diagnostic test, called the generalized ROC curve. In this dissertation, we derive the asymptotic properties of the generalized empirical ROC curve, the nonparametric estimator of the generalized ROC curve, using empirical processes theory as in van der Vaart (1998). The pivotal result used in finding the asymptotic behavior of the proposed nonparametric estimator is the result on random functions that incorporate estimators, as developed by van der Vaart (1998). Using this powerful lemma, we are able to decompose an equivalent process into a sum of two other processes, usually called the Brownian bridge and the drift term, via Donsker classes of functions. Using a uniform convergence rate result given by Pollard (1984), we derive the limiting process of the drift term. Due to the independence of the random samples, the asymptotic distribution of the generalized empirical ROC process is the sum of the asymptotic distributions of the decomposed processes. For completeness, we first re-derive the asymptotic properties of the empirical ROC curve in the univariate case, using the same technique described before. The methodology is used to combine biomarkers in order to discriminate lung cancer patients from normal subjects.
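A small numerical sketch of the Su and Liu (1993) construction referenced above follows: form the linear combination of biomarkers with Fisher's linear discriminant coefficients and evaluate its empirical (Mann-Whitney) AUC. The simulated data and dimensions are illustrative, not the lung cancer application.

```python
import numpy as np

# Simulated biomarker panels with a common covariance matrix for the
# non-diseased (X0) and diseased (X1) groups.
rng = np.random.default_rng(5)
p, n0, n1 = 3, 150, 120
mu0, mu1 = np.zeros(p), np.array([0.6, 0.3, 0.8])
Sigma = 0.5 * np.eye(p) + 0.5

X0 = rng.multivariate_normal(mu0, Sigma, size=n0)
X1 = rng.multivariate_normal(mu1, Sigma, size=n1)

# Fisher's linear discriminant coefficients: pooled covariance inverse times
# the mean difference, estimated from the two samples.
S_pooled = ((n0 - 1) * np.cov(X0.T) + (n1 - 1) * np.cov(X1.T)) / (n0 + n1 - 2)
a = np.linalg.solve(S_pooled, X1.mean(axis=0) - X0.mean(axis=0))

# Combined diagnostic scores and the empirical (Mann-Whitney) AUC of the
# resulting generalized ROC curve.
s0, s1 = X0 @ a, X1 @ a
auc = (s1[:, None] > s0[None, :]).mean() + 0.5 * (s1[:, None] == s0[None, :]).mean()
print(f"empirical AUC of the linear combination: {auc:.3f}")
```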
309

Robust Diagnostics for the Logistic Regression Model With Incomplete Data

范少華 Unknown Date (has links)
Atkinson and Riani (2001) apply the forward search algorithm to deal with the problem of detecting multiple outliers in binomial data. In this thesis, we extend the same idea to identify multiple outliers in generalized linear models when part of the data are missing. The algorithm starts with an imputation method to fill in the missing observations in the data, and then uses the forward search algorithm to confirm outliers. The proposed method can overcome the masking effect, which commonly occurs when multiple outliers exist in the data. Real data are used to illustrate the procedure, and satisfactory results are obtained.
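A rough sketch of the forward-search idea (for complete data; the thesis first imputes the missing values) is given below: start from a small, presumably outlier-free subset, refit the logistic model as the subset grows, and monitor the deviance residuals of the excluded observations. This is a simplified monotone-growth version with simulated data, not Atkinson and Riani's exact algorithm or the thesis's procedure.

```python
import numpy as np
import statsmodels.api as sm

# Simulated logistic data with a few planted outliers (flipped responses).
rng = np.random.default_rng(6)
n = 120
X = sm.add_constant(rng.normal(size=(n, 2)))
beta = np.array([0.2, 1.0, -1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
y[:5] = 1 - y[:5]                                    # contaminate a few responses

m0 = 40                                              # initial subset size (illustrative)
subset = list(rng.permutation(n)[:m0])               # crude choice of a starting subset
trajectory = []                                      # max |deviance residual| outside the subset

while len(subset) < n:
    fit = sm.GLM(y[subset], X[subset], family=sm.families.Binomial()).fit()
    p_hat = np.clip(fit.predict(X), 1e-6, 1 - 1e-6)  # fitted probabilities for all units
    dev = -2 * (y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    resid = np.sign(y - p_hat) * np.sqrt(dev)        # deviance residuals for all units
    outside = [i for i in range(n) if i not in subset]
    trajectory.append(np.abs(resid[outside]).max())
    subset.append(min(outside, key=lambda i: abs(resid[i])))  # grow by the best-fitting unit

# Outliers tend to enter last; a jump at the end of the trajectory flags them
# without the masking that spoils single-fit diagnostics.
print(np.round(trajectory[-10:], 2))
```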
310

Economic and technological performances of international firms

Cincera, Michele 29 April 1998 (has links)
The research performed throughout this dissertation aims at implementing quantitative methods in order to assess the economic and technological performances of firms, i.e. it tries to assess the impacts of the determinants of technological activity on the results of this activity. For this purpose, a representative sample of the most important R&D firms in the world is constituted. The micro-economic nature of the analysis, as well as its international dimension, are the two main features of this research at the empirical level.

The second chapter illustrates the importance of R&D investments, patenting activities and other measures of technological activity performed by firms over the last 10 years. The third chapter describes the main features as well as the construction of the database. The raw data sample consists of comparable detailed micro-level data on 2676 large manufacturing firms from several countries. These firms have reported substantial R&D expenditures over the period 1980-1994.

The fourth chapter explores the dynamic structure of the patent-R&D relationship by considering the number of patent applications as a function of present and lagged levels of R&D expenditures. R&D spillovers as well as technological and geographical opportunities are taken into account as additional determinants in order to explain patenting behaviours. The estimates are based on recently developed econometric techniques that deal with the discrete non-negative nature of the dependent patent variable as well as the simultaneity that can arise between R&D decisions and patenting. The results show evidence of a rather contemporaneous impact of R&D activities on patenting. As far as R&D spillovers are concerned, these externalities have a significantly higher impact on patenting than own R&D. Furthermore, these effects appear to take more time, three years on average, to show up in patents.

The fifth chapter explores the contribution of the own stock of R&D capital to the productivity performance of firms. To this end the usual productivity residual methodology is implemented. The empirical section presents a first set of results which replicates the analysis of previous studies and tries to assess the robustness of the findings with regard to the above issues. Further results, based on different sub-samples of the data set, investigate to what extent the R&D contribution to productivity differs across firms of different industries and geographic areas, between small and large firms, and between low-tech and high-tech firms. The last section explores the simultaneity issue more carefully. On the whole, the estimates indicate that R&D has a positive impact on productivity performance. Yet this contribution is far from homogeneous across the different dimensions of the data or according to the various assumptions retained in the productivity model.

The last empirical chapter goes deeper into the analysis of firms' productivity increases by considering, besides own R&D activities, the impact of technological spillovers. The chapter begins by surveying the alternative ways proposed in the literature to assess the effect of R&D spillovers on productivity. The main findings reported by some studies at the micro level are then outlined. The framework used to formalize technological externalities and other technological determinants is then exposed. This framework is based on positioning firms in a technological space using their patent distribution across technological fields. The question of whether the externalities generated by technological and geographic neighbours have different effects on the recipient's productivity is also addressed by splitting the spillover variable into a local and a national component. Alternative measures of technological proximity are also examined. Some interesting observations emerge from the empirical results. First, the impact of spillovers on productivity increases is positive and much more important than the contribution of own R&D. Second, spillover effects differ according to whether they emanate from firms specialized in similar technological fields or from firms more distant in the technological space. Finally, the magnitude and direction of these effects are radically different within and between the pillars of the Triad. While European firms do not appear to benefit much from either national or international sources of spillovers, US firms are mainly receptive to their national stock and Japanese firms take advantage of the international stock.
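One common way to operationalize the technological space described here is a Jaffe-style proximity measure: represent each firm by the share of its patents in each technological field, take the uncentred correlation (cosine similarity) between firms' share vectors, and weight other firms' R&D stocks by proximity to form a spillover pool. The sketch below is a generic illustration with made-up numbers, not the dissertation's exact measure or data.

```python
import numpy as np

# Hypothetical patent counts: rows are firms, columns are technological fields.
rng = np.random.default_rng(7)
n_firms, n_fields = 5, 8
patent_counts = rng.poisson(lam=4, size=(n_firms, n_fields)).astype(float)

# Technological position = share of patents in each field; proximity between
# two firms = uncentred correlation (cosine similarity) of their share vectors.
shares = patent_counts / patent_counts.sum(axis=1, keepdims=True)
norms = np.linalg.norm(shares, axis=1, keepdims=True)
proximity = (shares @ shares.T) / (norms @ norms.T)   # values in [0, 1]

# Spillover pool of each firm: proximity-weighted sum of the other firms' R&D stocks.
rnd_stock = rng.uniform(10, 100, size=n_firms)        # hypothetical R&D capital stocks
weights = proximity - np.diag(np.diag(proximity))     # exclude the firm's own stock
spillover_pool = weights @ rnd_stock
print(np.round(proximity, 2))
print(np.round(spillover_pool, 1))
```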
