21

Correction of Bias in Estimating Autocovariance Function

Wu, Len-Hong 01 May 1983 (has links)
The purpose of this thesis was to evaluate a method for reducing the bias of autocovariance estimators. Two methods are compared: the standard method and an adjustment method. The Monte Carlo method is used for the comparison. The bias and the mean squared error of the estimated autocovariance are computed for several time series models and for two variations of the adjustment method of estimation. The results indicate some improvement in bias and mean squared error for the new method.
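To make the comparison concrete, here is a minimal Monte Carlo sketch in the spirit of the thesis. The abstract does not specify the adjustment used, so the sketch assumes the common divisor correction (dividing by n − k instead of n at lag k); the AR(1) model, sample size, and replication count are likewise illustrative.

```python
import numpy as np

def autocov(x, max_lag, adjusted=False):
    """Sample autocovariance up to max_lag; the adjusted variant divides
    by n - k instead of n, trading some variance for reduced bias."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([xc[:n - k] @ xc[k:] / ((n - k) if adjusted else n)
                     for k in range(max_lag + 1)])

# Monte Carlo comparison on an AR(1) model with known autocovariance.
rng = np.random.default_rng(0)
phi, n, reps, max_lag = 0.7, 50, 2000, 5
true_gamma = phi ** np.arange(max_lag + 1) / (1 - phi ** 2)

sum_std = np.zeros(max_lag + 1)
sum_adj = np.zeros(max_lag + 1)
for _ in range(reps):
    e = rng.standard_normal(n + 100)
    x = np.zeros(n + 100)
    for t in range(1, n + 100):
        x[t] = phi * x[t - 1] + e[t]
    x = x[100:]                       # drop burn-in
    sum_std += autocov(x, max_lag)
    sum_adj += autocov(x, max_lag, adjusted=True)

print("bias (standard):", np.round(sum_std / reps - true_gamma, 4))
print("bias (adjusted):", np.round(sum_adj / reps - true_gamma, 4))
```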
22

A Comparison of Unsupervised Methods for DNA Microarray Leukemia Data

Harness, Denise 05 April 2018 (has links) (PDF)
Advancements in DNA microarray data sequencing have created the need for sophisticated machine learning algorithms and feature selection methods. Probabilistic graphical models, in particular, have been used to identify whether microarrays or genes cluster together in groups of individuals having a similar diagnosis. These clusters of genes are informative, but can be misleading when every gene is used in the calculation. First, feature reduction techniques are explored; however, the size and nature of the data prevent traditional techniques from working efficiently. Our method is to use the partial correlations between the features to create a precision matrix and predict which associations between genes are most important to predicting leukemia diagnosis. This technique reduces the number of genes to a fraction of the original. The partial correlations are then extended into a spectral clustering framework. In particular, a variety of Laplacian matrices are generated from the network of connections between features, and each implies a graphical network model of gene interconnectivity. Various edge- and vertex-weighted Laplacians are considered and compared against each other in a probabilistic graphical modeling approach. The resulting multivariate Gaussian distributed clusters are subsequently analyzed to determine which genes are activated in a patient with leukemia. Finally, the results are compared against those of other feature engineering approaches to assess accuracy on the leukemia data set. The initial results show that the partial correlation approach to feature selection predicts the diagnosis of a leukemia patient with almost the same accuracy as a machine learning algorithm run on the full set of genes. More calculations of the precision matrix are needed to ensure the set of most important genes is correct. Additionally, more machine learning algorithms will be implemented on the full and reduced data sets to further validate the current prediction accuracy of the partial correlation method.
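As an illustration of the pipeline the abstract describes (precision matrix, partial correlations, Laplacian, spectral clustering), here is a hedged sketch on synthetic data; the block structure, edge threshold, and cluster count are assumptions standing in for the real microarray data and the thesis's specific Laplacian choices.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Stand-in for expression data: 72 patients x 30 genes, with three latent
# blocks of co-regulated genes (real microarray data would be loaded here).
n, p, k = 72, 30, 3
latent = rng.standard_normal((n, k))
X = np.repeat(latent, p // k, axis=1) + 0.7 * rng.standard_normal((n, p))

# Sparse precision (inverse covariance) matrix via the graphical lasso.
prec = GraphicalLassoCV().fit(X).precision_

# Partial correlation between genes i and j, given all the other genes.
d = np.sqrt(np.diag(prec))
pcorr = -prec / np.outer(d, d)
np.fill_diagonal(pcorr, 0.0)

# Keep only the stronger conditional associations as weighted graph edges.
W = np.where(np.abs(pcorr) > 0.05, np.abs(pcorr), 0.0)

# Unnormalized graph Laplacian and spectral embedding of the genes.
L = np.diag(W.sum(axis=1)) - W
_, eigvecs = np.linalg.eigh(L)
embedding = eigvecs[:, 1:k + 1]      # skip the trivial constant eigenvector

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
print("recovered gene clusters:", labels)
```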
23

Statistical Methods for Variable Selection in the Context of High-Dimensional Data: LASSO and Extensions

Yang, Xiao Di 10 1900 (has links)
With the advance of technology, the collection and storage of data has become routine, and huge amounts of data are increasingly produced from biological experiments. The advent of DNA microarray technologies has enabled scientists to measure the expression of tens of thousands of genes simultaneously, and single nucleotide polymorphisms (SNPs) are being used in genetic association studies with a wide range of phenotypes, for example, complex diseases. These high-dimensional problems are becoming more and more common. The "large p, small n" problem, in which there are more variables than samples, is currently a challenge that many statisticians face. Penalized variable selection is an effective approach to the "large p, small n" problem. In particular, the Lasso (least absolute shrinkage and selection operator) proposed by Tibshirani has become a standard method for this type of problem. The Lasso works well for covariates that can be treated individually; when the covariates are grouped, it does not work well. Elastic net, group lasso, group MCP, and group bridge are extensions of the Lasso. Group lasso enforces sparsity at the group level, rather than at the level of individual covariates. Group bridge and group MCP produce sparse solutions both at the group level and at the level of the individual covariates within a group. Our simulation study shows that group lasso forces complete grouping, group MCP encourages grouping to a rather slight extent, and group bridge is somewhere in between. If one expects the proportion of nonzero group members to be greater than one-half, group lasso may be a good choice; otherwise, group MCP would be preferred. If one expects this proportion to be close to one-half, one may wish to use group bridge. A real data analysis is also conducted on genetic variation (SNP) data to find associations between SNPs and West Nile disease. / Master of Science (MSc)
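For readers unfamiliar with the Lasso, a minimal "large p, small n" sketch using scikit-learn is shown below. Note that group lasso, group MCP, and group bridge are not in scikit-learn; this shows only the plain L1 penalty, with a simulated design and an illustrative penalty level.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
# "Large p, small n": 50 samples, 200 covariates, only 5 truly nonzero.
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, -1.0, 2.5]
y = X @ beta + rng.standard_normal(n)

# The L1 penalty shrinks most coefficients exactly to zero.
fit = Lasso(alpha=0.2).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print("selected covariates:", selected)
```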
24

Contributions to estimation and interpretation of intervention effects and heterogeneity in meta-analysis

Thorlund, Kristian 10 1900 (has links)
Background and objectives: Despite great statistical advances in meta-analysis methodology, most published meta-analyses use outdated statistical methods, and authors are unaware of the shortcomings of the widely employed methods. There is a need for statistical contributions to meta-analysis where: 1) improvements to current statistical practice in meta-analysis are conveyed at a level that most systematic review authors will be able to understand; and 2) current statistical methods that are widely applied in meta-analytic practice undergo thorough testing and examination. The objective of this thesis is to address some of this demand.

Methods: Four studies were conducted, each meeting one or both of the objectives. Simulation was used to explore the number of patients and events required to limit the risk of overestimation of intervention effects to 'acceptable' levels. Empirical assessment was used to explore the performance of the popular measure of heterogeneity, I², and its associated 95% confidence intervals (CIs) as evidence accumulates. Empirical assessment was also used to compare inferential agreement between the widely used DerSimonian-Laird random-effects model and four alternative models. Lastly, a narrative review was undertaken to identify and appraise available methods for combining health-related quality of life (HRQL) outcomes.

Results and conclusion: The information required to limit the risk of overestimation of intervention effects is typically close to what is known as the optimal information size (OIS, i.e., the required meta-analysis sample size). I² estimates fluctuate considerably in meta-analyses with fewer than 15 trials and 500 events; their 95% confidence intervals provide the desired coverage. The choice of random-effects model has negligible impact on inferences about the intervention effect, but not on inferences about the degree of heterogeneity. Many approaches are available for pooling HRQL outcomes; recommendations are provided to enhance interpretability. Overall, each manuscript met at least one thesis objective. / Doctor of Philosophy (PhD)
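For reference, the DerSimonian-Laird computation discussed above fits in a few lines; the sketch below uses the textbook formulas for Cochran's Q, τ², and I², with hypothetical trial effects and variances.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                          # fixed-effect (inverse-variance) weights
    y_fixed = (w @ y) / w.sum()
    q = w @ (y - y_fixed) ** 2           # Cochran's Q
    df = len(y) - 1
    tau2 = max(0.0, (q - df) / (w.sum() - (w ** 2).sum() / w.sum()))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = 1.0 / (v + tau2)              # random-effects weights
    pooled = (w_re @ y) / w_re.sum()
    se = np.sqrt(1.0 / w_re.sum())
    return pooled, se, tau2, i2

# Hypothetical log odds ratios and within-trial variances from 6 trials.
pooled, se, tau2, i2 = dersimonian_laird(
    [0.10, -0.20, 0.35, 0.05, -0.10, 0.25],
    [0.04, 0.06, 0.05, 0.03, 0.08, 0.05],
)
print(f"pooled effect {pooled:.3f} (SE {se:.3f}), tau^2 {tau2:.3f}, I^2 {i2:.1f}%")
```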
25

Robust Estimation of Autoregressive Conditional Duration Models

El, Sebai S Rola 10 1900 (has links)
In this thesis, we apply the Ordinary Least Squares (OLS) and Generalized Least Squares (GLS) methods to the estimation of Autoregressive Conditional Duration (ACD) models, as opposed to the typical approach of Quasi-Maximum Likelihood Estimation (QMLE). The advantages of OLS and GLS as the underlying methods of estimation lie in their theoretical ease and computational convenience. The latter property is crucial for high-frequency trading, where a transaction decision needs to be made within a minute. We show that both the OLS and GLS estimates are asymptotically consistent and normally distributed. The normal approximation does not appear to be satisfactory in small samples. We therefore also apply the residual bootstrap to construct confidence intervals based on the OLS and GLS estimates. The properties of the proposed methods are illustrated with extensive numerical simulations as well as by a case study on the IBM transaction data. / Master of Science (MSc)
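The abstract does not give the exact OLS/GLS construction, but a rough illustration of the idea is to simulate an ACD(1,1) process and fit a long autoregression to the durations by least squares; the lag order and parameter values below are assumptions for demonstration only, not the thesis's estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulate an ACD(1,1) process: x_t = psi_t * eps_t, eps_t ~ Exp(1),
# psi_t = omega + alpha * x_{t-1} + beta * psi_{t-1}.
omega, alpha, beta, n = 0.1, 0.2, 0.7, 5000
x = np.empty(n)
psi = omega / (1 - alpha - beta)      # unconditional mean duration
x_prev = psi
for t in range(n):
    psi = omega + alpha * x_prev + beta * psi
    x[t] = psi * rng.exponential(1.0)
    x_prev = x[t]

# Least squares on a long-AR approximation of the ACD's ARMA representation:
# regress x_t on a constant and m lagged durations.
m = 10
Xlag = np.column_stack([x[m - k - 1 : n - k - 1] for k in range(m)])
Z = np.column_stack([np.ones(n - m), Xlag])
coef, *_ = np.linalg.lstsq(Z, x[m:], rcond=None)
print("intercept and lag coefficients:", np.round(coef, 3))
print("implied persistence (sum of lag coefs, ~ alpha+beta):",
      round(coef[1:].sum(), 3))
```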
26

Predicting Customer Satisfaction from Dental Implants Perception Data

Elmassad, Omnya January 2013 (has links)
In recent years, measuring customer satisfaction has become one of the key concerns of market research studies. One of the basic features of leading companies is their success in fulfilling their customers' demands. For that reason, companies attempt to find out what essential factors dominate their customers' purchasing habits.

Millennium Research Group (MRG), a global authority on medical technology market intelligence, uses a web-based survey tool to collect information about customers' level of satisfaction. One of their surveys is designed to gather information about practitioners' level of satisfaction with different brands of dental implants. The Dental Implants dataset obtained from the survey tool has thirty-four attributes, and practitioners were asked to rank or specify their level of satisfaction by assigning a score to each attribute.

The basic question asked by the company was whether the attributes were useful for making customer behavior predictions. The aim of this study is to assess the reliability and accuracy of these measures, to build a model for future predictions, and then to determine the attributes that are most influential in practitioners' purchasing decisions. Classification and regression trees (CART) and partial least squares regression (PLSR) are the two statistical approaches used in this study to build a prediction model for the Dental Implants dataset.

The prediction models generated using both techniques have relatively small predictive power, which may indicate a deficiency in the dataset; however, small predictive power is generally expected in market research studies. The research then attempts to find ways to improve the power of these models to get more accurate results. The model generated by CART analysis tends to have better predictive power and is more suitable for future predictions. Although PLSR provides extremely small predictive power, it helps identify the most important attributes that influence practitioners' purchasing decisions. Improvements in prediction are sought by restricting the cases in the data to subsets that show better alignment between predictors and customer purchasing behaviour. / Master of Science (MSc)
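A hedged sketch of the two approaches on simulated survey-like data (the MRG dataset itself is proprietary, so the attribute count and response scale below merely mimic its description):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
# Stand-in for the survey data: 300 respondents scoring 34 attributes 1-10,
# plus an overall-satisfaction outcome driven by two of the attributes.
n, p = 300, 34
X = rng.integers(1, 11, size=(n, p)).astype(float)
y = 0.5 * X[:, 0] + 0.3 * X[:, 5] + rng.normal(0, 2, n)

cart = DecisionTreeRegressor(max_depth=4, random_state=0)
pls = PLSRegression(n_components=3)

for name, model in [("CART", cart), ("PLSR", pls)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.3f}")

# PLSR coefficients point to the most influential attributes.
pls.fit(X, y)
top = np.argsort(-np.abs(pls.coef_.ravel()))[:5]
print("most influential attributes (by |PLS coefficient|):", top)
```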
27

Multivariate Statistical Methods for Testing a Set of Variables Between Groups with Application to Genomics

Alsulami, Abdulhadi Huda 10 1900 (has links)
The use of traditional univariate analyses for comparing groups in high-dimensional genomic studies, such as the ordinary t-test typically used to compare two independent groups, might be suboptimal because of methodological challenges, including the multiple testing problem and the failure to incorporate correlation among genes. Hence, multivariate methods are preferred for the joint analysis of a group or set of variables. These methods aim to test for differences in the average values of a set of variables across groups. The variables that make up the set can be determined statistically (using exploratory methods such as cluster analysis) or biologically (based on membership in known pathways). In this thesis, the traditional one-way Multivariate Analysis of Variance (MANOVA) method and a robustified version of MANOVA are compared with respect to Type I error rates and power through a simulation study. We generated data from multivariate normal as well as multivariate gamma distributions with different parameter settings. The methods are illustrated using real gene expression data. In addition, we investigated a popular method known as Gene Set Enrichment Analysis (GSEA), in which sets of genes (variables) that belong to known biological pathways are considered jointly and assessed for whether or not they are "enriched" with respect to their association with a disease or phenotype of interest. We applied this method to real genotype data. / Master of Science (MSc)
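A one-way MANOVA of the kind compared in the thesis can be run with statsmodels; the sketch below uses a toy three-gene set with a mean shift between two groups (the robustified variant studied in the thesis is not shown).

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(5)
# Toy stand-in for a gene set: expression of 3 genes in two groups of 30,
# with a mean shift in two of the genes for the second group.
n = 30
g1 = rng.multivariate_normal([0, 0, 0], np.eye(3), size=n)
g2 = rng.multivariate_normal([0.8, 0.5, 0.0], np.eye(3), size=n)
df = pd.DataFrame(np.vstack([g1, g2]), columns=["gene1", "gene2", "gene3"])
df["group"] = ["A"] * n + ["B"] * n

# One-way MANOVA: joint test that the mean vector differs between groups.
fit = MANOVA.from_formula("gene1 + gene2 + gene3 ~ group", data=df)
print(fit.mv_test())
```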
28

A Bayesian Semi-parametric Model for Realized Volatility

Feng, Tian 10 1900 (has links)
Due to advancements in computing power and the availability of high-frequency data, the analysis of high-frequency stock data and market microstructure has become more and more important in econometrics. In the high-frequency setting, volatility is a very important indicator of the movement of stock prices and a measure of risk; it is a key input in asset pricing, portfolio reallocation, and risk management. In this thesis, we use the Heterogeneous Autoregressive (HAR) model of realized volatility, combined with Bayesian inference and Markov chain Monte Carlo methods, to estimate the innovation density of daily realized volatility. A Dirichlet process is used as the prior in a countably infinite mixture model. The semi-parametric model provides a robust alternative to the models used in the literature. We find evidence of thick tails in the density of innovations to log-realized volatility. / Master of Science (MSc)
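The Dirichlet-process machinery is beyond a short sketch, but the HAR backbone of the model, in which realized volatility is regressed on its daily, weekly, and monthly averages, can be illustrated with plain OLS on simulated data:

```python
import numpy as np

rng = np.random.default_rng(6)
# Stand-in for daily log realized volatility (real RV data would be loaded here).
n = 1000
rv = np.empty(n)
rv[0] = 0.0
for t in range(1, n):
    rv[t] = 0.4 * rv[t - 1] + rng.normal(0, 0.3)

def rolling_mean(x, w):
    """Trailing w-day means; entry i is the mean of x[i : i + w]."""
    c = np.concatenate([[0.0], np.cumsum(x)])
    return (c[w:] - c[:-w]) / w

# HAR regressors: lagged daily value, weekly (5-day) and monthly (22-day) means.
start = 22
y = rv[start:]
daily = rv[start - 1 : -1]
weekly = rolling_mean(rv, 5)[start - 5 : -1]
monthly = rolling_mean(rv, 22)[:-1]
Z = np.column_stack([np.ones_like(y), daily, weekly, monthly])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("HAR coefficients (const, daily, weekly, monthly):", np.round(beta, 3))
```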
29

A New Right Tailed Test of the Ratio of Variances

Lesser, Elizabeth Rochelle 01 January 2016 (has links)
It is important to be able to compare variances efficiently and accurately regardless of the parent populations. This study proposes a new right-tailed test for the ratio of two variances using an Edgeworth expansion. To study Type I error rate and power performance, simulations were performed on the new test with various combinations of symmetric and skewed distributions. It is found to have better-controlled Type I error rates than the existing tests, and it also has sufficient power. Therefore, the newly derived test provides a good robust alternative to the existing methods.
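For context, the classical right-tailed F-test that such robust tests aim to improve upon is sketched below, along with a small simulation showing how its Type I error drifts from the nominal level under a skewed parent distribution (the thesis's Edgeworth-corrected statistic itself is not reproduced here).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def f_test_right(x, y):
    """Classical right-tailed F-test of H0: var(x) <= var(y)."""
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    p = stats.f.sf(f, len(x) - 1, len(y) - 1)
    return f, p

# Under H0 with a skewed parent, the F-test's Type I error drifts away
# from the nominal 5% level -- the motivation for a robust alternative.
reps, alpha = 5000, 0.05
rejections = 0
for _ in range(reps):
    x = rng.exponential(1.0, size=25)   # skewed parent, equal variances
    y = rng.exponential(1.0, size=25)
    _, p = f_test_right(x, y)
    rejections += p < alpha
print(f"empirical Type I error: {rejections / reps:.3f} (nominal {alpha})")
```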
30

A Multi-Indexed Logistic Model for Time Series

Liu, Xiang 01 December 2016 (has links)
In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis on its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that it will in some instances produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine wave data and stock data. We first examine the well-studied SLR model and its application to time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare the two models' performance using forecast accuracy and an area-under-the-curve score on simulated sine waves with various intensities of Gaussian noise and on Standard & Poor's 500 historical data. Overall, the finding that MILR outperforms SLR is validated on both real and simulated data. Finally, some possible future directions of research are discussed.
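The SLR baseline described above can be sketched as logistic regression on a window of lagged values; the windowed design, noise level, and train/test split below are illustrative assumptions (MILR's multi-indexed structure is not reproduced here).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
# Noisy sine wave; the classification target is whether the series rises.
t = np.arange(2000)
x = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.5, len(t))
y = (np.diff(x) > 0).astype(int)

# Simple logistic regression on a window of m lagged values: row j holds
# x[j], ..., x[j+m-1] and predicts whether the next step rises.
m = 10
X = np.column_stack([x[k : len(x) - m + k] for k in range(m)])
target = y[m - 1 :]
split = len(target) // 2
clf = LogisticRegression(max_iter=1000).fit(X[:split], target[:split])
probs = clf.predict_proba(X[split:])[:, 1]
print("AUC on held-out half:", round(roc_auc_score(target[split:], probs), 3))
```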
