Global ETD Search

1	Recovering software tuning parameters Brake, Nevon 08 July 2008 (has links) Autonomic Computing is an approach to designing systems that are capable of self-management. Fundamental to the autonomic ideal is a software's awareness of and ability to tune parameters that affect metrics like performance and security. Traditionally, these parameters are tuned by human experts with extensive knowledge of parameter names and effects---existing software was not designed to be self-tuning. Efforts to automate the isolation and tuning of parameters have yielded encouraging results. However, the parameters are identified manually. This thesis proposes the adaptation of reverse engineering techniques for automating the recovery of software tuning parameters. Tuning parameters from several industrially relevant applications are studied for patterns of use. These patterns are used to classify the parameters into a taxonomy, and to develop a metamodel of the source code elements and relationships needed to express them. An extractor is then built to obtain instances of the relationships from source code. The relationships are represented as graphs, which are manipulated and queried for instances of tuning parameter patterns. The recovery is implemented as a tool for finding tuning parameters in applications. Experimental results show that the approach is effective at recovering documented tuning parameters, as well as other undocumented ones. The results also indicate that the tuning parameter patterns are not specific to a particular application, or application domain. / Thesis (Master, Computing) -- Queen's University, 2008-06-28 19:36:43.291 Software Reverse-engineering Tuning parameter Autonomic computing
2	A Bayesian Group Sparse Multi-Task Regression Model for Imaging Genomics Greenlaw, Keelin 26 August 2015 (has links) Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. In this setting, high-dimensional regression for multi-SNP association analysis is challenging as the brain imaging phenotypes are multivariate and there is a desire to incorporate a biological group structure among SNPs based on their belonging genes. Wang et al. (Bioinformatics, 2012) have recently developed an approach for simultaneous estimation and SNP selection based on penalized regression with regularization based on a novel group l_{2,1}-norm penalty, which encourages sparsity at the gene level. A problem with the proposed approach is that it only provides a point estimate. We solve this problem by developing a corresponding Bayesian formulation based on a three-level hierarchical model that allows for full posterior inference using Gibbs sampling. For the selection of tuning parameters, we consider techniques based on: (i) a fully Bayes approach with hyperpriors, (ii) empirical Bayes with implementation based on a Monte Carlo EM algorithm, and (iii) cross-validation (CV). When the number of SNPs is greater than the number of observations we find that both the fully Bayes and empirical Bayes approaches overestimate the tuning parameters, leading to overshrinkage of regression coefficients. To understand this problem we derive an approximation to the marginal likelihood and investigate its shape under different settings. Our investigation sheds some light on the problem and suggests the use of cross-validation or its approximation with WAIC (Watanabe, 2010) when the number of SNPs is relatively large. Properties of our Gibbs-WAIC approach are investigated using a simulation study and we apply the methodology to a large dataset collected as part of the Alzheimer's Disease Neuroimaging Initiative. / Graduate Bayesian Shrinkage Imaging Genomics Tuning Parameter Selection Multivariate Regression
3	Tuning Parameter Selection in L1 Regularized Logistic Regression Shi, Shujing 05 December 2012 (has links) Variable selection is an important topic in regression analysis and is intended to select the best subset of predictors. Least absolute shrinkage and selection operator (Lasso) was introduced by Tibshirani in 1996. This method can serve as a tool for variable selection because it shrinks some coefficients to exact zero by a constraint on the sum of absolute values of regression coefficients. For logistic regression, Lasso modifies the traditional parameter estimation method, maximum log likelihood, by adding the L1 norm of the parameters to the negative log likelihood function, so it turns a maximization problem into a minimization one. To solve this problem, we first need to give the value for the parameter of the L1 norm, called tuning parameter. Since the tuning parameter affects the coefficients estimation and variable selection, we want to find the optimal value for the tuning parameter to get the most accurate coefficient estimation and best subset of predictors in the L1 regularized regression model. There are two popular methods to select the optimal value of the tuning parameter that results in a best subset of predictors, Bayesian information criterion (BIC) and cross validation (CV). The objective of this paper is to evaluate and compare these two methods for selecting the optimal value of tuning parameter in terms of coefficients estimation accuracy and variable selection through simulation studies. Variable Selection Logistic Regression Lasso Tuning Parameter BIC Cross Validation Physical Sciences and Mathematics
4	Discrete Parameter Estimation for Rare Events: From Binomial to Extreme Value Distributions Schneider, Laura Fee 26 April 2019 (has links) No description available. 510 Bayesian estimation Extreme value analysis Posterior contraction Tuning parameter selection Hill estimator Mathematics (PPN61756535X)
5	NONPARAMETRIC ESTIMATION OF DERIVATIVES WITH APPLICATIONS Hall, Benjamin 01 January 2010 (has links) We review several nonparametric regression techniques and discuss their various strengths and weaknesses with an emphasis on derivative estimation and confidence band creation. We develop a generalized C(p) criterion for tuning parameter selection when interest lies in estimating one or more derivatives and the estimator is both linear in the observed responses and self-consistent. We propose a method for constructing simultaneous confidence bands for the mean response and one or more derivatives, where simultaneous now refers both to values of the covariate and to all derivatives under consideration. In addition we generalize the simultaneous confidence bands to account for heteroscedastic noise. Finally, we consider the characterization of nanoparticles and propose a method for identifying a proper subset of the covariate space that is most useful for characterization purposes. compound estimator bias correction tuning parameter confidence bands heteroscedastic noise Mechanical Engineering Statistics and Probability
6	Výběr modelu na základě penalizované věrohodnosti / Variable selection based on penalized likelihood Chlubnová, Tereza January 2016 (has links) Selection of variables and estimation of regression coefficients in datasets with the number of variables exceeding the number of observations consti- tutes an often discussed topic in modern statistics. Today the maximum penalized likelihood method with an appropriately selected function of the parameter as the penalty is used for solving this problem. The penalty should evaluate the benefit of the variable and possibly mitigate or nullify the re- spective regression coefficient. The SCAD and LASSO penalty functions are popular for their ability to choose appropriate regressors and at the same time estimate the parameters in a model. This thesis presents an overview of up to date results in the area of characteristics of estimates obtained by using these two methods for both small number of regressors and multidimensional datasets in a normal linear model. Due to the fact that the amount of pe- nalty and therefore also the choice of the model is heavily influenced by the tuning parameter, this thesis further discusses its selection. The behavior of the LASSO and SCAD penalty functions for different values and possibili- ties for selection of the tuning parameter is tested with various numbers of regressors on simulated datasets.
7	Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data Abbas Aghababazadeh, Farnoosh January 2015 (has links) Modern scientific technology such as microarrays, imaging devices, genome-wide association studies or social science surveys provide statisticians with hundreds or even thousands of tests to consider simultaneously. Testing many thousands of null hypotheses may increase the number of Type $I$ errors. In large-scale hypothesis testing, researchers can use different statistical techniques such as family-wise error rates, false discovery rates, permutation methods, local false discovery rate, where all available data usually should be analyzed together. In applications, the thousands of tests are related by a scientifically meaningful structure. Ignoring that structure can be misleading as it may increase the number of false positives and false negatives. As an example, in genome-wide association studies each test corresponds to a specific genetic marker. In such a case, the scientific structure for each genetic marker can be its minor allele frequency. In this research, the local false discovery rate as a relevant statistical approach is considered to analyze the thousands of tests together. We present a model for multiple hypothesis testing when the scientific structure of each test is incorporated as a co-variate. The purpose of this model is to incorporate the co-variate to improve the performance of testing procedures. The method we consider has different estimates depending on the tuning parameter. We would like to estimate the optimal value of that parameter by considering observed statistics. Thus, among those estimators, the one which minimizes the estimated errors due to bias and to variance is chosen by applying the bootstrap approach. Such an estimation method is called an adaptive reference class method. Under the combined reference class method, the effect of the co-variates is ignored and all null hypotheses should be analyzed together. In this research, under some assumptions for the co-variates and the prior probabilities, the proposed adaptive reference class method shows smaller error than the combined reference class method in estimating the local false discovery rate, when the number of tests gets large. We describe the adaptive reference class method to the coronary artery disease data, and we use simulation data to evaluate the performance of the estimator associated with the adaptive reference class method. Multiple Testing Local False Discovery Rtae Reference Class Bias-variance Trade Off Tuning Parameter Bootstrap Approach
8	Clustering, Classification, and Factor Analysis in High Dimensional Data Analysis Wang, Yanhong 17 December 2013 (has links) Clustering, classification, and factor analysis are three popular data mining techniques. In this dissertation, we investigate these methods in high dimensional data analysis. Since there are much more features than the sample sizes and most of the features are non-informative in high dimensional data, dimension reduction is necessary before clustering or classification can be made. In the first part of this dissertation, we reinvestigate an existing clustering procedure, optimal discriminant clustering (ODC; Zhang and Dai, 2009), and propose to use cross-validation to select the tuning parameter. Then we develop a variation of ODC, sparse optimal discriminant clustering (SODC) for high dimensional data, by adding a group-lasso type of penalty to ODC. We also demonstrate that both ODC and SDOC can be used as a dimension reduction tool for data visualization in cluster analysis. In the second part, three existing sparse principal component analysis (SPCA) methods, Lasso-PCA (L-PCA), Alternative Lasso PCA (AL-PCA), and sparse principal component analysis by choice of norm (SPCABP) are applied to a real data set the International HapMap Project for AIM selection to genome-wide SNP data, the classification accuracy is compared for them and it is demonstrated that SPCABP outperforms the other two SPCA methods. Third, we propose a novel method called sparse factor analysis by projection (SFABP) based on SPCABP, and propose to use cross-validation method for the selection of the tuning parameter and the number of factors. Our simulation studies show that SFABP has better performance than the unpenalyzed factor analysis when they are applied to classification problems. Cluster analysis Classification Cross-validation High-dimensional data Optimal score Principal components analysis Tuning parameter Variable selection Factor Analysis
9	Représentation parcimonieuse et procédures de tests multiples : application à la métabolomique / Sparse representation and multiple testing procedures : application to metabolimics Tardivel, Patrick 24 November 2017 (has links) Considérons un vecteur gaussien Y de loi N (m,sigma²Idn) et X une matrice de dimension n x p avec Y observé, m inconnu, Sigma et X connus. Dans le cadre du modèle linéaire, m est supposé être une combinaison linéaire des colonnes de X. En petite dimension, lorsque n ≥ p et que ker (X) = 0, il existe alors un unique paramètre Beta* tel que m = X Beta* ; on peut alors réécrire Y sous la forme Y = X Beta* + Epsilon. Dans le cadre du modèle linéaire gaussien en petite dimension, nous construisons une nouvelle procédure de tests multiples contrôlant le FWER pour tester les hypothèses nulles Betai = 0 pour i appartient à [[1,p]]. Cette procédure est appliquée en métabolomique au travers du programme ASICS qui est disponible en ligne. ASICS permet d'identifier et de quantifier les métabolites via l'analyse des spectres RMN. En grande dimension, lorsque n < p on a ker (X) ≠ 0, ainsi le paramètre Beta décrit précédemment n'est pas unique. Dans le cas non bruité lorsque Sigma = 0, impliquant que Y = m, nous montrons que les solutions du système linéaire d'équations Y = X Beta avant un nombre de composantes non nulles minimales s'obtiennent via la minimisation de la "norme" lAlpha avec Alpha suffisamment petit. / Let Y be a Gaussian vector distributed according to N (m,sigma²Idn) and X a matrix of dimension n x p with Y observed, m unknown, sigma and X known. In the linear model, m is assumed to be a linear combination of the columns of X In small dimension, when n ≥ p and ker (X) = 0, there exists a unique parameter Beta* such that m = X Beta; then we can rewrite Y = Beta + Epsilon. In the small-dimensional linear Gaussian model framework, we construct a new multiple testing procedure controlling the FWER to test the null hypotheses Betai = 0 for i belongs to [[1,p]]. This procedure is applied in metabolomics through the freeware ASICS available online. ASICS allows to identify and to qualify metabolites via the analyse of RMN spectra. In high dimension, when n < p we have ker (X) ≠ 0 consequently the parameter Beta described above is no longer unique. In the noiseless case when Sigma = 0, implying thus Y = m, we show that the solutions of the linear system of equation Y = X Beta having a minimal number of non-zero components are obtained via the lalpha with alpha small enough. Procédure de tests multiples FWER Estimateur lasso Paramètre de régularisation Minimisation de la norme l1 Minimisation de la "norme" l0 Représentation parcimonieuse Résonance magnétique nucléaire Identification de métabolites Quantification de métabolites Multiple testing procedure Familywise error rate Lasso Estimator Tuning parameter Basis pursuit Alpha minimization Sparsest representation Nuclear magnetic resonance Identification of metabolites Quantification of metabolites

Search results