• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

STATISTICAL METHODS FOR VARIABLE SELECTION IN THE CONTEXT OF HIGH-DIMENSIONAL DATA: LASSO AND EXTENSIONS

Yang, Xiao Di 10 1900 (has links)
<p>With the advance of technology, the collection and storage of data has become routine. Huge amount of data are increasingly produced from biological experiments. the advent of DNA microarray technologies has enabled scientists to measure expressions of tens of thousands of genes simultaneously. Single nucleotide polymorphism (SNP) are being used in genetic association with a wide range of phenotypes, for example, complex diseases. These high-dimensional problems are becoming more and more common. The "large p, small n" problem, in which there are more variables than samples, currently a challenge that many statisticians face. The penalized variable selection method is an effective method to deal with "large p, small n" problem. In particular, The Lasso (least absolute selection and shrinkage operator) proposed by Tibshirani has become an effective method to deal with this type of problem. the Lasso works well for the covariates which can be treated individually. When the covariates are grouped, it does not work well. Elastic net, group lasso, group MCP and group bridge are extensions of the Lasso. Group lasso enforces sparsity at the group level, rather than at the level of the individual covariates. Group bridge, group MCP produces sparse solutions both at the group level and at the level of the individual covariates within a group. Our simulation study shows that the group lasso forces complete grouping, group MCP encourages grouping to a rather slight extent, and group bridge is somewhere in between. If one expects that the proportion of nonzero group members to be greater than one-half, group lasso maybe a good choice; otherwise group MCP would be preferred. If one expects this proportion to be close to one-half, one may wish to use group bridge. A real data analysis example is also conducted for genetic variation (SNPs) data to find out the associations between SNPs and West Nile disease.</p> / Master of Science (MSc)

Page generated in 0.1259 seconds