Subsampling is an important method in the analysis of Big Data. Subsample size determination (SSSD) plays a crucial part in extracting information from data and in overcoming the challenges that arise from huge data sizes. In this thesis, (1) sample size determination (SSD) is investigated for multivariate parameters, and sample size formulas are obtained for the multivariate normal distribution; (2) sample size formulas are obtained based on concentration inequalities; (3) improved bounds for McDiarmid's inequality are obtained; (4) the obtained results are applied to nonuniform subsampling in Big Data high dimensional linear regression; and (5) numerical studies are conducted.

The sample size formula for the univariate normal distribution is a familiar melody in elementary statistics. To the best of our knowledge, its generalization to the multivariate normal distribution (or, more generally, to multivariate parameters) has received little attention. In this thesis, we introduce a definition of SSD and obtain explicit formulas for the multivariate normal distribution, in gratifying analogy with the sample size formula for the univariate normal.

Commonly used concentration inequalities provide exponential rates, and sample sizes based on these inequalities are often loose. Talagrand (1995) provided the missing factor to sharpen these inequalities. We obtain the numerical values of the constants in the missing factor and slightly improve his results. Furthermore, we provide the missing factor in McDiarmid's inequality. These improved bounds are used to give shrunken sample sizes
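As a rough illustration of why sample sizes from concentration inequalities are described as loose, the sketch below compares the classical univariate normal formula, n = ceil((z_{1-alpha/2} * sigma / E)^2), with the sample size implied by Hoeffding's inequality for observations bounded in [a, b]. Hoeffding's inequality is used here only as a representative concentration inequality. The function names, the margin of error E (`margin`), and the chosen bounds are illustrative assumptions. Neither function reproduces the thesis's multivariate formulas or its improved Talagrand- and McDiarmid-type bounds.

```python
import math
from statistics import NormalDist


def normal_sample_size(sigma: float, margin: float, alpha: float = 0.05) -> int:
    """Classical univariate normal formula: n = ceil((z_{1-alpha/2} * sigma / margin)^2).

    Hypothetical helper for illustration; sigma is assumed known.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided standard normal quantile
    return math.ceil((z * sigma / margin) ** 2)


def hoeffding_sample_size(a: float, b: float, margin: float, alpha: float = 0.05) -> int:
    """Smallest n with 2 * exp(-2 * n * margin^2 / (b - a)^2) <= alpha.

    Hypothetical helper: solves Hoeffding's two-sided tail bound for n,
    assuming i.i.d. observations taking values in [a, b].
    """
    return math.ceil((b - a) ** 2 * math.log(2 / alpha) / (2 * margin ** 2))


if __name__ == "__main__":
    # [0, 1]-valued data: the worst-case standard deviation is 0.5.
    n_normal = normal_sample_size(sigma=0.5, margin=0.05)
    n_hoeffding = hoeffding_sample_size(a=0.0, b=1.0, margin=0.05)
    print(n_normal, n_hoeffding)
```

With a margin of 0.05 and alpha = 0.05, the normal formula gives 385, while Hoeffding's bound asks for 738, roughly twice as many. The comparison is not fully like-for-like. The normal formula is an asymptotic approximation, whereas Hoeffding's bound holds for every n, and this is one reason such exponential bounds tend to be conservative.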
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/17158982 |
Date | 20 December 2021 |
Creators | Yu Wang (11821553) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/Sample_Size_Determination_in_Multivariate_Parameters_With_Applications_to_Nonuniform_Subsampling_in_Big_Data_High_Dimensional_Linear_Regression/17158982 |