The analysis of big data is of great interest today, and this comes with challenges of improving precision and efficiency in estimation and prediction. We study binary data with covariates from numerous small areas, where direct estimation is not reliable, and there is a need to borrow strength from the ensemble. This is generally done using Bayesian logistic regression, but because there are numerous small areas, the exact computation for the logistic regression model becomes challenging. Therefore, we develop an integrated multivariate normal approximation (IMNA) method for binary data with covariates within the Bayesian paradigm, and this procedure is assisted by the empirical logistic transform. Our main goal is to provide the theory of IMNA and to show that it is many times faster than the exact logistic regression method with almost the same accuracy. We apply the IMNA method to the health status binary data (excellent health or otherwise) from the Nepal Living Standards Survey with more than 60,000 households (small areas). We estimate the proportion of Nepalese in excellent health condition for each household. For these data IMNA gives estimates of the household proportions as precise as those from the logistic regression model and it is more than fifty times faster (20 seconds versus 1,066 seconds), and clearly this gain is transferable to bigger data problems.
Identifer | oai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-theses-1450 |
Date | 28 April 2016 |
Creators | Fu, Shuting |
Contributors | Balgobin Nandram, Advisor, , |
Publisher | Digital WPI |
Source Sets | Worcester Polytechnic Institute |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Masters Theses (All Theses, All Years) |
Page generated in 0.0024 seconds