Return to search

Bayesian Logistic Regression Model with Integrated Multivariate Normal Approximation for Big Data

The analysis of big data is of great interest today, and this comes with challenges of improving precision and efficiency in estimation and prediction. We study binary data with covariates from numerous small areas, where direct estimation is not reliable, and there is a need to borrow strength from the ensemble. This is generally done using Bayesian logistic regression, but because there are numerous small areas, the exact computation for the logistic regression model becomes challenging. Therefore, we develop an integrated multivariate normal approximation (IMNA) method for binary data with covariates within the Bayesian paradigm, and this procedure is assisted by the empirical logistic transform. Our main goal is to provide the theory of IMNA and to show that it is many times faster than the exact logistic regression method with almost the same accuracy. We apply the IMNA method to the health status binary data (excellent health or otherwise) from the Nepal Living Standards Survey with more than 60,000 households (small areas). We estimate the proportion of Nepalese in excellent health condition for each household. For these data IMNA gives estimates of the household proportions as precise as those from the logistic regression model and it is more than fifty times faster (20 seconds versus 1,066 seconds), and clearly this gain is transferable to bigger data problems.

Identiferoai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-theses-1450
Date28 April 2016
CreatorsFu, Shuting
ContributorsBalgobin Nandram, Advisor, ,
PublisherDigital WPI
Source SetsWorcester Polytechnic Institute
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceMasters Theses (All Theses, All Years)

Page generated in 0.002 seconds