GEOGRAPHIC DATA MINING AND GEOVISUALIZATION FOR UNDERSTANDING ENVIRONMENTAL AND PUBLIC HEALTH DATA

Within the theoretical framework of this study, it is recognized that very large amounts of real-world facts and geospatial data are collected and stored. Decision makers cannot consider all of the available disparate raw facts and data. Problem-specific variables, including complex geographic identifiers, have to be selected from these data and validated. Environmental and public health data present three problems: (1) the geospatial components of the data are not considered in analysis and decision-making processes, (2) meaningful geospatial patterns and clusters are often overlooked, and (3) public health practitioners find geospatial data difficult to comprehend. Inspired by the advent of geographic data mining and geovisualization in public and environmental health, this study aims to unveil the spatiotemporal dynamics of the prevalence of overweight and obesity among United States youths at regional and local levels over a twelve-year study period. The specific objectives of this dissertation are to (1) apply regionalization algorithms that effectively identify meaningful clusters that are spatially homogeneous with respect to youth overweight and obesity, (2) use Geographic Information Systems (GIS), spatial analysis techniques, and statistical methods to explore the data sets for health outcomes, and (3) explore geovisualization techniques that transform the discovered patterns into forms that support recognition and flexible interaction and improve interpretation.

To achieve the goal and specific objectives of this dissertation, we used data sets from the National Longitudinal Survey of Youth 1997 (NLSY'97) early release (1997-2004) and current release (2005-2008), census 2000 data, yearly population estimates from 2001 to 2008, and synthetic data sets. The size of the NLSY'97 cohort varied from 6,923 to 8,565 individuals over the period.
At the beginning of the cohort study, the participants were between 12 and 17 years old; in 2008, they were between 24 and 28 years old. For data mining, we applied the Regionalization with Dynamically Constrained Agglomerative clustering and Partitioning (REDCAP) algorithms to identify hierarchical regions based on weight measures of U.S. youths. The algorithms applied were single linkage clustering (SLK), average linkage clustering (ALK), complete linkage clustering (CLK), and Ward's method. We also used GIS, spatial analysis techniques, and statistical methods to analyze the spatially varying association of overweight and obesity prevalence among youths and to visualize the results geographically. These methods included the ordinary least squares (OLS) model, the spatial generalized linear mixed model (GLMM), Kulldorff's space-time scan statistic, and spatial interpolation (inverse distance weighting).

The three main findings of this study are as follows. First, among the four algorithms, ALK, Ward, and CLK identified regions more effectively than SLK, which performed very poorly. ALK produced more promising regions than the other algorithms, yielding regions that were homogeneous with respect to the weight variable (body mass index). ALK regionalization provided new insights into overweight and obesity by detecting new spatial clusters with over 30% prevalence. New meaningful clusters were detected in 15 counties: Yazoo, Holmes, Lincoln, and Attala in Mississippi; Wise, Delta, Hunt, Liberty, and Hardin in Texas; St. Charles, St. James, and Calcasieu in Louisiana; and Choctaw, Sumter, and Tuscaloosa in Alabama. Demographically, these counties have a racial/ethnic composition of about 75% White, 11.6% Black, and 13.4% other.
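The abstract names the REDCAP linkage variants but does not describe how they operate. As a rough illustration only, the Python sketch below implements a contiguity-constrained average-linkage agglomeration. Every county starts as its own region, and at each step the two adjacent regions with the smallest average pairwise difference in a weight attribute (for example, mean BMI) are merged until a target number of regions remains. This is a simplified stand-in, not the REDCAP implementation: it omits REDCAP's partitioning step. The function name, the inputs (values, adjacency), and the use of absolute difference as the dissimilarity are all assumptions made for this sketch.

```python
from typing import Dict, List, Set


def constrained_average_linkage(
    values: Dict[str, float],      # hypothetical input: unit id -> attribute (e.g. mean BMI)
    adjacency: Dict[str, Set[str]],  # hypothetical input: unit id -> ids of contiguous units
    n_regions: int,
) -> List[Set[str]]:
    """Merge adjacent regions by smallest average attribute distance (illustrative sketch)."""
    clusters = {u: {u} for u in values}  # cluster id -> member units
    owner = {u: u for u in values}       # unit -> id of the cluster containing it

    def avg_dist(a: str, b: str) -> float:
        # Average linkage: mean absolute difference over all cross-cluster unit pairs.
        members_a, members_b = clusters[a], clusters[b]
        total = sum(abs(values[x] - values[y]) for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > n_regions:
        # Contiguity constraint: only clusters that touch via an adjacent unit pair may merge.
        candidates = {
            tuple(sorted((owner[u], owner[v])))
            for u in values
            for v in adjacency.get(u, ())
            if owner[u] != owner[v]
        }
        if not candidates:
            break  # remaining clusters are disconnected; no further merge is allowed
        # Sorting first makes tie-breaking deterministic.
        a, b = min(sorted(candidates), key=lambda pair: avg_dist(*pair))
        for u in clusters[b]:
            owner[u] = a
        clusters[a] |= clusters.pop(b)
    return list(clusters.values())
```

For example, on a toy chain of six units A-F with values 20, 21, 22, 30, 31, 32 and requesting two regions, the sketch returns {A, B, C} and {D, E, F}. Replacing the mean in avg_dist with a minimum or maximum would give single-linkage or complete-linkage analogues of SLK and CLK.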
Second, the results indicate an upward trend in the prevalence of overweight and obesity among United States youths, in both males and females. Obesity among male youths increased from 10.3% (95% CI = 9.0, 11.0) in 1999 to 27.0% (95% CI = 26.0, 28.0) in 2008. Likewise, obesity among female youths increased from 9.6% (95% CI = 8.0, 11.0) in 1999 to 28.9% (95% CI = 27.0, 30.0) over the same period. Youth obesity prevalence was higher among females than among males. Age had a highly significant association (p < 0.001) with the prevalence of overweight and obesity. Third, significant cluster years were detected for high rates in 2003-2008 (relative risk 1.92, 3.4 annual prevalence cases per 100,000, p < 0.0001) and for low rates in 1997-2002 (relative risk 0.39, annual prevalence cases per 100,000, p < 0.0001). Three meaningful spatiotemporal clusters of obesity (p < 0.0001) were detected in counties located within the South, Lower Northeastern, and North Central regions. Counties identified as consistently experiencing a high prevalence of obesity, and as having the potential to become obesogenic environments in the future, are Copiah, Holmes, and Hinds in Mississippi; Harris and Chambers in Texas; Oklahoma and McClain in Oklahoma; Jefferson in Louisiana; and Chicot and Jefferson in Arkansas. Surprisingly, youth obesity prevalence showed mixed trends across rural and urban areas.

Finally, from a public health perspective, this research has shown that in-depth knowledge of whether, and in what respects, certain areas have worse health outcomes can help in designing effective community interventions to promote healthy living. Furthermore, the specific information obtained in this dissertation can help guide geographically targeted programs, policies, and preventive initiatives addressing overweight and obesity in the United States.
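The relative risks reported for the cluster years come from Kulldorff's scan statistic. The Python sketch below covers only the purely temporal case under a Poisson model. It evaluates every contiguous window of years and sets the expected cases E in each window from that window's share of the population. It then keeps the window with the highest log-likelihood ratio, LLR = c ln(c/E) + (C - c) ln((C - c)/(C - E)), counted only when c > E, where c is the number of observed cases in the window and C is the total number of cases. The relative risk is (c/E) / ((C - c)/(C - E)). The function names, the input layout, and the cap limiting windows to half the study period are assumptions made for this sketch. The study itself used a space-time scan over counties, and scan-statistic software also assesses significance with Monte Carlo replications, which this sketch omits.

```python
import math
from typing import Dict, Optional, Tuple


def poisson_llr(c: float, e: float, total: float) -> float:
    """Kulldorff log-likelihood ratio for a high-rate window under a Poisson model."""
    if c <= e:
        return 0.0  # only windows with more cases than expected are scored
    llr = c * math.log(c / e)
    rest_c, rest_e = total - c, total - e
    if rest_c > 0:  # the 0 * log(0) term is taken as 0
        llr += rest_c * math.log(rest_c / rest_e)
    return llr


def temporal_scan(
    cases: Dict[int, float],       # hypothetical input: year -> observed cases
    population: Dict[int, float],  # hypothetical input: year -> population at risk (> 0)
    max_len: Optional[int] = None,
) -> Tuple[float, int, int, float]:
    """Return (llr, start_year, end_year, relative_risk) for the most likely high-rate window."""
    years = sorted(cases)
    if len(years) < 2:
        raise ValueError("need at least two time periods")
    total_cases = sum(cases.values())
    if total_cases <= 0:
        raise ValueError("need at least one case")
    total_pop = sum(population[y] for y in years)
    if max_len is None:
        max_len = max(1, len(years) // 2)  # assumed cap: at most half the study period
    max_len = min(max_len, len(years) - 1)  # every window must leave some years outside it

    best = None
    for i in range(len(years)):
        for j in range(i, min(i + max_len, len(years))):
            window = years[i : j + 1]
            c = sum(cases[y] for y in window)
            # Expected cases under uniform risk: total cases times the window's population share.
            e = total_cases * sum(population[y] for y in window) / total_pop
            llr = poisson_llr(c, e, total_cases)
            if best is None or llr > best[0]:
                outside = total_cases - c
                rr = math.inf if outside == 0 else (c / e) / (outside / (total_cases - e))
                best = (llr, window[0], window[-1], rr)
    return best
```

Consider a toy series for 1997-2008 with a constant population of 1,000 per year, 10 cases per year before 2003, and 30 per year from 2003 on. With that input, the sketch selects 2003-2008 with a relative risk of 3.0.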

Identifier: oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:dissertations-1659
Date: 01 May 2013
Creators: Adu-Prah, Samuel
Publisher: OpenSIUC
Source Sets: Southern Illinois University Carbondale
Detected Language: English
Type: text
Format: application/pdf
Source: Dissertations