McMaster University MASTER OF PUBLIC HEALTH (2020) Hamilton, Ontario (Health Research Methods, Evidence, and Impact)
TITLE: Applying Machine Learning to Determine Nutrients Predictive of Cardiovascular Disease Using Canadian Linked Population-Based Data
AUTHOR: Jason D. Morgenstern, B.Sc. (University of Guelph), M.D. (Western University)
SUPERVISOR: Professor L.N. Anderson
NUMBER OF PAGES: xv, 121

The use of big data and machine learning may help to address some challenges in nutritional epidemiology. The first objective of this thesis was to explore the use of machine learning prediction models in a hypothesis-generating approach to evaluate how detailed dietary features contribute to cardiovascular disease (CVD) risk prediction. The second objective was to assess the predictive performance of the models.

A population-based retrospective cohort study was conducted using linked Canadian data from 2004 to 2018. Study participants were adults aged 20 and older (n=12 130) who completed the 2004 Canadian Community Health Survey, Cycle 2.2, Nutrition (CCHS 2.2). Statistics Canada has linked the CCHS 2.2 data to the Discharge Abstracts Database and the Canadian Vital Statistics Death database, which were used to determine cardiovascular outcomes (stroke or ischemic heart disease events or deaths). Conditional inference forests were used to develop models. Then, permutation feature importance (PFI) and accumulated local effects (ALEs) were calculated to explore contributions of nutrients to predicted disease.

Supplement use (median PFI (M)=4.09 x 10^-4, IQR=8.25 x 10^-7 to 1.11 x 10^-3) and caffeine (M=2.79 x 10^-4, IQR=-9.11 x 10^-5 to 5.86 x 10^-4) had the highest median PFIs among nutrition-related features. Supplement use was associated with decreased predicted risk of CVD (accumulated local effects range (ALER)=-3.02 x 10^-4 to 2.76 x 10^-4) and caffeine was associated with increased predicted risk (ALER=-9.96 x 10^-4 to 0.035). The best-performing model had a logarithmic loss of 0.248. Overall, many non-linear relationships were observed, including threshold, J-shaped, and U-shaped relationships.

The results of this exploratory study suggest that applying machine learning to the nutritional epidemiology of CVD, particularly using big datasets, may help elucidate risks and improve predictive models. Given the limited application thus far, work such as this could lead to improvements in public health recommendations and policy related to dietary behaviours.

Thesis / Master of Public Health (MPH)

This work explores the potential for machine learning to improve the study of diet and disease. In chapter 2, opportunities are identified for big data to make diet easier to measure. We also highlight how machine learning could find new, complex relationships between diet and disease. In chapter 3, we apply a machine learning algorithm, called conditional inference forests, to a unique Canadian dataset to predict whether people developed strokes or heart attacks. This dataset included responses to a health survey conducted in 2004, where participants' responses have been linked to administrative databases that record when people go to hospital or die up until 2017. Using these techniques, we identified aspects of nutrition that predicted disease, including caffeine, alcohol, and supplement use. This work suggests that machine learning may be helpful in our attempts to understand the relationships between diet and health.
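The two quantities the abstract reports, logarithmic loss and permutation feature importance (PFI), can be illustrated with a minimal sketch. It is not the thesis's implementation: the thesis used conditional inference forests on CCHS 2.2 data, whereas `toy_predict`, `make_toy_data`, and the two synthetic features below are hypothetical stand-ins. The sketch also assumes that PFI is measured as the increase in logarithmic loss after shuffling one feature, which the abstract does not state.

```python
import math
import random
import statistics


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary logarithmic loss; lower values mean better predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def permutation_importance(predict, X, y, column, n_repeats=20, seed=0):
    """Return one loss increase per repeat after shuffling a single column.

    Assumption: importance = permuted log loss - baseline log loss.
    """
    rng = random.Random(seed)
    baseline = log_loss(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)  # breaks the link between this feature and y
        X_perm = [row[:column] + [v] + row[column + 1:]
                  for row, v in zip(X, shuffled)]
        permuted = log_loss(y, [predict(row) for row in X_perm])
        increases.append(permuted - baseline)
    return increases


def toy_predict(row):
    """Hypothetical stand-in model: predicted risk depends on feature 0 only."""
    return 1.0 / (1.0 + math.exp(-(2.0 * row[0] - 1.0)))


def make_toy_data(n=500, seed=1):
    """Synthetic data: two features, outcome drawn from toy_predict."""
    rng = random.Random(seed)
    X = [[rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)] for _ in range(n)]
    y = [1 if rng.random() < toy_predict(row) else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for col in (0, 1):
        pfi = permutation_importance(toy_predict, X, y, col)
        print(f"feature {col}: median PFI = {statistics.median(pfi):.4f}")
```

In this toy setup, feature 1 is ignored by the model, so shuffling it leaves the loss unchanged, while shuffling feature 0 raises it. With a fitted model on real data, a feature's PFI can also come out slightly negative by chance, which is consistent with the negative lower IQR bound reported for caffeine.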
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/25973 |
Date | January 2020 |
Creators | Morgenstern, Jason D. |
Contributors | Anderson, Laura N., Clinical Epidemiology/Clinical Epidemiology & Biostatistics |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |