A disease classifier for metabolic profiles based on metabolic pathway knowledge

This thesis presents Pathway Informed Analysis (PIA), a classification method that predicts disease states (diagnoses) from metabolic profile measurements and incorporates biological knowledge in the form of metabolic pathways. A metabolic pathway describes a set of chemical reactions that perform a specific biological function. A significant amount of the biological knowledge produced by efforts to identify and understand these pathways is formalized in readily accessible databases such as the Kyoto Encyclopedia of Genes and Genomes. PIA uses metabolic pathways to identify relationships among the metabolite concentrations measured in a metabolic profile. Specifically, PIA assumes that the metabolite concentrations within each class (diseased or healthy) follow a multivariate normal distribution. It further assumes that the pathways imply conditional independence statements about these distributions, which relate the concentrations of the metabolites to each other. Together, the two assumptions allow a natural representation of the class-conditional distributions using a type of probabilistic graphical model called a Gaussian Markov Random Field. PIA efficiently estimates the parameters defining these distributions from example patients to produce a classifier. It classifies an undiagnosed patient by evaluating both models to determine the most probable class given the patient's metabolic profile.
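The classification scheme described above can be illustrated with a minimal sketch. It is not the estimation procedure from the thesis, which the abstract does not specify. It assumes a hypothetical three-metabolite chain graph, synthetic data, equal class priors, and a standard covariance-selection iteration that fits a Gaussian whose precision matrix is zero wherever the graph has no edge. The names `fit_gmrf`, `log_density`, `classify` and `sample_chain` are illustrative only.

```python
import numpy as np

def fit_gmrf(X, adjacency, n_sweeps=200, tol=1e-10):
    """Fit mean and precision of a Gaussian whose precision matrix is zero
    for every pair of variables not joined by an edge in `adjacency`.
    Uses a standard covariance-selection iteration (an assumption here)."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)  # maximum-likelihood covariance
    W = S.copy()                            # running covariance estimate
    for _ in range(n_sweeps):
        W_prev = W.copy()
        for j in range(p):
            rest = [k for k in range(p) if k != j]
            nbrs = [k for k in rest if adjacency[j][k]]
            beta = np.zeros(p)
            if nbrs:
                # Regress variable j on its graph neighbours only.
                beta[nbrs] = np.linalg.solve(W[np.ix_(nbrs, nbrs)], S[nbrs, j])
            w = W[rest, :] @ beta           # implied covariances with j
            W[rest, j] = w
            W[j, rest] = w
        if np.max(np.abs(W - W_prev)) < tol:
            break
    return mean, np.linalg.inv(W)

def log_density(x, mean, precision):
    """Log of the multivariate normal density, parameterized by precision."""
    d = x - mean
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * (logdet - d @ precision @ d - len(x) * np.log(2 * np.pi))

def classify(x, models, priors):
    """Return the class with the highest posterior score for profile x."""
    scores = {c: np.log(priors[c]) + log_density(x, *models[c]) for c in models}
    return max(scores, key=scores.get)

# Hypothetical pathway: metabolite 0 - metabolite 1 - metabolite 2 (a chain),
# so metabolites 0 and 2 are conditionally independent given metabolite 1.
chain = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])

rng = np.random.default_rng(0)

def sample_chain(n, shift):
    """Synthetic concentrations that respect the chain structure."""
    a = rng.normal(size=n)
    b = 0.8 * a + rng.normal(scale=0.6, size=n)
    c = 0.8 * b + rng.normal(scale=0.6, size=n)
    return np.column_stack([a, b, c]) + shift

models = {
    "healthy": fit_gmrf(sample_chain(200, 0.0), chain),
    "diseased": fit_gmrf(sample_chain(200, 3.0), chain),
}
priors = {"healthy": 0.5, "diseased": 0.5}  # assumed equal priors

label = classify(np.array([2.9, 3.1, 3.0]), models, priors)
print(label)
```

In this sketch the pathway graph enters only through the zero pattern of each class's precision matrix, which reduces the number of free parameters compared with fitting a full covariance matrix per class.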

We apply PIA to a data set of cancer patients to diagnose those with cachexia, a muscle-wasting disease. We evaluate the performance of PIA against standard machine learning algorithms such as Naive Bayes, Tree-augmented Naive Bayes, Support Vector Machines and C4.5. The overall classification accuracy of PIA on this data set is higher than that of these algorithms, but the difference is not statistically significant. We also apply PIA to several other classification tasks. Some involve predicting various manipulations of metabolic processes performed in experiments with worms; others classify pigs according to properties of their dietary intake. The accuracy of PIA on these tasks is not significantly better than that of the standard algorithms.

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:AEU.10048/992
Date: 06 1900
Creators: Eastman, Thomas
Contributors: Greiner, Russell (Computing Science), Baracos, Vickie (Oncology), Schuurmans, Dale (Computing Science)
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Thesis
Format: 670271 bytes, application/pdf
