Transfer rule learning for biomarker discovery and verification from related data sets

Biomarkers are a critical tool for the detection, diagnosis,
monitoring and prognosis of diseases, and for understanding
disease mechanisms in order to create treatments. Unfortunately,
finding reliable biomarkers is often hampered by a number of practical
problems, including scarcity of samples, the high dimensionality of the data, and measurement error. An important opportunity to make the most of
these scarce data is to combine information from multiple related
data sets for more effective biomarker discovery. Because the costs
of creating large data sets for every disease of interest are likely
to remain prohibitive, methods for more effectively making use of
related biomarker data sets continue to be important.
This thesis develops TRL, a novel framework for integrative biomarker
discovery from related but separate data sets, such as those generated
for similar biomarker profiling studies. TRL alleviates the problem
of data scarcity by providing a way to validate
knowledge learned from one data set and simultaneously learn new
knowledge on a related data set. Unlike other transfer learning
approaches, TRL takes prior knowledge in the form of interpretable,
modular classification rules, and uses them to seed learning on a new
data set.
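
The idea of seeding learning with prior rules can be illustrated with a minimal sketch. The abstract does not specify TRL's transfer methods, so everything below is a hypothetical illustration and not the thesis's algorithm: the `Condition` and `Rule` classes, the `select_seed_rules` function, the `min_precision` cutoff, and the marker names and values are all assumptions. The sketch checks each rule learned on a source data set against a target data set and keeps the rules that still hold as seeds for further learning.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Sample = Dict[str, float]  # hypothetical: biomarker name -> measured value


@dataclass(frozen=True)
class Condition:
    """One test on a single biomarker, e.g. marker_a > 2.0 (illustrative)."""
    feature: str
    op: str  # either "<=" or ">"
    threshold: float

    def holds(self, sample: Sample) -> bool:
        value = sample[self.feature]
        return value <= self.threshold if self.op == "<=" else value > self.threshold


@dataclass(frozen=True)
class Rule:
    """IF all conditions hold THEN predict label (illustrative)."""
    conditions: Tuple[Condition, ...]
    label: str

    def covers(self, sample: Sample) -> bool:
        return all(c.holds(sample) for c in self.conditions)


def rule_precision(rule: Rule, samples: List[Sample], labels: List[str]) -> float:
    """Fraction of covered samples whose label matches the rule's prediction."""
    covered = [y for x, y in zip(samples, labels) if rule.covers(x)]
    if not covered:
        return 0.0
    return sum(y == rule.label for y in covered) / len(covered)


def select_seed_rules(source_rules: List[Rule], samples: List[Sample],
                      labels: List[str], min_precision: float = 0.7) -> List[Rule]:
    """Keep source rules that remain precise on the target data (assumed criterion)."""
    return [r for r in source_rules
            if rule_precision(r, samples, labels) >= min_precision]


# Made-up target data, for illustration only.
target_samples = [
    {"marker_a": 2.5, "marker_b": 0.4},
    {"marker_a": 3.1, "marker_b": 0.9},
    {"marker_a": 0.8, "marker_b": 0.3},
    {"marker_a": 1.0, "marker_b": 1.2},
]
target_labels = ["disease", "disease", "control", "control"]

# Rules assumed to have been learned earlier on a related source data set.
rule_a = Rule((Condition("marker_a", ">", 2.0),), "disease")
rule_b = Rule((Condition("marker_b", ">", 0.5),), "disease")

seeds = select_seed_rules([rule_a, rule_b], target_samples, target_labels)
```

In this toy example `rule_a` is kept because it remains correct on every target sample it covers, while `rule_b` is dropped. A learner on the target data could then start its search from the surviving rules instead of from an empty rule set.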
We evaluated TRL on 13 pairs of real-world biomarker discovery data
sets, and found that TRL improves accuracy twice as often as
it degrades it. TRL consists of four alternative methods for transfer
and three measures of the amount of information transferred. By
experimenting with these methods, we investigate the kinds of
information necessary to preserve for transfer learning from related
data sets. We found it is important to keep track of the
relationships between biomarker values and disease state, and to
consider during learning how rules will interact in the final model.
If the source and target data are drawn from the same distribution, we
found that both the performance improvement and the amount of transfer
increase as the source data grow in size relative to the target data.

Identifier: oai:union.ndltd.org:PITT/oai:PITTETD:etd-11242010-001158
Date: 30 January 2011
Creators: Ganchev, Philip
Contributors: Vanathi Gopalakrishnan, Robert Bowser, Shyam Visweswaran, Fuchiang Tsui
Publisher: University of Pittsburgh
Source Sets: University of Pittsburgh
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.library.pitt.edu/ETD/available/etd-11242010-001158/
Rights: restricted, I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to University of Pittsburgh or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
