
IDENTIFYING HIGH QUALITY MEDLINE ARTICLES AND WEB SITES USING MACHINE LEARNING

In this dissertation, I explore the applicability of text categorization machine learning methods to identifying clinically pertinent, evidence-based articles in the medical literature and web pages on the Internet. In the first series of experiments, I found that text categorization techniques identify high quality articles in internal medicine in the content categories of prognosis, diagnosis, etiology, and treatment better than the Clinical Query Filters of PubMed. In a second set of experiments, I established that the text categorization models generalize both to time periods outside the training set and to areas outside internal medicine, including pediatrics, oncology, and surgery. My third set of experiments revealed that text categorization models built for a specific purpose identify articles better than both bibliometric measures (number of citations and impact factor) and web-based measures (Google PageRank, Yahoo WebRanks, and total web page hit count). In the fourth set of experiments, I used a labeled gold standard to build models with high discriminatory power for purpose, format, and additional content categories. Furthermore, I built a system called EBMSearch that applies these models to all of MEDLINE. Finally, I extended these methods to the web and built the first validated models that identify websites making false cancer treatment claims; these models outperform previous unvalidated models and PageRank by 30% in area under the receiver operating characteristic curve. In conclusion, machine learning-based text categorization methods provide a powerful framework for identifying clinically applicable articles in the medical literature and on the Internet.

Identifier: oai:union.ndltd.org:VANDERBILT/oai:VANDERBILTETD:etd-12072007-141136
Date: 28 December 2007
Creators: Aphinyanaphongs, Yindalon
Contributors: Douglas Hardin, Ioannis Tsamardinos, Steven Brown, Dan Masys, Constantin Aliferis
Publisher: VANDERBILT
Source Sets: Vanderbilt University Theses
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.library.vanderbilt.edu//available/etd-12072007-141136/
Rights: unrestricted, I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Vanderbilt University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
