Return to search

Learning predictive models from graph data using pattern mining

Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks etc. However, standard graph learning approaches are often challenged by the computational cost involved in the learning process, due to the richness of the representation. Attempts made to improve their efficiency are often associated with the risk of degrading the performance of the predictive models, creating tradeoffs between the efficiency and effectiveness of the learning. Such a situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, where improving one objective without the other objective being worse off is a better solution, called a Pareto improvement. In this thesis, it is investigated how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set where one concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learning models, and the other objective concerns how to improve predictive performance without increasing the complexity of pattern mining. The employed research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which look for Pareto improvements in efficiency and effectiveness are conducted using sets of small graphs. Summarizing the findings, some of the proposed methods, namely maximal frequent itemset mining and constraint based itemset mining, result in a dramatically increased efficiency of learning, without decreasing the predictive performance of the resulting models. It is also shown that additional background knowledge can be used to enhance the performance of the predictive models, without increasing the complexity of the graphs.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-100713
Date January 2014
CreatorsKarunaratne, Thashmee M.
PublisherStockholms universitet, Institutionen för data- och systemvetenskap, Stockholm : Department of Computer and Systems Sciences, Stockholm University
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeDoctoral thesis, comprehensive summary, info:eu-repo/semantics/doctoralThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationReport Series / Department of Computer & Systems Sciences, 1101-8526 ; 14-003

Page generated in 0.002 seconds