Return to search

Improving Scalability And Efficiency Of Ilp-based And Graph-based Concept Discovery Systems

Concept discovery is the problem of finding definitions of target relation in terms or other relation given
as a background knowledge. Inductive Logic Programming (ILP)-based and graph-based approaches
are two competitors in concept discovery problem. Although ILP-based systems have long dominated
the area, graph-based systems have recently gained popularity as they overcome certain shortcomings
of ILP-based systems. While having applications in numerous domains, ILP-based concept discovery systems still sustain scalability and efficiency issues. These issues generally arose due to the large search spaces such systems build. In this work we propose memoization-based and parallelization-based methods that modify the search space construction step and the evaluation step of ILP-based concept discovery systems to overcome these problem.

In this work we propose three memoization-based methods, called Tabular CRIS, Tabular CRIS-wEF,
and Selective Tabular CRIS. In these methods, basically, evaluation queries are stored in look-up tables
for later uses. While preserving some core functions in common, each proposed method improves
e_ciency and scalability of its predecessor by introducing constraints on what kind of evaluation
queries to store in look-up tables and for how long.
The proposed parallelization method, called pCRIS, parallelizes the search space construction and
evaluation steps of ILP-based concept discovery systems in a data-parallel manner. The proposed
method introduces policies to minimize the redundant work and waiting time among the workers at
synchronization points.

Graph-based approaches were first introduced to the concept discovery domain to handle the so called local plateau problem. Graph-based approaches have recently gained more popularity in concept discovery system as they provide convenient environment to represent relational data and are able to
overcome certain shortcomings of ILP-based concept discovery systems. Graph-based approaches can
be classified as structure-based approaches and path-finding approaches. The first class of approaches
need to employ expensive algorithms such as graph isomorphism to find frequently appearing substructures.
The methods that fall into the second class need to employ sophisticated indexing mechanisms
to find out the frequently appearing paths that connect some nodes in interest. In this work, we also
propose a hybrid method for graph-based concept discovery which does not require costly substructure
matching algorithms and path indexing mechanism. The proposed method builds the graph in such a
way that similar facts are grouped together and paths that eventually turn to be concept descriptors are
build while the graph is constructed.

Identiferoai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/12615670/index.pdf
Date01 July 2013
CreatorsMutlu, Alev
ContributorsKaragoz, Pinar
PublisherMETU
Source SetsMiddle East Technical Univ.
LanguageEnglish
Detected LanguageEnglish
TypePh.D. Thesis
Formattext/pdf
RightsTo liberate the content for METU campus

Page generated in 0.0196 seconds