In the data mining process, it is necessary to prepare the source dataset - for example, to select the cutting or grouping of continuous data attributes etc. and use the knowledge from the problem area. Such a preparation process can be guided by background (domain) knowledge obtained from experts. In the SEWEBAR project, we collect the knowledge from experts in a rich XML-based representation language, called BKEF, using a dedicated editor, and save into the database of our custom-tailored (Joomla!-based) CMS system. Data mining tools are then able to generate, from this dataset, mining models represented in the standardized PMML format. It is then necessary to map a particular column (attribute) from the dataset (in PMML) to a relevant 'metaattribute' of the BKEF representation. This specific type of schema mapping problem is addressed in my thesis in terms of algorithms for automatic suggestion of mapping of columns to metaattributes and from values of these columns to BKEF 'metafields'. Manual corrections of this mapping by the user are also supported. The implementation is based on the PHP language and then it was tested on datasets with information about courses taught in 5 universities in the U.S.A. from Illinois Semantic Integration Archive. On this datasets, the auto-mapping suggestion process archieved the precision about 70% and recall about 77% on unknown columns, but when mapping the previously user-mapped data (using implemented learning module), the recall is between 90% and 100%.
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:75744 |
Date | January 2010 |
Creators | Vojíř, Stanislav |
Contributors | Kliegr, Tomáš, Zamazal, Ondřej |
Publisher | Vysoká škola ekonomická v Praze |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.001 seconds