Global ETD Search

Return to search

Multiple Entity Reconciliation

Living in the age of "Big Data" is both a blessing and a curse. On he one hand, the raw data can be analysed and then used for weather redictions, user recommendations, targeted advertising and more. On he other hand, when data is aggregated from multiple sources, there is no guarantee that each source has stored the data in a standardized or even compatible format to what is required by the application. So there is a need to parse the available data and convert it to the desired form. Here is where the problems start to arise: often the correspondences are not quite so straightforward between data instances that belong to the same domain, but come from different sources. For example, in the film industry, information about movies (cast, characters, ratings etc.) can be found on numerous websites such as IMDb or Rotten Tomatoes. Finding and matching all the data referring to the same movie is a challenge. The aim of this project is to select the most efficient algorithm to correlate movie related information gathered from various websites automatically. We have implemented a flexible application that allows us to make the performance comparison of multiple algorithms based on machine learning techniques. According to our experimental results, a well chosen set of rules is on par with the results from a neural network, these two proving to be the most effective classifiers for records with movie information as content.

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187010

Computer and Information Sciences

Data- och informationsvetenskap

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-187010
Date	January 2015
Creators	Samoila, Lavinia Andreea
Publisher	KTH, Skolan för informations- och kommunikationsteknik (ICT)
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-ICT-EX ; 2015:211

Page generated in 0.0018 seconds

Multiple Entity Reconciliation

Description

Links & Downloads

Tags

Additional Fields