In recent years, there has been a
flood of continuously changing information
from a variety of web resources such as web databases, web sites,
web services and programs. Online Social Networks (OSNs) represent
such a field where huge amounts of information are being posted online
over time. Due to the nature of OSNs, which offer a productive source
for qualitative and quantitative personal information, researchers from
various disciplines contribute to developing methods for extracting data
from OSNs. However, there is limited research which addresses extracting
data automatically. To the best of the author's knowledge, there
is no research which focuses on tracking the real-time changes of information
retrieved from OSN profiles over time and this motivated the
present work.
This thesis presents different approaches for automated Data Extraction
(DE) from OSN: crawler, parser, Multi Agent System (MAS) and Application
Programming Interface (API). Initially, a parser was implemented
as a centralized system to traverse the OSN graph and extract the profile's
attributes and list of friends from Myspace, the top OSN at that
time, by parsing the Myspace profiles and extracting the relevant tokens
from the parsed HTML source files. A Breadth First Search (BFS) algorithm
was used to traverse the generated OSN friendship graph
in order to select the next profile for parsing. The approach was implemented
and tested on two types of friends: top friends and all friends.
In the case of top friends, 500 seed profiles were visited; 298 public
profiles were parsed to get 2,197 top friends' profiles and 2,747 friendship
edges, while in the case of all friends, 250 public profiles were parsed
to extract 10,196 friends' profiles and 17,223 friendship edges.
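The breadth-first traversal described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `bfs_crawl`, the `get_friends` callback and the in-memory `TOY` graph are hypothetical stand-ins for the step that parses a profile page and returns its friend list, with `None` modelling a private or unparsable profile.

```python
from collections import deque

# Hypothetical toy friendship data standing in for parsed profile pages.
# A value of None models a private profile that cannot be parsed.
TOY = {"a": ["b", "c"], "b": ["a", "d"], "c": None, "d": ["b"]}


def bfs_crawl(seed, get_friends, max_profiles):
    """Visit profiles breadth-first starting from `seed`.

    `get_friends(pid)` is an assumed callback returning a list of friend
    ids, or None when the profile is private or cannot be parsed.
    Returns the parsed public profiles (in visit order) and the set of
    undirected friendship edges that were discovered.
    """
    visited = set()       # profiles taken off the queue so far
    parsed = []           # public profiles successfully parsed
    edges = set()         # undirected edges stored as frozensets
    queue = deque([seed])
    queued = {seed}       # everything ever enqueued, to avoid repeats

    while queue and len(visited) < max_profiles:
        pid = queue.popleft()
        visited.add(pid)
        friends = get_friends(pid)
        if friends is None:
            continue      # private profile: visited but not parsed
        parsed.append(pid)
        for friend in friends:
            edges.add(frozenset((pid, friend)))
            if friend not in queued:
                queued.add(friend)
                queue.append(friend)
    return parsed, edges


if __name__ == "__main__":
    profiles, friendships = bfs_crawl("a", TOY.get, max_profiles=10)
    print(profiles, len(friendships))
```

Keeping visited profiles separate from parsed ones mirrors the distinction the abstract draws between profiles visited and public profiles parsed.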
This approach has two main limitations. First, the system is designed as
a centralized system that controls and retrieves the information of each
user's profile just once. This means that the extraction process will stop
if the system fails to process one of the profiles, whether the seed profile
(the first profile to be crawled) or one of its friends. To overcome this problem,
an Online Social Network Retrieval System (OSNRS) is proposed to
decentralize the DE process from OSN through using MAS. The novelty
of OSNRS is its ability to monitor profiles continuously over time.
The second challenge is that the parser had to be modified to cope with
changes in the profiles' structure. To overcome this problem, the proposed
OSNRS is improved through use of an API tool to enable OSNRS
agents to obtain the required fields of an OSN profile despite modifications
in the representation of the profile's source web pages. The experimental
work shows that using API and MAS simplifies and speeds up the
process of tracking a profile's history. It also helps security personnel,
parents, guardians, social workers and marketers in understanding the
dynamic behaviour of OSN users. This thesis proposes solutions for web
database processing on data extraction from OSNs by the use of parser
and MAS and discusses the limitations and improvements. / Taibah University
Identifier | oai:union.ndltd.org:BRADFORD/oai:bradscholars.brad.ac.uk:10454/5502 |
Date | January 2012 |
Creators | Abdulrahman, Ruqayya |
Contributors | Neagu, Daniel, Holton, David R.W., Awan, Irfan U. |
Publisher | University of Bradford, Department of Computing |
Source Sets | Bradford Scholars |
Language | English |
Detected Language | English |
Type | Thesis, doctoral, PhD |
Rights | The University of Bradford theses are licenced under a Creative Commons Licence (http://creativecommons.org/licenses/by-nc-nd/3.0/). |