Comparing and compressing fuzzy concepts: methods and application

In recent years, the volume of data has grown rapidly with the development of the Internet and the World Wide Web. This phenomenon, known as information overload or digital obesity, has caused a data explosion and may lead to storage problems in the future. Many forms of data, including textual data, are stored and transmitted via the Internet. Textual data, which is usually unstructured, can be processed or mined to yield useful information. To represent that information, we need to know the underlying concepts. The most suitable approach to modelling the concepts is to design an ontology. Formal Concept Analysis (FCA) is complementary to the ontology approach and provides a hierarchical structure of the concepts. However, an ontology is a fixed structure that does not change, whereas data is typically updated from day to day. The focus of this research is quantifying the changes in the content and structure of these concept hierarchies. There are two types of measurement. The first, Edit Distance measurement, measures the changes between two lattices that have identical sets of objects but disjoint sets of attributes. We pair the overlapping concepts and compute the cost of transforming each concept into its counterpart, adapting the Levenshtein distance to measure the changes. The second is Support-based Distance measurement, which quantifies the change between two lattices that have different sets of objects but the same set of attributes. We compute the support (or relative cardinality) of each concept's extension. Online shopping has become more common, and many customers, retailers, and manufacturers pay attention to product reviews. We therefore apply both measurements to an illustrative application using product review datasets. We monitor the differences between positive and negative sentiment orientations for a product over a fixed period of time using Edit Distance measurement.
Additionally, we track the changes between lattices that represent the sentiment orientation towards a product in two different time periods using Support-based Distance measurement. Information overload also causes problems for FCA, as large lattices can be difficult to read and very costly to compute. These large datasets are often high-dimensional. We enhance an approach that selects the important dimensions using Principal Component Analysis (PCA) through the Singular Value Decomposition (SVD) method, so that FCA computation becomes more tractable.
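
The abstract names two building blocks: the Levenshtein (edit) distance and the support, or relative cardinality, of a concept's extension. The Python sketch below shows the textbook form of each. It is not the thesis's adapted measure: the function names, the unit edit costs, the choice to treat a concept's intent as a sorted list of attributes, and the sample review identifiers are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Textbook Levenshtein distance between two sequences (unit costs).

    Assumption: a and b may be strings or lists, e.g. sorted attribute
    lists standing in for concept intents.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def support(extent, objects):
    """Relative cardinality of a concept's extension: |extent| / |objects|."""
    objects = set(objects)
    if not objects:
        raise ValueError("the object set must be non-empty")
    return len(set(extent) & objects) / len(objects)


# Hypothetical data: two intents differing in one attribute, and a
# concept whose extent covers two of four reviews.
edit_cost = levenshtein(["battery", "price"], ["battery", "screen"])
concept_support = support({"r1", "r2"}, {"r1", "r2", "r3", "r4"})
```

With this hypothetical data, `edit_cost` is 1 (one substitution) and `concept_support` is 0.5. How the thesis pairs concepts across lattices and aggregates these values into a lattice-level distance is not stated in the abstract, so it is not sketched here.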

Identifier oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:682484
Date January 2015
Creators Abd Rahim, Noor Hafhizah
Publisher University of Bristol
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation