Learning from semantically heterogeneous aggregate data in a distributed environment

Information cooperation, reuse and integration can be developed on the platform of rapidly growing open distributed environments and can support development of Ambient Intelligence. However, in such environments, information may be only partially observed due to the unreliability of data collection technologies and heterogeneity in the ontologies employed caused by distributed and independent system development. These challenges need to be overcome to facilitate intelligent data analysis. We focus on the use of large-scale databases such as statistical databases and data warehouses, where aggregates can be obtained to summarise information; such aggregates are valuable in providing efficient access, computation and communication. A principle-based learning framework is proposed and developed for semantically heterogeneous aggregate data using maximum likelihood techniques via the EM (Expectation-Maximisation) algorithm. The learning framework inherently handles data incompleteness and schema heterogeneity from unreliable, incomplete or uncertain information sources. The framework is developed for supervised and unsupervised learning from data in a distributed environment. This development is demonstrated using two scenarios. In the first scenario a decision-making mechanism is proposed to support assistive living for elderly people in a smart home environment. The mechanism incorporates modules for learning inhabitants' activities of daily living based on partially observed and unlabelled data, enabling hierarchical activity prediction and assisting inhabitants in completing activities by providing personalised reminders. Real data have been collected in a smart kitchen laboratory, and realistic synthetic data are also used for evaluation. Results show consistent and robust performance and other information and insights are also obtained. 
In the second scenario a model-based clustering algorithm is proposed for independently developed distributed heterogeneous databases to support cooperation between organisations, including distributed smart homes from different institutions. Clustering in the presence of data heterogeneity enables the characteristics of similar contexts to be captured. The algorithm is systematically evaluated using simulated data, with encouraging results and good scalability to large numbers of databases.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:551564
Date: January 2009
Creators: Zhang, Shuai
Publisher: University of Ulster
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
