Statistical data about cities, regions and at country level is collected for various purposes and from various institutions. Yet, while
access to high quality and recent such data is crucial both for decision makers as well as for the public, all to often such collections of
data remain isolated and not re-usable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused
attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and republish this data in a reusable manner
as Linked Data. The main feature of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a
modular and extensible, always up-to-date fashion; (ii) we use both Machine Learning techniques as well as ontological reasoning
over equational background knowledge to enrich the data by imputing missing values, (iii) we assess the estimated accuracy of such
imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a we browser interface and as
machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to e.g. DBpedia.
Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version
of the Open City Data Pipeline presented at ISWC2015: firstly, we demonstrate that the combination of equational knowledge and
standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we
arguable show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data
Pipeline shall provide a sustainable effort to serve Linked Data about cities in increasing quality. / Series: Working Papers on Information Systems, Information Business and Operations
Identifer | oai:union.ndltd.org:VIENNA/oai:epub.wu-wien.ac.at:5438 |
Date | 02 1900 |
Creators | Bischof, Stefan, Kämpgen, Benedikt, Harth, Andreas, Polleres, Axel, Schneider, Patrik |
Publisher | Department für Informationsverarbeitung und Prozessmanagement, WU Vienna University of Economics and Business |
Source Sets | Wirtschaftsuniversität Wien |
Language | English |
Detected Language | English |
Type | Paper, NonPeerReviewed |
Format | application/pdf |
Relation | https://doi.org/10.1016/j.websem.2017.09.003, http://epub.wu.ac.at/5438/ |
Page generated in 0.0017 seconds