The meaning of spatial words can only be evaluated by establishing a reference to the properties of the environment in which the word is used. For example, in order to evaluate what is to the left of something or how fast is fast in a given context, we need to evaluate properties such as the position of objects in the scene, their typical function and behaviour, the size of the scene and the perspective from which the scene is viewed. Rather than encoding the semantic rules that define spatial expressions by hand, we developed a system where such rules are learned from descriptions produced by human commentators and information that a mobile robot has about itself and its environment. We concentrate on two scenarios and words that are used in them. In the first scenario, the robot is moving in an enclosed space and the descriptions refer to its motion ('You're going forward slowly' and 'Now you're turning right'). In the second scenario, the robot is static in an enclosed space which contains real-size objects such as desks, chairs and walls. Here we are primarily interested in prepositional phrases that describe relationships between objects ('The chair is to the left of you' and 'The table is further away than the chair'). The perspective can be varied by changing the location of the robot. Following the learning stage, which is performed offline, the system is able to use this domain specific knowledge to generate new descriptions in new environments or to 'understand' these expressions by providing feedback to the user, either linguistically or by performing motion actions. If a robot can be taught to 'understand' and use such expressions in a manner that would seem natural to a human observer, then we can be reasonably sure that we have captured at least something important about their semantics. Two kinds of evaluation were performed. First, the performance of machine learning classifiers was evaluated on independent test sets using 10-fold cross-validation. A comparison of classifier performance (in regard to their accuracy, the Kappa coefficient (κ), ROC and Precision-Recall graphs) is made between (a) the machine learning algorithms used to build them, (b) conditions under which the learning datasets were created and (c) the method by which data was structured into examples or instances for learning. Second, with some additional knowledge required to build a simple dialogue interface, the classifiers were tested live against human evaluators in a new environment. The results show that the system is able to learn semantics of spatial expressions from low level robotic data. For example, a group of human evaluators judged that the live system generated a correct description of motion in 93.47% of cases (the figure is averaged over four categories) and that it generated the correct description of object relation in 59.28% of cases.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:517107 |
Date | January 2009 |
Creators | Dobnik, Simon |
Contributors | Klein, Ewan ; Dalrymple, Mary ; Pulman, Stephen |
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:d3e8d606-212b-4a8e-ba9b-9c59cfd3f485 |
Page generated in 0.0014 seconds