Language grounding is the linking of a concept or object with its representation. This thesis investigates whether state-of-the-art solutions for object detection and speech recognition are sufficient for a simple language grounding system implemented on a Pepper robot. To answer this question, three different speech recognition implementations and three different language grounding implementations were tested on custom datasets. The speech recognition and language grounding solutions that best fit the system's needs were chosen and implemented on the Pepper robot to conduct an experiment in which an object was referred to by describing it out loud. This was done 19 times with different queries. The speech recognition results show that Vosk is the best of the tested implementations, with an average word error rate of 8.37%. For language grounding, the best results were obtained with the MAttNet model, which reached an accuracy of up to 75.00%. Finally, the language grounding system running on Pepper was able to identify 52.63% of the queries. The conclusion is that state-of-the-art solutions are sufficient for a simple language grounding system implemented on a Pepper robot, but the implementation used in this thesis may be improved, as explained in the last section of this document. Future work is also proposed.
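The abstract reports speech recognition quality as a word error rate (WER) and the end-to-end result as a percentage of queries identified. As a rough illustration of how such figures are usually computed, the sketch below implements the standard word-level WER, (substitutions + deletions + insertions) / number of reference words, via edit distance. It is not the thesis's evaluation code: the function names, the per-utterance averaging in `average_wer`, and the example sentences are hypothetical assumptions. It also shows that 52.63% corresponds to 10 of the 19 queries.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard word-level WER: (S + D + I) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def average_wer(pairs: list[tuple[str, str]]) -> float:
    """Hypothetical helper: mean of per-utterance WERs (an assumed averaging scheme)."""
    rates = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(rates) / len(rates)


def accuracy_percent(correct: int, total: int) -> float:
    """Share of correctly identified queries, as a percentage rounded to 2 decimals."""
    return round(100 * correct / total, 2)


if __name__ == "__main__":
    # Hypothetical query: one substituted word out of five reference words.
    print(word_error_rate("pick up the red cup", "pick up a red cup"))  # 0.2
    # 10 correct out of 19 queries gives the reported end-to-end figure.
    print(accuracy_percent(10, 19))  # 52.63
```

Note that averaging per-utterance WERs and computing one corpus-level WER (total edits over total reference words) can give different numbers; which of the two underlies the 8.37% figure is not stated here.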
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-196485 |
Date | January 2022 |
Creators | Pey Comas, Ferriol |
Publisher | Umeå universitet, Institutionen för datavetenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UMNAD ; 1319 |