Return to search

GENERATING SQL FROM NATURAL LANGUAGE IN FEW-SHOT AND ZERO-SHOT SCENARIOS

Making information stored in databases more accessible to users inexperienced in structured query language (SQL) by converting natural language to SQL queries has long been a prominent research area in both the database and natural language processing (NLP) communities. There have been numerous approaches proposed for this task, such as encoder-decoder frameworks, semantic grammars, and more recently with the use of large language models (LLMs). When training LLMs to successfully generate SQL queries from natural language questions there are three notable methods used, pretraining, transfer learning and in-context learning (ICL). ICL is particularly advantageous in scenarios where the hardware at hand is limited, time is of concern and large amounts of task specific labled data is nonexistent. This study seeks to evaluate two strategies in ICL, namely zero-shot and few-shot scenarios using the Mistral-7B-Instruct LLM. Evaluation of the few-shot scenarios was conducted using two techniques, random selection and Jaccard Similarity. The zero-shot scenarios served as a baseline for the few-shot scenarios to overcome, which ended as anticipated, with the few-shot scenarios using Jaccard similarity outperforming the other two methods, followed by few-shot scenarios using random selection coming in at second best, and the zero-shot scenarios performing the worst. Evaluation results acquired based on execution accuracy and exact matching accuracy confirm that leveraging similarity in demonstrating examples when prompting the LLM will enhance the models knowledge about the database schema and table names which is used during the inference phase leadning to more accurately generated SQL queries than leveraging diversity in demonstrating examples.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-227279
Date January 2024
CreatorsAsplund, Liam
PublisherUmeå universitet, Institutionen för datavetenskap
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationUMNAD ; 1494

Page generated in 0.002 seconds