
GENERATING SQL FROM NATURAL LANGUAGE IN FEW-SHOT AND ZERO-SHOT SCENARIOS

Asplund, Liam January 2024
Making information stored in databases more accessible to users inexperienced in structured query language (SQL) by converting natural language to SQL queries has long been a prominent research area in both the database and natural language processing (NLP) communities. Numerous approaches have been proposed for this task, such as encoder-decoder frameworks, semantic grammars, and, more recently, large language models (LLMs). Three notable methods are used to get LLMs to generate SQL queries from natural language questions: pretraining, transfer learning, and in-context learning (ICL). ICL is particularly advantageous when hardware is limited, time is a concern, and large amounts of task-specific labeled data are unavailable. This study evaluates two ICL strategies, zero-shot and few-shot prompting, using the Mistral-7B-Instruct LLM. The few-shot scenarios were evaluated with two techniques for selecting demonstration examples: random selection and Jaccard similarity. The zero-shot scenarios served as a baseline for the few-shot scenarios to surpass. The results were as anticipated: few-shot prompting with Jaccard similarity performed best, few-shot prompting with random selection came second, and zero-shot prompting performed worst. Evaluation results based on execution accuracy and exact-matching accuracy confirm that leveraging similarity in the demonstration examples when prompting the LLM enhances the model's knowledge of the database schema and table names, which is used during the inference phase, leading to more accurately generated SQL queries than leveraging diversity in the demonstration examples.
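The selection strategies compared in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the tokenization, the prompt template, or the example pool, so the word-level tokenizer, the `build_prompt` layout, the function names, and the three demonstration pairs below are all assumptions made for illustration. The sketch ranks candidate demonstrations by the Jaccard similarity of their question tokens to the new question, offers random selection as the alternative, and treats zero-shot prompting as the case with no demonstrations.

```python
import random
import re

# Hypothetical demonstration pool (invented for illustration, not from the thesis).
POOL = [
    {"question": "How many singers are there?",
     "sql": "SELECT COUNT(*) FROM singer"},
    {"question": "List the names of all stadiums.",
     "sql": "SELECT name FROM stadium"},
    {"question": "What is the average age of singers?",
     "sql": "SELECT AVG(age) FROM singer"},
]


def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_by_jaccard(question, pool, k):
    """Return the k pool examples whose questions are most similar to `question`."""
    q_tokens = tokenize(question)
    ranked = sorted(
        pool,
        key=lambda ex: jaccard(q_tokens, tokenize(ex["question"])),
        reverse=True,
    )
    return ranked[:k]


def select_random(pool, k, seed=0):
    """Return k pool examples chosen uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def build_prompt(schema, question, demos):
    """Assemble a prompt; an empty `demos` list gives the zero-shot case."""
    parts = [f"Database schema:\n{schema}\n"]
    for demo in demos:
        parts.append(f"Question: {demo['question']}\nSQL: {demo['sql']}\n")
    parts.append(f"Question: {question}\nSQL:")
    return "\n".join(parts)


if __name__ == "__main__":
    schema = "singer(id, name, age); stadium(id, name, capacity)"
    question = "How many stadiums are there?"
    few_shot = build_prompt(schema, question, select_by_jaccard(question, POOL, 2))
    zero_shot = build_prompt(schema, question, [])
    print(few_shot)
    print("---")
    print(zero_shot)
```

In this toy pool the question "How many stadiums are there?" shares four of six distinct tokens with "How many singers are there?", so that pair ranks first and gives the model a `COUNT(*)` pattern to imitate. Random selection offers no such guarantee.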
