Fuhad Abdulla
Can we chat with SQL databases using natural language instead of writing complex queries?
Tech Stack:
- Database: SQLite
- Interface: Dash (to integrate with the Saudi International Trade Analysis Project)
- Programming Language: Python
- LLM: OpenAI (model gpt-4o)
- Data: The Saudi International Trade data was originally in an Excel file, which I converted to SQLite using Python for this project.
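As a hedged sketch, the Excel-to-SQLite conversion could look like the following, assuming pandas (with openpyxl) is available; the file, table, and column names here are my own placeholders, not the project's:

```python
import sqlite3

import pandas as pd

# Sample rows standing in for the real trade spreadsheet; the columns
# are hypothetical, invented only to make the example runnable.
sample = pd.DataFrame({
    "year": [2022, 2023],
    "partner": ["UAE", "China"],
    "exports_usd": [1.2e9, 3.4e9],
})
sample.to_excel("trade.xlsx", index=False)  # requires openpyxl

# The actual conversion: read the Excel sheet and load it into SQLite.
df = pd.read_excel("trade.xlsx")
with sqlite3.connect("trade.db") as conn:
    df.to_sql("trade", conn, if_exists="replace", index=False)
```

Once the table exists, any SQL the LLM generates can run against `trade.db` like any other SQLite database.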

As a Data Analyst, staying updated with the latest technology is crucial.
I recently came across a video describing how Uber leveraged Large Language Models (LLMs) to generate SQL queries. This approach significantly reduced query creation time: from 10 minutes using traditional methods to just 3 minutes with AI.
Inspired by this, I wanted to build a smaller-scale version of their system.
How Uber Implemented It
Uber used multiple specialized LLM agents, each trained for specific tasks, such as:
- Selecting the appropriate table
- Constructing the SQL query
- Validating the output
With a similar approach, we can achieve 70-80% accuracy using two primary methods:
1. Table Schema Method
This method involves passing the database schema to an LLM, such as OpenAI's models, open-source LLMs on Groq, or locally hosted models via Ollama.
My Attempts:
- Attempt 1: OpenAI Python Library
The LLM-generated queries didn't make sense because the model lacked schema awareness.
- Attempt 2: System Prompt with Schema and OpenAI Python Library
I created a system_prompt.txt file containing:
 - Table schema details
 - Table descriptions and when to use them
 - Instructions on which tables to JOIN
Result: The LLM started generating meaningful queries with accurate results!
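A minimal sketch of the schema-in-system-prompt approach, assuming the OpenAI Python SDK (v1 interface) and an OPENAI_API_KEY in the environment; the helper names and prompt wording are mine, not the contents of the original system_prompt.txt:

```python
import sqlite3


def get_schema(db_path: str) -> str:
    """Collect the CREATE TABLE statements so the model sees the schema."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return "\n".join(r[0] for r in rows if r[0])


def question_to_sql(question: str, db_path: str) -> str:
    """Ask gpt-4o to translate a natural-language question into SQLite SQL."""
    from openai import OpenAI  # imported lazily so get_schema works without the SDK

    system_prompt = (
        "You translate user questions into SQLite queries.\n"
        "Use only these tables:\n" + get_schema(db_path) + "\n"
        "Return a single SQL query and nothing else."
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()
```

Injecting the schema into the system prompt is what gives the model the table awareness that Attempt 1 was missing.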
2. Retrieval-Augmented Generation (RAG) Method
In this approach, a vector database stores previously successful queries, allowing the LLM to retrieve and adapt them for new queries. This improves accuracy as more data is added.
Why RAG is Better for Production:
- The more past queries stored, the more accurate the AI becomes.
- Several tools, like Vanna AI, provide this functionality out of the box.
In Vanna AI, we can feed in documentation about the business along with instructions for the AI, and its answers improve the more we use it.
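To make the retrieval step concrete, here is a toy sketch of the RAG idea (not Vanna AI's actual API): past question/SQL pairs are stored, and the most similar one is retrieved and shown to the model as an example. Word overlap stands in for real vector embeddings, and the stored queries are invented for illustration.

```python
# Hypothetical store of previously successful question/SQL pairs.
PAST_QUERIES = [
    ("total exports by year",
     "SELECT year, SUM(exports_usd) FROM trade GROUP BY year;"),
    ("top trading partners",
     "SELECT partner, SUM(exports_usd) AS total FROM trade "
     "GROUP BY partner ORDER BY total DESC LIMIT 10;"),
]


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap: a stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)


def retrieve_example(question: str) -> tuple[str, str]:
    """Return the stored pair most similar to the new question."""
    return max(PAST_QUERIES, key=lambda qa: similarity(question, qa[0]))


def build_prompt(question: str) -> str:
    """Assemble a few-shot prompt with the retrieved example."""
    past_q, past_sql = retrieve_example(question)
    return (f"Similar past question: {past_q}\n"
            f"Its SQL: {past_sql}\n\n"
            f"Now write SQL for: {question}")
```

A production system would replace `similarity` with embeddings in a vector database, which is exactly the part that improves as more successful queries are added.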
If Security & Privacy Are a Concern
- Use locally hosted LLMs (e.g., Mistral served via Ollama).
- Store embeddings in a local vector database.
- Implement query validation (e.g., block DROP TABLE and other destructive statements).
- Restrict execution to read-only database connections for safety.
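The last two points could be sketched like this for SQLite; the allow-list and helper names are my own illustration, a first line of defense rather than a complete one:

```python
import sqlite3

# Keywords that should never appear in a generated query.
FORBIDDEN = ("drop", "delete", "update", "insert", "alter", "attach", "pragma")


def is_safe(sql: str) -> bool:
    """Allow only single read-only SELECT statements (simple allow-list check)."""
    s = sql.strip().lower().rstrip(";")
    if ";" in s:  # reject multi-statement payloads
        return False
    if not s.startswith("select"):
        return False
    return not any(word in s.split() for word in FORBIDDEN)


def run_readonly(db_path: str, sql: str) -> list:
    """Validate the query, then execute it on a read-only connection."""
    if not is_safe(sql):
        raise ValueError("Query rejected by validator")
    # Open the database in read-only mode as a second safety layer.
    with sqlite3.connect(f"file:{db_path}?mode=ro", uri=True) as conn:
        return conn.execute(sql).fetchall()
```

Even if a dangerous query slips past the keyword check, the `mode=ro` connection means SQLite itself refuses any write.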
Cost & Performance Considerations
Using OpenAI's API or similar services can be expensive and slow, especially for large business databases with many tables. Instead, local or open-source LLMs running on a GPU offer a faster, cost-effective alternative.