Skip to content

Natural Language to Bug Free SQL

Note

To download this example as a Jupyter notebook, click here.

In this example, we will use Guardrails to generate SQL queries from natural language. We will check that the SQL is free of any syntax errors.

Objective

We want to generate SQL queries from natural language, and check that the SQL is free of any syntax errors.

Step 0: Setup

In order to run this example, you will need to install the sqlvalidator package. You can do so by running the following command:

!pip install sqlvalidator
Requirement already satisfied: sqlvalidator in /home/zayd/workspace/guardrails-poc/.venv/lib/python3.11/site-packages (0.0.20)

Step 1: Create the RAIL Spec

Ordinarily, we would create an RAIL spec in a separate file. For the purposes of this example, we will create the spec in this notebook as a string following the RAIL syntax. For more information on RAIL, see the RAIL documentation. We will also show the same RAIL spec in a code-first format using a Pydantic model.

In this RAIL spec, we:

  1. Create an output schema that returns a single key-value pair. The key should be 'generated_sql' and the value should be the SQL query generated from the natural language, which is syntactically correct.

XML option:

rail_str = """
<rail version="0.1">
<output>
<string description="Generate SQL for the given natural language instruction." format="bug-free-sql" name="generated_sql" on-fail-bug-free-sql="reask"></string>
</output>
<prompt>

Generate a valid SQL query for the following natural language instruction:

${nl_instruction}

${gr.complete_json_suffix}
</prompt>
</rail>
"""

Pydantic model option:

from guardrails.validators import BugFreeSQL
from pydantic import BaseModel, Field

prompt = """

Generate a valid SQL query for the following natural language instruction:

${nl_instruction}

${gr.complete_json_suffix}
"""

class ValidSql(BaseModel):
    generated_sql: str = Field(description="Generate SQL for the given natural language instruction.", validators=[BugFreeSQL(on_fail="reask")])

Note

In order to ensure that the SQL is syntactically correct, we use bug-free-sql as the formatter. This is a light-weight formatter that uses the sqlvalidator package to check that the SQL is free of any syntax errors. For your use case, you can create a custom SQL validator that connects to your database and checks that the SQL is valid.

Step 2: Create a Guard object with the RAIL Spec

We create a gd.Guard object that will check, validate and correct the output of the LLM. This object:

  1. Enforces the quality criteria specified in the RAIL spec.
  2. Takes corrective action when the quality criteria are not met.
  3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.
import guardrails as gd

from rich import print

Create the guard from the XML RAIL spec string:

guard = gd.Guard.from_rail_string(rail_str)

Or from the Pydantic model:

guard = gd.Guard.from_pydantic(output_class=ValidSql, prompt=prompt)

We see the prompt that will be sent to the LLM:

print(guard.base_prompt)

Generate a valid SQL query for the following natural language instruction:

${nl_instruction}


Given below is XML that describes the information to extract from this document and the tags to extract it into.

<output>
    <string name="generated_sql" format="bug-free-sql" description="Generate SQL for the given natural language 
instruction."/>
</output>


ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` 
attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON
MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and 
specific types. Be correct and concise. If you are unsure anywhere, enter `null`.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`
- `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index" format="1-indexed" 
/></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`


Here, nl_language is the natural language instruction and will be provided by the user at runtime.

Step 3: Wrap the LLM API call with Guard

import openai

raw_llm_response, validated_response = guard(
    openai.Completion.create,
    prompt_params={
        "nl_instruction": "Select the name of the employee who has the highest salary."
    },
    engine="text-davinci-003",
    max_tokens=2048,
    temperature=0,
)
Async event loop found, but guard was invoked synchronously.For validator parallelization, please call `validate_async` instead.

The guard wrapper returns the raw_llm_respose (which is a simple string), and the validated and corrected output (which is a dictionary).

We can see that the output is a dictionary with the correct schema and types.

print(validated_response)
{'generated_sql': 'SELECT name FROM employee ORDER BY salary DESC LIMIT 1'}
print(guard.state.most_recent_call.tree)
Logs
└── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
    │ ╭──────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────╮ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Generate a valid SQL query for the following natural language instruction:                              │ │
    │ │                                                                                                         │ │
    │ │ Select the name of the employee who has the highest salary.                                             │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Given below is XML that describes the information to extract from this document and the tags to extract │ │
    │ │ it into.                                                                                                │ │
    │ │                                                                                                         │ │
    │ │ <output>                                                                                                │ │
    │ │     <string name="generated_sql" format="bug-free-sql" description="Generate SQL for the given natural  │ │
    │ │ language instruction."/>                                                                                │ │
    │ │ </output>                                                                                               │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the │ │
    │ │ `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding  │ │
    │ │ XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g.        │ │
    │ │ requests for lists, objects and specific types. Be correct and concise. If you are unsure anywhere,     │ │
    │ │ enter `null`.                                                                                           │ │
    │ │                                                                                                         │ │
    │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior:                          │ │
    │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`                     │ │
    │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO',     │ │
    │ │ etc.]}`                                                                                                 │ │
    │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index"          │ │
    │ │ format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`                        │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Json Output:                                                                                            │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Message History ────────────────────────────────────────────╮ │
    │ │ ┏━━━━━━┳━━━━━━━━━┓                                                                                      │ │
    │ │ ┃ Role  Content ┃                                                                                      │ │
    │ │ ┡━━━━━━╇━━━━━━━━━┩                                                                                      │ │
    │ │ └──────┴─────────┘                                                                                      │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮ │
    │ │ {"generated_sql": "SELECT name FROM employee ORDER BY salary DESC LIMIT 1"}                             │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮ │
    │ │ {'generated_sql': 'SELECT name FROM employee ORDER BY salary DESC LIMIT 1'}                             │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯