Skip to content

Summarize text accurately

Note

To download this example as a Jupyter notebook, click here.

In this example, we will use Guardrails in the summarization of a text document. We will check whether the summarized document has a high semantic similarity with the original document.

Objective

Summarize a text document and check whether the summarized document has a high semantic similarity with the original document.

Step 0: Setup

In order to run this example, you will need to install the numpy package. You can do so by running the following commands:

!pip install numpy
Requirement already satisfied: numpy in /Users/calebcourier/Projects/guardrails-ai/guardrails/.venv/lib/python3.11/site-packages (1.25.2)

Step 1: Create the RAIL Spec

Ordinarily, we would create an RAIL spec in a separate file. For the purposes of this example, we will create the spec in this notebook as a string following the RAIL syntax. For more information on RAIL, see the RAIL documentation. We will also show the same RAIL spec in a code-first format using a Pydantic model.

In this RAIL spec, we:

  1. Create an output schema that returns a single key-value pair. The key should be 'summary', and the value should be the summary of the given document.

First let's open our document:

with open('data/article1.txt', 'r') as file:
    document = file.read()
    file.seek(0)
    content = ''.join(line.strip() for line in file.readlines())

Next we can define our RAIL spec either as a XML string:

from string import Template

rail_str = Template("""
<rail version="0.1">
<output>
<string description="Summarize the given document faithfully." format="similar-to-document: {${document}}, 0.60" name="summary" on-fail-similar-to-document="filter"></string>
</output>
<prompt>
Summarize the following document:

${document}

${gr.complete_json_suffix}
</prompt>
</rail>
""").safe_substitute(document=document)

Or as a Pydantic model:

from pydantic import BaseModel, Field
from guardrails.validators import SimilarToDocument

prompt = """
Summarize the following document:

${document}

${gr.complete_json_suffix}
"""

class DocumentSummary(BaseModel):
    summary: str = Field(
        description="Summarize the given document faithfully.",
        validators=[SimilarToDocument(document=f"'{content}'", threshold=0.60, on_fail="filter")]
    )

Note

In order to ensure the summary is similar to the document, we use similar-to-document as the validator. This validator embeds the document and the summary and checks whether the cosine similarity between the two embeddings is above a threshold.

Step 2: Create a Guard object with the RAIL Spec

We create a gd.Guard object that will check, validate and correct the output of the LLM. This object:

  1. Enforces the quality criteria specified in the RAIL spec.
  2. Takes corrective action when the quality criteria are not met.
  3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.
import guardrails as gd

from rich import print

From our RAIL string:

guard = gd.Guard.from_rail_string(rail_str)

Or from our Pydantic model:

guard = gd.Guard.from_pydantic(output_class=DocumentSummary, prompt=prompt)

We see the prompt that will be sent to the LLM:

print(guard.base_prompt)
Summarize the following document:

${document}


Given below is XML that describes the information to extract from this document and the tags to extract it into.

<output>
    <string name="summary" description="Summarize the given document faithfully."/>
</output>


ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` 
attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON
MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and 
specific types. Be correct and concise. If you are unsure anywhere, enter `null`.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`
- `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index" format="1-indexed" 
/></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`


Here, statement_to_be_translated is the the statement and will be provided by the user at runtime.

Step 3: Wrap the LLM API call with Guard

First, let's try translating a statement that doesn't have any profanity in it.

import openai

raw_llm_response, validated_response = guard(
    openai.Completion.create,
    prompt_params={'document': document},
    engine='text-davinci-003',
    max_tokens=2048,
    temperature=0
)

print(f"Validated Output: {validated_response}")
Async event loop found, but guard was invoked synchronously.For validator parallelization, please call `validate_async` instead.

Validated Output: {'summary': 'The US Congress consists of a Senate and House of Representatives, with the House of
Representatives being chosen every two years by the people of the several states. Representatives must be at least 
25 years old and have been a citizen of the US for seven years. Representation and taxes are apportioned among the 
states according to their population, and the number of representatives cannot exceed one for every 30,000 people. 
Vacancies are filled by the executive authority of the state. The House of Representatives chooses its speaker and 
other officers, and has the sole power of impeachment.'}

In order to see a detailed look into the logs of the Guard object, we can print the Guard state history:

print(guard.state.most_recent_call.tree)
Logs
└── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
    │ ╭──────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────╮ │
    │ │                                                                                                         │ │
    │ │ Summarize the following document:                                                                       │ │
    │ │                                                                                                         │ │
    │ │ Section. 1.                                                                                             │ │
    │ │ All legislative Powers herein granted shall be vested in a Congress of the United States, which shall   │ │
    │ │ consist of a Senate and House of Representatives.                                                       │ │
    │ │                                                                                                         │ │
    │ │ Section. 2.                                                                                             │ │
    │ │ The House of Representatives shall be composed of Members chosen every second Year by the People of the │ │
    │ │ several States, and the Electors in each State shall have the Qualifications requisite for Electors of  │ │
    │ │ the most numerous Branch of the State Legislature.                                                      │ │
    │ │                                                                                                         │ │
    │ │ No Person shall be a Representative who shall not have attained to the Age of twenty five Years, and    │ │
    │ │ been seven Years a Citizen of the United States, and who shall not, when elected, be an Inhabitant of   │ │
    │ │ that State in which he shall be chosen.                                                                 │ │
    │ │                                                                                                         │ │
    │ │ Representatives and direct Taxes shall be apportioned among the several States which may be included    │ │
    │ │ within this Union, according to their respective Numbers, which shall be determined by adding to the    │ │
    │ │ whole Number of free Persons, including those bound to Service for a Term of Years, and excluding       │ │
    │ │ Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three │ │
    │ │ Years after the first Meeting of the Congress of the United States, and within every subsequent Term of │ │
    │ │ ten Years, in such Manner as they shall by Law direct. The Number of Representatives shall not exceed   │ │
    │ │ one for every thirty Thousand, but each State shall have at Least one Representative; and until such    │ │
    │ │ enumeration shall be made, the State of New Hampshire shall be entitled to chuse three, Massachusetts   │ │
    │ │ eight, Rhode-Island and Providence Plantations one, Connecticut five, New-York six, New Jersey four,    │ │
    │ │ Pennsylvania eight, Delaware one, Maryland six, Virginia ten, North Carolina five, South Carolina five, │ │
    │ │ and Georgia three.                                                                                      │ │
    │ │                                                                                                         │ │
    │ │ When vacancies happen in the Representation from any State, the Executive Authority thereof shall issue │ │
    │ │ Writs of Election to fill such Vacancies.                                                               │ │
    │ │                                                                                                         │ │
    │ │ The House of Representatives shall chuse their Speaker and other Officers; and shall have the sole      │ │
    │ │ Power of Impeachment.                                                                                   │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Given below is XML that describes the information to extract from this document and the tags to extract │ │
    │ │ it into.                                                                                                │ │
    │ │                                                                                                         │ │
    │ │ <output>                                                                                                │ │
    │ │     <string name="summary" description="Summarize the given document faithfully."/>                     │ │
    │ │ </output>                                                                                               │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the │ │
    │ │ `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding  │ │
    │ │ XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g.        │ │
    │ │ requests for lists, objects and specific types. Be correct and concise. If you are unsure anywhere,     │ │
    │ │ enter `null`.                                                                                           │ │
    │ │                                                                                                         │ │
    │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior:                          │ │
    │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`                     │ │
    │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO',     │ │
    │ │ etc.]}`                                                                                                 │ │
    │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index"          │ │
    │ │ format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`                        │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Json Output:                                                                                            │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Message History ────────────────────────────────────────────╮ │
    │ │ ┏━━━━━━┳━━━━━━━━━┓                                                                                      │ │
    │ │ ┃ Role  Content ┃                                                                                      │ │
    │ │ ┡━━━━━━╇━━━━━━━━━┩                                                                                      │ │
    │ │ └──────┴─────────┘                                                                                      │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮ │
    │ │ {"summary": "The US Congress consists of a Senate and House of Representatives, with the House of       │ │
    │ │ Representatives being chosen every two years by the people of the several states. Representatives must  │ │
    │ │ be at least 25 years old and have been a citizen of the US for seven years. Representation and taxes    │ │
    │ │ are apportioned among the states according to their population, and the number of representatives       │ │
    │ │ cannot exceed one for every 30,000 people. Vacancies are filled by the executive authority of the       │ │
    │ │ state. The House of Representatives chooses its speaker and other officers, and has the sole power of   │ │
    │ │ impeachment."}                                                                                          │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮ │
    │ │ {                                                                                                       │ │
    │ │     'summary': 'The US Congress consists of a Senate and House of Representatives, with the House of    │ │
    │ │ Representatives being chosen every two years by the people of the several states. Representatives must  │ │
    │ │ be at least 25 years old and have been a citizen of the US for seven years. Representation and taxes    │ │
    │ │ are apportioned among the states according to their population, and the number of representatives       │ │
    │ │ cannot exceed one for every 30,000 people. Vacancies are filled by the executive authority of the       │ │
    │ │ state. The House of Representatives chooses its speaker and other officers, and has the sole power of   │ │
    │ │ impeachment.'                                                                                           │ │
    │ │ }                                                                                                       │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The guard wrapper returns the raw_llm_respose (which is a simple string), and the validated and corrected output (which is a dictionary). We can see that the output is a dictionary with the correct schema and types.

Next, let's try using a smaller model, which is not going to be good at summarization. We can see that the output is filtered out.

raw_llm_response, validated_response = guard(
    openai.Completion.create,
    prompt_params={'document': open("data/article1.txt", "r").read()},
    engine='text-ada-001',
    max_tokens=512,
    temperature=0
)

print(f"Validated Output: {validated_response}")
Validated Output: None

We can see the step-wise history of the Guard object below:

print(guard.state.most_recent_call.tree)
Logs
└── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
    │ ╭──────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────╮ │
    │ │                                                                                                         │ │
    │ │ Summarize the following document:                                                                       │ │
    │ │                                                                                                         │ │
    │ │ Section. 1.                                                                                             │ │
    │ │ All legislative Powers herein granted shall be vested in a Congress of the United States, which shall   │ │
    │ │ consist of a Senate and House of Representatives.                                                       │ │
    │ │                                                                                                         │ │
    │ │ Section. 2.                                                                                             │ │
    │ │ The House of Representatives shall be composed of Members chosen every second Year by the People of the │ │
    │ │ several States, and the Electors in each State shall have the Qualifications requisite for Electors of  │ │
    │ │ the most numerous Branch of the State Legislature.                                                      │ │
    │ │                                                                                                         │ │
    │ │ No Person shall be a Representative who shall not have attained to the Age of twenty five Years, and    │ │
    │ │ been seven Years a Citizen of the United States, and who shall not, when elected, be an Inhabitant of   │ │
    │ │ that State in which he shall be chosen.                                                                 │ │
    │ │                                                                                                         │ │
    │ │ Representatives and direct Taxes shall be apportioned among the several States which may be included    │ │
    │ │ within this Union, according to their respective Numbers, which shall be determined by adding to the    │ │
    │ │ whole Number of free Persons, including those bound to Service for a Term of Years, and excluding       │ │
    │ │ Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three │ │
    │ │ Years after the first Meeting of the Congress of the United States, and within every subsequent Term of │ │
    │ │ ten Years, in such Manner as they shall by Law direct. The Number of Representatives shall not exceed   │ │
    │ │ one for every thirty Thousand, but each State shall have at Least one Representative; and until such    │ │
    │ │ enumeration shall be made, the State of New Hampshire shall be entitled to chuse three, Massachusetts   │ │
    │ │ eight, Rhode-Island and Providence Plantations one, Connecticut five, New-York six, New Jersey four,    │ │
    │ │ Pennsylvania eight, Delaware one, Maryland six, Virginia ten, North Carolina five, South Carolina five, │ │
    │ │ and Georgia three.                                                                                      │ │
    │ │                                                                                                         │ │
    │ │ When vacancies happen in the Representation from any State, the Executive Authority thereof shall issue │ │
    │ │ Writs of Election to fill such Vacancies.                                                               │ │
    │ │                                                                                                         │ │
    │ │ The House of Representatives shall chuse their Speaker and other Officers; and shall have the sole      │ │
    │ │ Power of Impeachment.                                                                                   │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Given below is XML that describes the information to extract from this document and the tags to extract │ │
    │ │ it into.                                                                                                │ │
    │ │                                                                                                         │ │
    │ │ <output>                                                                                                │ │
    │ │     <string name="summary" description="Summarize the given document faithfully."/>                     │ │
    │ │ </output>                                                                                               │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the │ │
    │ │ `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding  │ │
    │ │ XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g.        │ │
    │ │ requests for lists, objects and specific types. Be correct and concise. If you are unsure anywhere,     │ │
    │ │ enter `null`.                                                                                           │ │
    │ │                                                                                                         │ │
    │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior:                          │ │
    │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`                     │ │
    │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO',     │ │
    │ │ etc.]}`                                                                                                 │ │
    │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index"          │ │
    │ │ format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`                        │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Json Output:                                                                                            │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Message History ────────────────────────────────────────────╮ │
    │ │ ┏━━━━━━┳━━━━━━━━━┓                                                                                      │ │
    │ │ ┃ Role  Content ┃                                                                                      │ │
    │ │ ┡━━━━━━╇━━━━━━━━━┩                                                                                      │ │
    │ │ └──────┴─────────┘                                                                                      │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮ │
    │ │     <string name="summary" description="Summarize the given document faithfully."/>                     │ │
    │ │ </string>                                                                                               │ │
    │ │                                                                                                         │ │
    │ │ The House of Representatives shall chuse their Speaker and other Officers; and shall have the sole      │ │
    │ │ Power of Impeachment.                                                                                   │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮ │
    │ │ None                                                                                                    │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯