Validators

This module contains the validators for the Guardrails framework.

The name under which a validator is registered is the name used in the RAIL spec to specify formatters (the "Name for format attribute" listed for each validator below).

BugFreePython

BugFreePython(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that there are no Python syntactic bugs in the generated code.

This validator checks for syntax errors by running ast.parse(code) and raises an exception if any are found. Only the packages in the Python environment are available to the code snippet.

Key Properties

Property Description
Name for format attribute bug-free-python
Supported data types string
Programmatic fix None
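
The syntax check described above can be sketched as follows. This is only an illustration of the ast.parse approach; the helper name has_valid_syntax is hypothetical and is not part of Guardrails, and the validator's own error handling may differ.

```python
import ast


def has_valid_syntax(code: str) -> bool:
    """Return True if `code` parses as Python, False on a syntax error."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


print(has_valid_syntax("x = 1 + 2"))  # True
print(has_valid_syntax("x = = 1"))    # False
```

Because ast.parse only parses the snippet and never executes it, a check like this catches syntax errors but not runtime bugs.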

BugFreeSQL

BugFreeSQL(conn: Optional[str] = None, schema_file: Optional[str] = None, on_fail: Optional[Callable] = None)

Validates that there are no SQL syntactic bugs in the generated code.

This is a very minimal implementation that uses the sqlvalidator package from PyPI to check whether the SQL query is valid. You can implement a custom SQL validator that uses a database connection to check whether the query is valid.

Key Properties

Property Description
Name for format attribute bug-free-sql
Supported data types string
Programmatic fix None

DetectSecrets

DetectSecrets(on_fail: Union[Callable[..., Any], None] = None, **kwargs)

Validates whether the generated code snippet contains any secrets.

Key Properties

| Property | Description |
| --- | --- |
| Name for format attribute | detect-secrets |
| Supported data types | string |
| Programmatic fix | None |

This validator uses the detect-secrets library to check whether the generated code snippet contains any secrets. If any secrets are detected, the validator fails and returns the generated code snippet with the secrets replaced with asterisks. Otherwise, the validator returns the generated code snippet unchanged.

Caveats:
  • Multiple secrets on the same line may not be caught, e.g.
    • Minified code
    • One-line lists/dictionaries
    • Multi-variable assignments
  • Multi-line secrets may not be caught, e.g.
    • RSA/SSH keys

Example

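```py
from guardrails import Guard
from guardrails.validators import DetectSecrets

# code_snippet is the LLM-generated code to scan (defined elsewhere)
guard = Guard.from_string(validators=[DetectSecrets(on_fail="fix")])
guard.parse(
    llm_output=code_snippet,
)
```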

mask instance-attribute

mask = '********'

EndpointIsReachable

EndpointIsReachable(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that a value is a reachable URL.

Key Properties

Property Description
Name for format attribute is-reachable
Supported data types string
Programmatic fix None

EndsWith

EndsWith(end: str, on_fail: str = 'fix')

Validates that a list ends with a given value.

Key Properties

Property Description
Name for format attribute ends-with
Supported data types list
Programmatic fix Append the given value to the list.

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| end | str | The required last element. | required |
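
The programmatic fix listed above (append the given value) can be pictured with a short sketch. It illustrates the described behavior only, not the library's implementation; fix_ends_with is a hypothetical helper name.

```python
from typing import Any, List


def fix_ends_with(value: List[Any], end: Any) -> List[Any]:
    """Return `value` unchanged if it already ends with `end`, else append `end`."""
    if value and value[-1] == end:
        return value
    return value + [end]


print(fix_ends_with(["a", "b"], "z"))  # ['a', 'b', 'z']
print(fix_ends_with(["a", "z"], "z"))  # ['a', 'z']
```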

ExcludeSqlPredicates

ExcludeSqlPredicates(predicates: List[str], on_fail: Optional[Callable] = None)

Validates that the SQL query does not contain certain predicates.

Key Properties

Property Description
Name for format attribute exclude-sql-predicates
Supported data types string
Programmatic fix None

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| predicates | List[str] | The list of predicates to avoid. | required |

ExtractedSummarySentencesMatch

ExtractedSummarySentencesMatch(threshold: float = 0.7, on_fail: Optional[Callable] = None, **kwargs: Optional[Dict[str, Any]])

Validates that the extracted summary sentences match the original text by performing a cosine similarity in the embedding space.

Key Properties

Property Description
Name for format attribute extracted-summary-sentences-match
Supported data types string
Programmatic fix Remove any sentences that cannot be verified.

Arguments

threshold: The minimum cosine similarity to be considered similar. Defaults to 0.7.

Metadata

filepaths (List[str]): A list of strings that specifies the filepaths for any documents that should be used for asserting the summary's similarity.
document_store (DocumentStoreBase, optional): The document store to use during validation. Defaults to EphemeralDocumentStore.
vector_db (VectorDBBase, optional): A vector database to use for embeddings. Defaults to Faiss.
embedding_model (EmbeddingBase, optional): The embedding model to use. Defaults to OpenAIEmbedding.

ExtractiveSummary

ExtractiveSummary(threshold: int = 85, on_fail: Optional[Callable] = None, **kwargs)

Validates that a string is a valid extractive summary of a given document.

This validator does a fuzzy match between the sentences in the summary and the sentences in the document. Each sentence in the summary must be similar to at least one sentence in the document. After the validation, the summary is updated to include the sentences from the document that were matched, and the citations for those sentences are added to the end of the summary.

Key Properties

Property Description
Name for format attribute extractive-summary
Supported data types string
Programmatic fix Remove any sentences that cannot be verified.

Arguments

threshold: The minimum fuzz ratio to be considered summarized. Defaults to 85.

Metadata

filepaths (List[str]): A list of strings that specifies the filepaths for any documents that should be used for asserting the summary's similarity.
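
To make the fuzzy-matching step concrete, here is a minimal sketch of filtering summary sentences against a document. It rests on several assumptions: difflib.SequenceMatcher from the standard library stands in for whatever fuzzy-matching library the validator uses, the sentence splitter is deliberately naive, the citation step described above is omitted, and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (difflib stand-in for a fuzz ratio)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def keep_supported_sentences(summary: str, document: str, threshold: int = 85) -> str:
    """Keep only the summary sentences that closely match some document sentence."""
    doc_sentences = split_sentences(document)
    kept = [
        s for s in split_sentences(summary)
        if any(similarity(s, d) >= threshold for d in doc_sentences)
    ]
    return " ".join(kept)


document = "The cat sat on the mat. It was a sunny day. Birds sang outside."
summary = "The cat sat on the mat. Dogs barked all night."
print(keep_supported_sentences(summary, document))  # The cat sat on the mat.
```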

IsHighQualityTranslation

IsHighQualityTranslation(*args, **kwargs)

Uses inspiredco.critique to check whether a translation is high quality.

Key Properties

Property Description
Name for format attribute is-high-quality-translation
Supported data types string
Programmatic fix None

Metadata

| Name | Type | Description |
| --- | --- | --- |
| translation_source | str | The source of the translation. |

IsProfanityFree

IsProfanityFree(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that a translated text does not contain profane language.

This validator uses the alt-profanity-check package to check whether a string contains profane language.

Key Properties

Property Description
Name for format attribute is-profanity-free
Supported data types string
Programmatic fix None

LowerCase

LowerCase(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that a value is lower case.

Key Properties

Property Description
Name for format attribute lower-case
Supported data types string
Programmatic fix Convert to lower case.

OneLine

OneLine(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that a value is a single line or sentence.

Key Properties

Property Description
Name for format attribute one-line
Supported data types string
Programmatic fix Pick the first line.

ProvenanceV0

ProvenanceV0(threshold: float = 0.8, validation_method: str = 'sentence', on_fail: Optional[Callable] = None, **kwargs)

Validates that LLM-generated text matches some source text based on distance in embedding space.

Key Properties

Property Description
Name for format attribute provenance-v0
Supported data types string
Programmatic fix None

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | float | The minimum cosine similarity between the generated text and the source text. Defaults to 0.8. | 0.8 |
| validation_method | str | Whether to validate at the sentence level or over the full text. Must be one of sentence or full. Defaults to sentence. | 'sentence' |

Metadata

| Name | Type | Description |
| --- | --- | --- |
| query_function | Callable | A callable that takes a string and returns a list of (chunk, score) tuples. |
| sources | List[str] | The source text. |
| embed_function | Callable | A callable that creates embeddings for the sources. Must accept a list of strings and return an np.array of floats. |

In order to use this validator, you must provide either a query_function or sources with an embed_function in the metadata.

If providing query_function, it should take a string as input and return a list of (chunk, score) tuples. The chunk is a string and the score is a float representing the cosine distance between the chunk and the input string. The list should be sorted in ascending order by score.

Note: The score should represent distance in embedding space, not similarity. I.e., lower is better and the score should be 0 if the chunk is identical to the input string.

Example
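from typing import List, Tuple

def query_function(text: str, k: int) -> List[Tuple[str, float]]:
    # Scores are distances, so results are sorted in ascending order (closest first).
    return [("This is a chunk", 0.8), ("This is another chunk", 0.9)]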

guard = Guard.from_rail(...)
guard(
    openai.ChatCompletion.create(...),
    prompt_params={...},
    temperature=0.0,
    metadata={"query_function": query_function},
)

If providing sources, it should be a list of strings. The embed_function should take a string or a list of strings as input and return a NumPy array (np.ndarray) of floats. Each vector should be normalized to unit length.

Example
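from typing import List, Union

import numpy as np

def embed_function(text: Union[str, List[str]]) -> np.ndarray:
    # Placeholder embedding; a real function returns one vector per input string.
    return np.array([[0.1, 0.2, 0.3]])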

guard = Guard.from_rail(...)
guard(
    openai.ChatCompletion.create(...),
    prompt_params={...},
    temperature=0.0,
    metadata={
        "sources": ["This is a source text"],
        "embed_function": embed_function
    },
)
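
For intuition about the (chunk, score) contract, the following sketch builds a query_function over a list of sources and ranks them by cosine distance, closest first. Everything here is an assumption made for illustration: toy_embed is a made-up letter-frequency embedding rather than a real embedding model, and make_query_function is a hypothetical helper that is not part of Guardrails.

```python
import math
from typing import Callable, List, Tuple


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding: unit-length letter-frequency vector (a-z)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity; assumes both vectors are unit length."""
    return 1.0 - sum(a * b for a, b in zip(u, v))


def make_query_function(
    sources: List[str],
) -> Callable[[str, int], List[Tuple[str, float]]]:
    """Build a query_function that ranks `sources` by distance to the input text."""
    embedded = [(chunk, toy_embed(chunk)) for chunk in sources]

    def query_function(text: str, k: int) -> List[Tuple[str, float]]:
        query = toy_embed(text)
        scored = [(chunk, cosine_distance(query, vec)) for chunk, vec in embedded]
        scored.sort(key=lambda pair: pair[1])  # ascending: closest chunk first
        return scored[:k]

    return query_function


query_function = make_query_function(["the sky is blue", "grass is green"])
results = query_function("the sky is blue", k=2)
print(results[0][0])  # the sky is blue
```

An identical chunk scores (approximately) 0, which matches the note above that lower scores are better.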

ProvenanceV1

ProvenanceV1(validation_method: str = 'sentence', llm_callable: Union[str, Callable] = 'gpt-3.5-turbo', top_k: int = 3, max_tokens: int = 2, on_fail: Optional[Callable] = None, **kwargs)

Validates that the LLM-generated text is supported by the provided contexts.

This validator uses an LLM callable to evaluate the generated text against the provided contexts (LLM-ception).

In order to use this validator, you must provide either:

  1. a query_function in the metadata. That function should take a string as input (the LLM-generated text) and return a list of relevant chunks. The list should be sorted in ascending order by the distance between the chunk and the LLM-generated text.

Example using str callable

def query_function(text: str, k: int) -> List[str]:
    return ["This is a chunk", "This is another chunk"]

guard = Guard.from_string(validators=[
    ProvenanceV1(llm_callable="gpt-3.5-turbo", ...)
])
guard.parse(
    llm_output=...,
    metadata={"query_function": query_function},
)

Example using a custom llm callable

def query_function(text: str, k: int) -> List[str]:
    return ["This is a chunk", "This is another chunk"]

guard = Guard.from_string(validators=[
    ProvenanceV1(llm_callable=your_custom_callable, ...)
])
guard.parse(
    llm_output=...,
    metadata={"query_function": query_function},
)

OR

  2. sources with an embed_function in the metadata. The embed_function should take a string or a list of strings as input and return a NumPy array (np.ndarray) of floats. Each vector should be normalized to unit length.
Example

```py
def embed_function(text: Union[str, List[str]]) -> np.ndarray:
    return np.array([[0.1, 0.2, 0.3]])

guard = Guard.from_rail(...)
guard(
    openai.ChatCompletion.create(...),
    prompt_params={...},
    temperature=0.0,
    metadata={
        "sources": ["This is a source text"],
        "embed_function": embed_function
    },
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| validation_method | str | Whether to validate at the sentence level or over the full text. One of sentence or full. Defaults to sentence. | 'sentence' |
| llm_callable | Union[str, Callable] | Either the name of the OpenAI model, or a callable that takes a prompt and returns a response. | 'gpt-3.5-turbo' |
| top_k | int | The number of chunks to return from the query function. Defaults to 3. | 3 |
| max_tokens | int | The maximum number of tokens to send to the LLM. Defaults to 2. | 2 |

Metadata

| Name | Type | Description |
| --- | --- | --- |
| query_function | Callable | A callable that takes a string and returns a list of chunks. |
| sources | List[str] | The source text. |
| embed_function | Callable | A callable that creates embeddings for the sources. Must accept a list of strings and return an np.array of floats. |

QARelevanceLLMEval

QARelevanceLLMEval(llm_callable: Optional[Callable] = None, on_fail: Optional[Callable] = None, **kwargs)

Validates that an answer is relevant to the question asked by asking the LLM to self-evaluate.

Key Properties

Property Description
Name for format attribute qa-relevance-llm-eval
Supported data types string
Programmatic fix None

Metadata

| Name | Type | Description |
| --- | --- | --- |
| question | str | The original question the LLM was given to answer. |

ReadingTime

ReadingTime(reading_time: int, on_fail: str = 'fix')

Validates that a string can be read in less than a certain amount of time.

Key Properties

Property Description
Name for format attribute reading-time
Supported data types string
Programmatic fix None

Arguments

reading_time: The maximum reading time.

RegexMatch

RegexMatch(regex: str, match_type: Optional[str] = None, on_fail: Optional[Callable] = None)

Validates that a value matches a regular expression.

Key Properties

Property Description
Name for format attribute regex_match
Supported data types string
Programmatic fix Generate a string that matches the regular expression

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| regex | str | The regex pattern. | required |
| match_type | Optional[str] | One of "search" or "fullmatch", selecting a regex search or a full match. | None |
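
The difference between the two match_type options can be seen with Python's re module. This is a small sketch under the assumption that the options map onto re.search and re.fullmatch; regex_matches is a hypothetical helper, not the validator's code, and it takes match_type as a required argument because the behavior for None is not stated here.

```python
import re


def regex_matches(value: str, regex: str, match_type: str) -> bool:
    """Apply `regex` to `value` using either re.search or re.fullmatch."""
    if match_type not in {"search", "fullmatch"}:
        raise ValueError('match_type must be "search" or "fullmatch"')
    matcher = getattr(re.compile(regex), match_type)
    return matcher(value) is not None


print(regex_matches("order-123", r"\d+", "search"))     # True
print(regex_matches("order-123", r"\d+", "fullmatch"))  # False
print(regex_matches("123", r"\d+", "fullmatch"))        # True
```

A search succeeds if the pattern occurs anywhere in the value, while a full match requires the whole value to match the pattern.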

RemoveRedundantSentences

RemoveRedundantSentences(threshold: int = 70, on_fail: Optional[Callable] = None, **kwargs)

Removes redundant sentences from a string.

This validator removes sentences from a string that are similar to other sentences in the string. This is useful for removing repetitive sentences from a string.

Key Properties

Property Description
Name for format attribute remove-redundant-sentences
Supported data types string
Programmatic fix Remove any redundant sentences.

Arguments

threshold: The minimum fuzz ratio to be considered redundant. Defaults to 70.
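
A rough sketch of the idea: keep a sentence only if it is not too similar to one already kept. As assumptions, difflib.SequenceMatcher stands in for the fuzzy-matching library behind the "fuzz ratio", the sentence splitter is naive, and remove_redundant is a hypothetical name.

```python
import re
from difflib import SequenceMatcher
from typing import List


def remove_redundant(text: str, threshold: int = 70) -> str:
    """Drop each sentence that is too similar to a sentence already kept."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    kept: List[str] = []
    for sentence in sentences:
        is_redundant = any(
            100 * SequenceMatcher(None, sentence, other).ratio() >= threshold
            for other in kept
        )
        if not is_redundant:
            kept.append(sentence)
    return " ".join(kept)


text = "The sky is blue. The sky is blue today. Grass is green."
print(remove_redundant(text))  # The sky is blue. Grass is green.
```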

SaliencyCheck

SaliencyCheck(docs_dir: str, llm_callable: Optional[Callable] = None, on_fail: Optional[Callable] = None, threshold: float = 0.25, **kwargs)

Checks that the summary covers the list of topics present in the document.

Key Properties

Property Description
Name for format attribute saliency-check
Supported data types string
Programmatic fix None

Arguments

docs_dir: Path to the directory containing the documents.
threshold: Threshold for overlap between topics in document and summary. Defaults to 0.25.

Initialize the SaliencyCheck validator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| docs_dir | str | Path to the directory containing the documents. | required |
| on_fail | Optional[Callable] | Function to call when validation fails. | None |
| threshold | float | Threshold for overlap between topics in document and summary. | 0.25 |

SimilarToDocument

SimilarToDocument(document: str, threshold: float = 0.7, model: str = 'text-embedding-ada-002', on_fail: Optional[Callable] = None)

Validates that a value is similar to the document.

This validator checks if the value is similar to the document by checking the cosine similarity between the value and the document, using an embedding.

Key Properties

Property Description
Name for format attribute similar-to-document
Supported data types string
Programmatic fix None

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| document | str | The document to use for the similarity check. | required |
| threshold | float | The minimum cosine similarity to be considered similar. Defaults to 0.7. | 0.7 |
| model | str | The embedding model to use. Defaults to text-embedding-ada-002. | 'text-embedding-ada-002' |
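
The thresholded cosine-similarity check can be sketched in a few lines. The vectors below are hand-written stand-ins for embeddings (a real setup would obtain them from the embedding model), and both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_similar(
    value_vec: Sequence[float],
    document_vec: Sequence[float],
    threshold: float = 0.7,
) -> bool:
    """True if the similarity reaches the threshold."""
    return cosine_similarity(value_vec, document_vec) >= threshold


document_vec = [1.0, 0.0, 1.0]  # stand-in for the document embedding
print(is_similar([1.0, 0.1, 0.9], document_vec))  # True
print(is_similar([0.0, 1.0, 0.0], document_vec))  # False
```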

SimilarToList

SimilarToList(standard_deviations: int = 3, threshold: float = 0.1, on_fail: Optional[Callable] = None, **kwargs)

Validates that a value is similar to a list of previously known values.

Key Properties

Property Description
Name for format attribute similar-to-list
Supported data types string
Programmatic fix None

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| standard_deviations | int | The number of standard deviations from the mean to check. | 3 |
| threshold | float | The threshold for the average semantic similarity for strings. | 0.1 |

For integer values, this validator checks whether the value lies within 'k' standard deviations of the mean of the previous values. (Assumes that the previous values are normally distributed.) For string values, this validator checks whether the average semantic similarity between the generated value and the previous values is less than a threshold.
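
The integer case can be illustrated with the standard library. This sketch assumes the population standard deviation (statistics.pstdev); the text does not say which estimator the validator uses. within_k_std is a hypothetical helper.

```python
import statistics
from typing import Sequence


def within_k_std(value: float, previous: Sequence[float], k: int = 3) -> bool:
    """True if `value` lies within k standard deviations of the mean of `previous`."""
    mean = statistics.mean(previous)
    std = statistics.pstdev(previous)  # assumption: population standard deviation
    return abs(value - mean) <= k * std


previous_values = [10, 12, 11, 9, 13]  # mean 11, standard deviation about 1.41
print(within_k_std(12, previous_values))  # True
print(within_k_std(20, previous_values))  # False
```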

SqlColumnPresence

SqlColumnPresence(cols: List[str], on_fail: Optional[Callable] = None)

Validates that all columns in the SQL query are present in the schema.

Key Properties

Property Description
Name for format attribute sql-column-presence
Supported data types string
Programmatic fix None

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cols | List[str] | The list of valid columns. | required |

TwoWords

TwoWords(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that a value is two words.

Key Properties

Property Description
Name for format attribute two-words
Supported data types string
Programmatic fix Pick the first two words.

UpperCase

UpperCase(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that a value is upper case.

Key Properties

Property Description
Name for format attribute upper-case
Supported data types string
Programmatic fix Convert to upper case.

ValidChoices

ValidChoices(choices: List[Any], on_fail: Optional[Callable] = None)

Validates that a value is within the acceptable choices.

Key Properties

Property Description
Name for format attribute valid-choices
Supported data types all
Programmatic fix None

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| choices | List[Any] | The list of valid choices. | required |

ValidLength

ValidLength(min: Optional[int] = None, max: Optional[int] = None, on_fail: Optional[Callable] = None)

Validates that the length of value is within the expected range.

Key Properties

Property Description
Name for format attribute length
Supported data types string, list, object
Programmatic fix If shorter than the minimum, pad with empty last elements. If longer than the maximum, truncate.

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| min | Optional[int] | The inclusive minimum length. | None |
| max | Optional[int] | The inclusive maximum length. | None |

ValidRange

ValidRange(min: Optional[int] = None, max: Optional[int] = None, on_fail: Optional[Callable] = None)

Validates that a value is within a range.

Key Properties

Property Description
Name for format attribute valid-range
Supported data types integer, float, percentage
Programmatic fix Closest value within the range.

Arguments

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| min | Optional[int] | The inclusive minimum value of the range. | None |
| max | Optional[int] | The inclusive maximum value of the range. | None |
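
The "closest value within the range" fix amounts to clamping. A minimal sketch, with clamp as a hypothetical helper and either bound optional as in the signature above:

```python
from typing import Optional


def clamp(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> float:
    """Return the value in [min_value, max_value] closest to `value`."""
    if min_value is not None and value < min_value:
        return min_value
    if max_value is not None and value > max_value:
        return max_value
    return value


print(clamp(15, 0, 10))        # 10
print(clamp(-3, 0, 10))        # 0
print(clamp(5, 0, 10))         # 5
print(clamp(5, max_value=3))   # 3
```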

ValidURL

ValidURL(on_fail: Optional[Union[Callable, str]] = None, **kwargs)

Validates that a value is a valid URL.

Key Properties

Property Description
Name for format attribute valid-url
Supported data types string
Programmatic fix None