Validators
This module contains the validators for the Guardrails framework.
The name with which a validator is registered is the name that is used in the RAIL spec to specify formatters.
BugFreePython
Validates that there are no Python syntactic bugs in the generated code.
This validator checks for syntax errors by running ast.parse(code), and will raise an exception if there are any.
Only the packages in the Python environment are available to the code snippet.
Key Properties
Property | Description |
---|---|
Name for format attribute | bug-free-python |
Supported data types | string |
Programmatic fix | None |
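The syntax check described above can be sketched with the standard library alone. This is a minimal illustration, not the validator's actual implementation; the helper name `has_valid_python_syntax` is hypothetical.

```python
import ast


def has_valid_python_syntax(code: str) -> bool:
    """Return True if `code` parses as Python source (hypothetical helper)."""
    try:
        ast.parse(code)  # raises SyntaxError when the source cannot be parsed
    except SyntaxError:
        return False
    return True


print(has_valid_python_syntax("x = 1 + 2"))
print(has_valid_python_syntax("def f(:"))
```

Note that `ast.parse` only parses the source; it does not execute it, so runtime problems such as undefined names are not detected by a check of this kind.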
BugFreeSQL
BugFreeSQL(conn: Optional[str] = None, schema_file: Optional[str] = None, on_fail: Optional[Callable] = None)
Validates that there are no SQL syntactic bugs in the generated code.
This is a very minimal implementation that uses the PyPI sqlvalidator package to check whether the SQL query is valid. You can implement a custom SQL validator that uses a database connection to check whether the query is valid.
Key Properties
Property | Description |
---|---|
Name for format attribute | bug-free-sql |
Supported data types | string |
Programmatic fix | None |
DetectSecrets
Validates whether the generated code snippet contains any secrets.
Key Properties
| Property | Description |
| --- | --- |
| Name for format attribute | detect-secrets |
| Supported data types | string |
| Programmatic fix | None |
This validator uses the detect-secrets library to check whether the generated code snippet contains any secrets. If any secrets are detected, the validator fails and returns the generated code snippet with the secrets replaced with asterisks. Otherwise, the validator returns the generated code snippet unchanged.
Some caveats:
- Multiple secrets on the same line may not be caught, e.g.
  - Minified code
  - One-line lists/dictionaries
  - Multi-variable assignments
- Multi-line secrets may not be caught, e.g.
  - RSA/SSH keys
Example
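```py
from guardrails import Guard
from guardrails.validators import DetectSecrets

# code_snippet is the LLM-generated code to scan for secrets
guard = Guard.from_string(validators=[DetectSecrets(on_fail="fix")])
guard.parse(
    llm_output=code_snippet,
)
```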
EndpointIsReachable
Validates that a value is a reachable URL.
Key Properties
Property | Description |
---|---|
Name for format attribute | is-reachable |
Supported data types | string |
Programmatic fix | None |
EndsWith
Validates that a list ends with a given value.
Key Properties
Property | Description |
---|---|
Name for format attribute | ends-with |
Supported data types | list |
Programmatic fix | Append the given value to the list. |
Arguments
Name | Type | Description | Default |
---|---|---|---|
end | str | The required last element. | required |
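The check and its programmatic fix can be pictured with a small sketch. The function name `check_ends_with` and its return shape are hypothetical and chosen for illustration; they are not part of the Guardrails API.

```python
from typing import Any, List, Tuple


def check_ends_with(value: List[Any], end: Any) -> Tuple[bool, List[Any]]:
    """Return (passed, fixed_value) for a hypothetical ends-with check."""
    if value and value[-1] == end:
        return True, value
    # Programmatic fix: append the required last element to a copy of the list.
    return False, value + [end]


print(check_ends_with(["a", "b"], "b"))
print(check_ends_with(["a"], "b"))
```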
ExcludeSqlPredicates
Validates that the SQL query does not contain certain predicates.
Key Properties
Property | Description |
---|---|
Name for format attribute | exclude-sql-predicates |
Supported data types | string |
Programmatic fix | None |
Arguments
Name | Type | Description | Default |
---|---|---|---|
predicates | List[str] | The list of predicates to avoid. | required |
ExtractedSummarySentencesMatch
ExtractedSummarySentencesMatch(threshold: float = 0.7, on_fail: Optional[Callable] = None, **kwargs: Optional[Dict[str, Any]])
Validates that the extracted summary sentences match the original text by performing a cosine similarity in the embedding space.
Key Properties
Property | Description |
---|---|
Name for format attribute | extracted-summary-sentences-match |
Supported data types | string |
Programmatic fix | Remove any sentences that can not be verified. |
Arguments
- threshold: The minimum cosine similarity to be considered similar. Defaults to 0.7.
Metadata
- filepaths (List[str]): A list of file paths for any documents that should be used for asserting the summary's similarity.
- document_store (DocumentStoreBase, optional): The document store to use during validation. Defaults to EphemeralDocumentStore.
- vector_db (VectorDBBase, optional): A vector database to use for embeddings. Defaults to Faiss.
- embedding_model (EmbeddingBase, optional): The embedding model to use. Defaults to OpenAIEmbedding.
ExtractiveSummary
Validates that a string is a valid extractive summary of a given document.
This validator does a fuzzy match between the sentences in the summary and the sentences in the document. Each sentence in the summary must be similar to at least one sentence in the document. After the validation, the summary is updated to include the sentences from the document that were matched, and the citations for those sentences are added to the end of the summary.
Key Properties
Property | Description |
---|---|
Name for format attribute | extractive-summary |
Supported data types | string |
Programmatic fix | Remove any sentences that can not be verified. |
Arguments
- threshold: The minimum fuzz ratio to be considered summarized. Defaults to 85.
Metadata
- filepaths (List[str]): A list of file paths for any documents that should be used for asserting the summary's similarity.
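To make the fuzzy-matching idea concrete, here is a minimal sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for the validator's fuzz ratio, so the scores are an approximation and will not be identical to the validator's; the helper names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def fuzz_ratio(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (difflib stand-in, an assumption)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def verified_sentences(
    summary: List[str], document: List[str], threshold: float = 85
) -> List[str]:
    """Keep summary sentences similar to at least one document sentence."""
    kept = []
    for sentence in summary:
        best = max((fuzz_ratio(sentence, doc) for doc in document), default=0.0)
        if best >= threshold:
            kept.append(sentence)
    return kept


document = ["The cat sat on the mat.", "It rained all day."]
summary = ["The cat sat on the mat", "Dogs can fly."]
print(verified_sentences(summary, document))
```

The second summary sentence has no close match in the document, so it is dropped, which mirrors the programmatic fix of removing sentences that cannot be verified.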
IsHighQualityTranslation
Uses inspiredco.critique to check whether a translation is high quality.
Key Properties
Property | Description |
---|---|
Name for format attribute | is-high-quality-translation |
Supported data types | string |
Programmatic fix | None |
Metadata
Name | Type | Description |
---|---|---|
translation_source | str | The source of the translation. |
IsProfanityFree
Validates that a translated text does not contain profanity.
This validator uses the alt-profanity-check package to check whether a string contains profanity.
Key Properties
Property | Description |
---|---|
Name for format attribute | is-profanity-free |
Supported data types | string |
Programmatic fix | None |
LowerCase
Validates that a value is lower case.
Key Properties
Property | Description |
---|---|
Name for format attribute | lower-case |
Supported data types | string |
Programmatic fix | Convert to lower case. |
OneLine
Validates that a value is a single line or sentence.
Key Properties
Property | Description |
---|---|
Name for format attribute | one-line |
Supported data types | string |
Programmatic fix | Pick the first line. |
ProvenanceV0
ProvenanceV0(threshold: float = 0.8, validation_method: str = 'sentence', on_fail: Optional[Callable] = None, **kwargs)
Validates that LLM-generated text matches some source text based on distance in embedding space.
Key Properties
Property | Description |
---|---|
Name for format attribute | provenance-v0 |
Supported data types | string |
Programmatic fix | None |
Arguments
Name | Type | Description | Default |
---|---|---|---|
threshold | float | The minimum cosine similarity between the generated text and the source text. Defaults to 0.8. | 0.8 |
validation_method | str | Whether to validate at the sentence level or over the full text. | 'sentence' |
Metadata
Name | Type | Description |
---|---|---|
query_function | Callable | A callable that takes a string and returns a list of (chunk, score) tuples. |
sources | List[str] | The source text. |
embed_function | Callable | A callable that creates embeddings for the sources. Must accept a list of strings and return an np.array of floats. |
In order to use this validator, you must provide either a query_function or sources with an embed_function in the metadata.
If providing query_function, it should take a string as input and return a list of (chunk, score) tuples. The chunk is a string and the score is a float representing the cosine distance between the chunk and the input string. The list should be sorted in ascending order by score.
Note: The score should represent distance in embedding space, not similarity. I.e., lower is better and the score should be 0 if the chunk is identical to the input string.
Example
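As an illustration of the contract described above, here is a minimal sketch of a `query_function`. The toy bag-of-words embedding, the `CHUNKS` list, and the helper names are hypothetical stand-ins; a real setup would more likely query a vector store backed by a proper embedding model.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical source chunks; in practice these come from your documents.
CHUNKS = ["the sky is blue", "grass is green", "the sun is hot"]


def _embed(text: str) -> Counter:
    """Toy bag-of-words embedding (an assumption, for illustration only)."""
    return Counter(text.lower().split())


def _cosine_distance(a: Counter, b: Counter) -> float:
    """Cosine distance: 0 for identical direction, 1 for no overlap."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def query_function(text: str) -> List[Tuple[str, float]]:
    """Return (chunk, distance) tuples sorted in ascending order by distance."""
    query = _embed(text)
    scored = [(chunk, _cosine_distance(query, _embed(chunk))) for chunk in CHUNKS]
    return sorted(scored, key=lambda pair: pair[1])


results = query_function("the sky is blue")
print(results[0])
```

The function would then be supplied through the metadata, for example `metadata={"query_function": query_function}`.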
If providing sources, it should be a list of strings. The embed_function should take a string or a list of strings as input and return a NumPy array of floats. The vector should be normalized to unit length.
Example
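A minimal sketch of the sources route follows. The letter-frequency embedding is a toy assumption used only to show the expected input and output shapes and the unit-length normalization; a real `embed_function` would call an embedding model.

```python
from typing import List, Union

import numpy as np


def embed_function(text: Union[str, List[str]]) -> np.ndarray:
    """Toy embedding: letter-frequency vectors normalized to unit length."""
    texts = [text] if isinstance(text, str) else list(text)
    vectors = np.zeros((len(texts), 26), dtype=float)
    for row, item in enumerate(texts):
        for char in item.lower():
            if "a" <= char <= "z":
                vectors[row, ord(char) - ord("a")] += 1.0
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for texts without letters
    return vectors / norms


sources = ["This is a source text", "Another source"]
# Both entries go into the metadata passed at validation time.
metadata = {"sources": sources, "embed_function": embed_function}
embeddings = embed_function(sources)
print(embeddings.shape)
```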
ProvenanceV1
ProvenanceV1(validation_method: str = 'sentence', llm_callable: Union[str, Callable] = 'gpt-3.5-turbo', top_k: int = 3, max_tokens: int = 2, on_fail: Optional[Callable] = None, **kwargs)
Validates that the LLM-generated text is supported by the provided contexts.
This validator uses an LLM callable to evaluate the generated text against the provided contexts (LLM-ception).
In order to use this validator, you must provide either:
1. a query_function in the metadata. That function should take a string as input (the LLM-generated text) and return a list of relevant chunks. The list should be sorted in ascending order by the distance between the chunk and the LLM-generated text.
Example using str callable
```py
from typing import List

from guardrails import Guard
from guardrails.validators import ProvenanceV1

def query_function(text: str, k: int) -> List[str]:
    return ["This is a chunk", "This is another chunk"]

guard = Guard.from_string(validators=[
    ProvenanceV1(llm_callable="gpt-3.5-turbo", ...)
])
guard.parse(
    llm_output=...,
    metadata={"query_function": query_function},
)
```
Example using a custom LLM callable
```py
def query_function(text: str, k: int) -> List[str]:
    return ["This is a chunk", "This is another chunk"]

guard = Guard.from_string(validators=[
    ProvenanceV1(llm_callable=your_custom_callable, ...)
])
guard.parse(
    llm_output=...,
    metadata={"query_function": query_function},
)
```
OR
2. sources with an embed_function in the metadata. The embed_function should take a string or a list of strings as input and return a NumPy array of floats. The vector should be normalized to unit length.
Example
```py
from typing import List, Union

import numpy as np

def embed_function(text: Union[str, List[str]]) -> np.ndarray:
    # Placeholder values; real embeddings should be normalized to unit length
    return np.array([[0.1, 0.2, 0.3]])

guard = Guard.from_rail(...)
guard(
    openai.ChatCompletion.create(...),
    prompt_params={...},
    temperature=0.0,
    metadata={
        "sources": ["This is a source text"],
        "embed_function": embed_function,
    },
)
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
validation_method | str | Whether to validate at the sentence level or over the full text. | 'sentence' |
llm_callable | Union[str, Callable] | Either the name of the OpenAI model, or a callable that takes a prompt and returns a response. | 'gpt-3.5-turbo' |
top_k | int | The number of chunks to return from the query function. Defaults to 3. | 3 |
max_tokens | int | The maximum number of tokens to send to the LLM. Defaults to 2. | 2 |
Metadata
Name | Type | Description |
---|---|---|
query_function | Callable | A callable that takes a string and returns a list of chunks. |
sources | List[str] | The source text. |
embed_function | Callable | A callable that creates embeddings for the sources. Must accept a list of strings and return a float np.array. |
QARelevanceLLMEval
QARelevanceLLMEval(llm_callable: Optional[Callable] = None, on_fail: Optional[Callable] = None, **kwargs)
Validates that an answer is relevant to the question asked by asking the LLM to self-evaluate.
Key Properties
Property | Description |
---|---|
Name for format attribute | qa-relevance-llm-eval |
Supported data types | string |
Programmatic fix | None |
Metadata
Name | Type | Description |
---|---|---|
question | str | The original question the LLM was given to answer. |
ReadingTime
Validates that a string can be read in less than a certain amount of time.
Key Properties
Property | Description |
---|---|
Name for format attribute | reading-time |
Supported data types | string |
Programmatic fix | None |
Arguments
- reading_time: The maximum reading time.
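One way to picture this check is to estimate reading time from the word count. The sketch below assumes a reading speed of 200 words per minute and a limit expressed in minutes; both are assumptions made for illustration, not values confirmed by this reference, and the helper names are hypothetical.

```python
def estimated_reading_minutes(text: str, words_per_minute: float = 200.0) -> float:
    """Estimate reading time from the whitespace-separated word count."""
    return len(text.split()) / words_per_minute


def within_reading_time(text: str, reading_time: float) -> bool:
    """True if the estimated reading time is below the limit (in minutes)."""
    return estimated_reading_minutes(text) < reading_time


text = "word " * 400  # 400 words
print(estimated_reading_minutes(text))
print(within_reading_time(text, 3), within_reading_time(text, 1))
```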
RegexMatch
Validates that a value matches a regular expression.
Key Properties
Property | Description |
---|---|
Name for format attribute | regex_match |
Supported data types | string |
Programmatic fix | Generate a string that matches the regular expression |
Arguments
Name | Type | Description | Default |
---|---|---|---|
regex | str | Str regex pattern | required |
match_type | Optional[str] | Str in {"search", "fullmatch"} for a regex search or full-match option | None |
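The difference between the two match_type options can be seen with Python's `re` module. This is a sketch only: the helper name `regex_matches` is hypothetical, and treating a missing match_type as a full match is an assumption rather than documented behavior.

```python
import re
from typing import Optional


def regex_matches(value: str, regex: str, match_type: Optional[str] = None) -> bool:
    """Hypothetical helper mirroring the two match_type options."""
    pattern = re.compile(regex)
    if match_type == "search":
        # "search" succeeds if the pattern occurs anywhere in the value.
        return pattern.search(value) is not None
    # Assumption: anything else (including None) is treated as a full match,
    # which requires the whole value to match the pattern.
    return pattern.fullmatch(value) is not None


print(regex_matches("order 123", r"\d+", "search"))
print(regex_matches("order 123", r"\d+", "fullmatch"))
```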
RemoveRedundantSentences
Removes redundant sentences from a string.
This validator removes sentences from a string that are similar to other sentences in the string. This is useful for removing repetitive sentences from a string.
Key Properties
Property | Description |
---|---|
Name for format attribute | remove-redundant-sentences |
Supported data types | string |
Programmatic fix | Remove any redundant sentences. |
Arguments
- threshold: The minimum fuzz ratio to be considered redundant. Defaults to 70.
SaliencyCheck
SaliencyCheck(docs_dir: str, llm_callable: Optional[Callable] = None, on_fail: Optional[Callable] = None, threshold: float = 0.25, **kwargs)
Checks that the summary covers the list of topics present in the document.
Key Properties
Property | Description |
---|---|
Name for format attribute | saliency-check |
Supported data types | string |
Programmatic fix | None |
Arguments
- docs_dir: Path to the directory containing the documents.
- threshold: Threshold for overlap between topics in document and summary. Defaults to 0.25.
Initialize the SaliencyCheck validator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
docs_dir | str | Path to the directory containing the documents. | required |
on_fail | Optional[Callable] | Function to call when validation fails. | None |
threshold | float | Threshold for overlap between topics in document and summary. | 0.25 |
SimilarToDocument
SimilarToDocument(document: str, threshold: float = 0.7, model: str = 'text-embedding-ada-002', on_fail: Optional[Callable] = None)
Validates that a value is similar to the document.
This validator checks if the value is similar to the document by checking the cosine similarity between the value and the document, using an embedding.
Key Properties
Property | Description |
---|---|
Name for format attribute | similar-to-document |
Supported data types | string |
Programmatic fix | None |
Arguments
Name | Type | Description | Default |
---|---|---|---|
document | str | The document to use for the similarity check. | required |
threshold | float | The minimum cosine similarity to be considered similar. Defaults to 0.7. | 0.7 |
model | str | The embedding model to use. Defaults to text-embedding-ada-002. | 'text-embedding-ada-002' |
SimilarToList
SimilarToList(standard_deviations: int = 3, threshold: float = 0.1, on_fail: Optional[Callable] = None, **kwargs)
Validates that a value is similar to a list of previously known values.
Key Properties
Property | Description |
---|---|
Name for format attribute | similar-to-list |
Supported data types | string |
Programmatic fix | None |
Arguments
Name | Type | Description | Default |
---|---|---|---|
standard_deviations | int | The number of standard deviations from the mean to check. | 3 |
threshold | float | The threshold for the average semantic similarity for strings. | 0.1 |
For integer values, this validator checks whether the value lies within 'k' standard deviations of the mean of the previous values. (Assumes that the previous values are normally distributed.) For string values, this validator checks whether the average semantic similarity between the generated value and the previous values is less than a threshold.
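The integer case above can be sketched with the standard library. This covers only the numeric branch; the function name is hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption about how the spread is measured.

```python
import statistics
from typing import Sequence


def within_k_standard_deviations(
    value: float, previous: Sequence[float], k: int = 3
) -> bool:
    """True if `value` lies within k standard deviations of the mean."""
    mean = statistics.mean(previous)
    # Sample standard deviation; requires at least two previous values.
    std = statistics.stdev(previous)
    return abs(value - mean) <= k * std


previous_values = [10, 12, 11, 9, 13]
print(within_k_standard_deviations(12, previous_values))
print(within_k_standard_deviations(30, previous_values))
```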
SqlColumnPresence
Validates that all columns in the SQL query are present in the schema.
Key Properties
Property | Description |
---|---|
Name for format attribute | sql-column-presence |
Supported data types | string |
Programmatic fix | None |
Arguments
Name | Type | Description | Default |
---|---|---|---|
cols | List[str] | The list of valid columns. | required |
TwoWords
Validates that a value is two words.
Key Properties
Property | Description |
---|---|
Name for format attribute | two-words |
Supported data types | string |
Programmatic fix | Pick the first two words. |
UpperCase
Validates that a value is upper case.
Key Properties
Property | Description |
---|---|
Name for format attribute | upper-case |
Supported data types | string |
Programmatic fix | Convert to upper case. |
ValidChoices
Validates that a value is within the acceptable choices.
Key Properties
Property | Description |
---|---|
Name for format attribute | valid-choices |
Supported data types | all |
Programmatic fix | None |
Arguments
Name | Type | Description | Default |
---|---|---|---|
choices | List[Any] | The list of valid choices. | required |
ValidLength
ValidLength(min: Optional[int] = None, max: Optional[int] = None, on_fail: Optional[Callable] = None)
Validates that the length of value is within the expected range.
Key Properties
Property | Description |
---|---|
Name for format attribute | length |
Supported data types | string, list, object |
Programmatic fix | If shorter than the minimum, pad with empty last elements. If longer than the maximum, truncate. |
Arguments
Name | Type | Description | Default |
---|---|---|---|
min | Optional[int] | The inclusive minimum length. | None |
max | Optional[int] | The inclusive maximum length. | None |
ValidRange
ValidRange(min: Optional[int] = None, max: Optional[int] = None, on_fail: Optional[Callable] = None)
Validates that a value is within a range.
Key Properties
Property | Description |
---|---|
Name for format attribute | valid-range |
Supported data types | integer, float, percentage |
Programmatic fix | Closest value within the range. |
Arguments
Name | Type | Description | Default |
---|---|---|---|
min | Optional[int] | The inclusive minimum value of the range. | None |
max | Optional[int] | The inclusive maximum value of the range. | None |
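The programmatic fix, picking the closest value within the range, amounts to clamping. The sketch below is illustrative only; the function and parameter names (`closest_in_range`, `min_value`, `max_value`) are hypothetical and were chosen to avoid shadowing Python's built-in `min` and `max`.

```python
from typing import Optional


def closest_in_range(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> float:
    """Clamp `value` to the inclusive range; a bound of None is ignored."""
    if min_value is not None and value < min_value:
        return min_value
    if max_value is not None and value > max_value:
        return max_value
    return value


print(closest_in_range(15, 0, 10))
print(closest_in_range(-3, 0, 10))
```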