Document Store

Base module-attribute

Base = declarative_base()

EphemeralDocumentStore module-attribute

EphemeralDocumentStore = RealEphemeralDocumentStore

PageCoordinates module-attribute

PageCoordinates = namedtuple('PageCoordinates', ['doc_id', 'page_num'])
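
As a brief illustration of how such a named tuple behaves, the sketch below repeats the definition above and builds one instance. The values `"report-2023"` and `3` are made up for the example.

```python
from collections import namedtuple

# Same definition as shown in the reference above.
PageCoordinates = namedtuple('PageCoordinates', ['doc_id', 'page_num'])

# Hypothetical values, for illustration only.
coords = PageCoordinates(doc_id="report-2023", page_num=3)

# Fields can be read by name, or unpacked like a regular tuple.
doc_id, page_num = coords
```

Because a named tuple is immutable and hashable, coordinates of this shape can also serve as dictionary keys.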

SQLDocument module-attribute

SQLDocument = RealSqlDocument

SQLMetadataStore module-attribute

SQLMetadataStore = RealSQLMetadataStore

Document dataclass

Document holds text and metadata of a document.

Examples of documents are PDFs, Word documents, etc. A collection of related text in an NLP application can be thought of as a document as well.

id instance-attribute

id: str

metadata class-attribute instance-attribute

metadata: Dict[Any, Any] = Field(default_factory=dict)

pages instance-attribute

pages: Dict[int, str]
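
The following is a minimal sketch of how a document with these three fields might be assembled. It uses a stand-in class built on the standard-library `dataclasses` module, so the library's own `Document` may behave differently (the `Field(default_factory=dict)` default above suggests a different dataclass implementation). The id, page texts, metadata, and the choice to start page numbers at 1 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Stand-in for illustration only; not the library's own class.
@dataclass
class Document:
    id: str
    pages: Dict[int, str]  # page number -> page text
    metadata: Dict[Any, Any] = field(default_factory=dict)

# Hypothetical two-page document.
doc = Document(
    id="handbook",
    pages={1: "Welcome to the team.", 2: "Expense policy and travel rules."},
    metadata={"source": "handbook.pdf"},
)

# Pages are keyed by number, so they can be listed in order.
page_numbers = sorted(doc.pages)
```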

DocumentStoreBase

DocumentStoreBase(vector_db: VectorDBBase, path: Optional[str] = None)

Abstract class for a store that holds text and metadata from documents.

The store can be queried by text for similar documents.

add_document abstractmethod

add_document(document: Document) -> None

Adds a document to the store.

Parameters:

Name Type Description Default
document Document

Document object to be added

required

Returns:

Type Description
None

None if the document was added successfully

add_text abstractmethod

add_text(text: str, meta: Dict[Any, Any]) -> str

Adds a text to the store.

Parameters:

Name Type Description Default
text str

Text to add.

required
meta Dict[Any, Any]

Metadata to associate with the text.

required

Returns:

Type Description
str

The id of the text.

add_texts abstractmethod

add_texts(texts: Dict[str, Dict[Any, Any]]) -> List[str]

Adds multiple texts to the store.

Parameters:

Name Type Description Default
texts Dict[str, Dict[Any, Any]]

Mapping from each text to add to its associated metadata. Example: {"I am feeling good": {"sentiment": "positive"}}

required

Returns:

Type Description
List[str]

List of ids of the texts.

flush abstractmethod

flush()

Flushes the store to disk.

search abstractmethod

search(query: str, k: int = 4) -> List[Page]

Searches for pages that contain text similar to the query.

Parameters:

Name Type Description Default
query str

Text to search for.

required
k int

Number of similar pages to return.

4

Returns:

Type Description
List[Page]

List of pages that contain text similar to the query.
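
To make the shape of this interface concrete, here is a self-contained sketch of a toy store that implements the same five methods. Everything in it is hypothetical: `ToyStoreBase` and `ToyInMemoryStore` are stand-ins rather than library classes, the constructor takes no vector database, ids are generated with `uuid`, and similarity is approximated by counting shared words instead of comparing embeddings.

```python
import uuid
from abc import ABC, abstractmethod
from collections import namedtuple
from dataclasses import dataclass, field
from typing import Any, Dict, List

PageCoordinates = namedtuple("PageCoordinates", ["doc_id", "page_num"])

@dataclass
class Page:
    text: str
    metadata: Dict[Any, Any]
    cordinates: PageCoordinates  # spelling follows the documented attribute

@dataclass
class Document:
    id: str
    pages: Dict[int, str]
    metadata: Dict[Any, Any] = field(default_factory=dict)

class ToyStoreBase(ABC):
    """Hypothetical stand-in mirroring the documented abstract methods."""

    @abstractmethod
    def add_document(self, document: Document) -> None: ...

    @abstractmethod
    def add_text(self, text: str, meta: Dict[Any, Any]) -> str: ...

    @abstractmethod
    def add_texts(self, texts: Dict[str, Dict[Any, Any]]) -> List[str]: ...

    @abstractmethod
    def flush(self) -> None: ...

    @abstractmethod
    def search(self, query: str, k: int = 4) -> List[Page]: ...

class ToyInMemoryStore(ToyStoreBase):
    def __init__(self) -> None:
        self._pages: List[Page] = []

    def add_document(self, document: Document) -> None:
        # Store one Page per page of the document.
        for page_num, text in sorted(document.pages.items()):
            coords = PageCoordinates(document.id, page_num)
            self._pages.append(Page(text, dict(document.metadata), coords))

    def add_text(self, text: str, meta: Dict[Any, Any]) -> str:
        # Assumption: the store generates the id and treats the text
        # as a single-page document.
        doc_id = uuid.uuid4().hex
        self.add_document(Document(id=doc_id, pages={0: text}, metadata=meta))
        return doc_id

    def add_texts(self, texts: Dict[str, Dict[Any, Any]]) -> List[str]:
        return [self.add_text(text, meta) for text, meta in texts.items()]

    def flush(self) -> None:
        pass  # Nothing to persist in this in-memory toy.

    def search(self, query: str, k: int = 4) -> List[Page]:
        # Word overlap stands in for vector similarity.
        query_words = set(query.lower().split())

        def overlap(page: Page) -> int:
            return len(query_words & set(page.text.lower().split()))

        return sorted(self._pages, key=overlap, reverse=True)[:k]

store = ToyInMemoryStore()
ids = store.add_texts({
    "I am feeling good": {"sentiment": "positive"},
    "The train was late again": {"sentiment": "negative"},
})
hits = store.search("feeling good today", k=1)
```

Here `hits` holds the single best-matching `Page`, whose `metadata` and `cordinates` identify where the text came from.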

FallbackEphemeralDocumentStore

FallbackEphemeralDocumentStore()

FallbackSQLDocument

FallbackSQLDocument()

FallbackSQLMetadataStore

FallbackSQLMetadataStore()

Page dataclass

Page holds text and metadata of a page in a document.

It also contains the coordinates of the page in the document.

cordinates instance-attribute

cordinates: PageCoordinates

metadata instance-attribute

metadata: Dict[Any, Any]

text instance-attribute

text: str

RealEphemeralDocumentStore

RealEphemeralDocumentStore(vector_db: VectorDBBase, path: Optional[str] = None)

EphemeralDocumentStore is a document store that stores the documents on local disk and uses an ephemeral vector store like Faiss.

Creates a new EphemeralDocumentStore.

Parameters:

Name Type Description Default
vector_db VectorDBBase

VectorDBBase instance to use for storing the vectors.

required
path Optional[str]

Path to the database file used to store metadata.

None

add_document

add_document(document: Document)

add_text

add_text(text: str, meta: Dict[Any, Any]) -> str

add_texts

add_texts(texts: Dict[str, Dict[Any, Any]]) -> List[str]

flush

flush(path: Optional[str] = None)

search

search(query: str, k: int = 4) -> List[Page]

search_with_threshold

search_with_threshold(query: str, threshold: float, k: int = 4) -> List[Page]
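
The reference does not say how `threshold` is interpreted, so the sketch below only illustrates the general idea of threshold-filtered retrieval. The helper `filter_by_threshold` and the scored pages are hypothetical, and the code assumes that a higher score means a closer match. A store that reports distances would keep results at or below the threshold instead.

```python
from typing import List, Tuple

def filter_by_threshold(
    scored: List[Tuple[str, float]], threshold: float, k: int = 4
) -> List[str]:
    """Keep at most k texts whose score meets the threshold, best first."""
    # Assumption: higher score means more similar.
    kept = [(text, score) for text, score in scored if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [text for text, _ in kept[:k]]

# Hypothetical (text, similarity score) pairs.
scored_pages = [
    ("page about cats", 0.91),
    ("page about dogs", 0.55),
    ("page about tax law", 0.12),
]
relevant = filter_by_threshold(scored_pages, threshold=0.5, k=4)
```

With a threshold in place, a query may return fewer than `k` pages, or none at all, when nothing in the store is close enough.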

RealSQLMetadataStore

RealSQLMetadataStore(path: Optional[str] = None)

add_docs

add_docs(docs: List[Document], vdb_last_index: int)

get_pages_for_for_indexes

get_pages_for_for_indexes(indexes: List[int]) -> List[Page]

RealSqlDocument

__tablename__ class-attribute instance-attribute

__tablename__ = 'documents'

id class-attribute instance-attribute

id: Mapped[int] = mapped_column(primary_key=True)

meta class-attribute instance-attribute

meta: Mapped[dict] = mapped_column(sqlalchemy.PickleType)

page_num class-attribute instance-attribute

page_num: Mapped[int] = mapped_column(sqlalchemy.Integer, primary_key=True)

text class-attribute instance-attribute

text: Mapped[str] = mapped_column(sqlalchemy.String)

vector_index class-attribute instance-attribute

vector_index: Mapped[int] = mapped_column(sqlalchemy.Integer)
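
Since both `id` and `page_num` are marked as primary keys, each row appears to hold one page of one document. The sketch below mirrors that layout using the standard-library `sqlite3` module rather than SQLAlchemy, so it is an approximation of the schema and not the library's own code. The sample rows are made up, and reading `vector_index` as the page's position in the vector store is an assumption.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id INTEGER NOT NULL,
        page_num INTEGER NOT NULL,
        text TEXT,
        meta BLOB,            -- pickled dict, similar in spirit to PickleType
        vector_index INTEGER,
        PRIMARY KEY (id, page_num)
    )
    """
)

# Hypothetical two-page document stored as two rows.
rows = [
    (1, 0, "First page", pickle.dumps({"source": "a.pdf"}), 0),
    (1, 1, "Second page", pickle.dumps({"source": "a.pdf"}), 1),
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?)", rows)

# Look a page up by its (assumed) position in the vector index.
row = conn.execute(
    "SELECT id, page_num, text, meta FROM documents WHERE vector_index = ?",
    (1,),
).fetchone()
doc_id, page_num, text, meta_blob = row
meta = pickle.loads(meta_blob)

# The composite key rejects a second row for the same (id, page_num).
try:
    conn.execute("INSERT INTO documents VALUES (1, 0, 'dup', NULL, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```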