pw.xpacks.llm.parsers

A library for document parsers: functions that take raw bytes and return a list of text chunks along with their metadata.
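
The parser contract described above can be sketched in plain Python. This is only an illustration: the name `parse_plain_text` and the representation of a chunk as a `(text, metadata)` pair are assumptions made for the example, not part of the library.

```python
from typing import Dict, List, Tuple

# Assumed representation of one chunk: the extracted text plus a metadata dict.
Chunk = Tuple[str, Dict]


def parse_plain_text(contents: bytes) -> List[Chunk]:
    """Hypothetical parser: raw bytes in, list of (text, metadata) chunks out."""
    # Decode the whole payload as UTF-8 and return it as a single chunk
    # with empty metadata.
    return [(contents.decode("utf-8"), {})]
```

A caller would pass the raw file contents, for example `parse_plain_text(b"hello")`, and iterate over the returned chunks.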

class pw.xpacks.llm.parsers.ParseUnstructured(mode='single', post_processors=None, **unstructured_kwargs)

Parse a document using [https://unstructured.io/](https://unstructured.io/).

All arguments can be overridden during UDF application.

  • Parameters
    • mode (-) – one of single, elements or paged. When single, each document is parsed as one long text string. When elements, each document is split into unstructured’s elements. When paged, the text of each page is extracted separately.
    • post_processors (-) – list of callables that will be applied to all extracted texts.
    • **unstructured_kwargs (-) – extra keyword arguments passed to unstructured.io’s partition function.
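
To make the three modes concrete, here is a minimal, self-contained sketch of how extracted elements might be grouped into chunks. It does not call Pathway or unstructured and is not the library's implementation. The function `combine_elements`, the `(text, metadata)` element shape, and the `page_number` metadata key are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a partitioner's output: (text, metadata) pairs.
Element = Tuple[str, Dict]


def combine_elements(
    elements: List[Element],
    mode: str = "single",
    post_processors: Optional[List[Callable[[str], str]]] = None,
) -> List[Element]:
    """Illustrative only: group elements into chunks according to `mode`."""
    processors = post_processors or []

    def clean(text: str) -> str:
        # Apply every post-processor, in order, to one extracted text.
        for fn in processors:
            text = fn(text)
        return text

    if mode == "elements":
        # One chunk per element; metadata is copied through unchanged.
        return [(clean(text), dict(meta)) for text, meta in elements]
    if mode == "paged":
        # One chunk per page; elements without a page number go to page 1.
        pages: Dict[int, List[str]] = defaultdict(list)
        for text, meta in elements:
            pages[meta.get("page_number", 1)].append(clean(text))
        return [
            ("\n\n".join(texts), {"page_number": page})
            for page, texts in sorted(pages.items())
        ]
    if mode == "single":
        # One chunk for the whole document.
        return [("\n\n".join(clean(text) for text, _ in elements), {})]
    raise ValueError(f"unknown mode: {mode!r}")
```

For instance, three elements spread over two pages would yield one chunk in single mode, three in elements mode, and two in paged mode. Passing `post_processors=[str.strip]` trims whitespace from each extracted text before the texts are joined.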

class pw.xpacks.llm.parsers.ParseUtf8(*, return_type=Ellipsis, deterministic=False, propagate_none=False, executor=AutoExecutor(), cache_strategy=None)
