pathway.persistence package

class Backend(engine_data_storage, fs_path=None)

The settings of a backend, which is used to persist the computation state.

classmethod azure(root_path, account, password, container)

Configure the Azure Blob Storage backend.

  • Parameters
    • root_path (str) – path to the root in the Azure Blob Storage container, which will be used to store persisted data;
    • account (str) – account name for Azure Blob Storage;
    • password (str) – password for the specified account;
    • container (str) – container name to store the data in.
  • Returns
    Class instance denoting the Azure Blob Storage backend with root directory as root_path and connection settings given by the extra parameters.
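
A minimal sketch of calling this method, assuming the credentials are supplied through environment variables. The variable names (AZURE_STORAGE_ACCOUNT and so on) and both helper functions are hypothetical; only the azure classmethod and its four parameters come from the signature above.

```python
import os
from typing import Mapping


def azure_backend_kwargs(env: Mapping[str, str], root_path: str) -> dict:
    """Collect the four arguments of Backend.azure from an env-like mapping."""
    # Hypothetical variable names chosen for this sketch.
    required = {
        "account": "AZURE_STORAGE_ACCOUNT",
        "password": "AZURE_STORAGE_PASSWORD",
        "container": "AZURE_STORAGE_CONTAINER",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    kwargs = {name: env[var] for name, var in required.items()}
    kwargs["root_path"] = root_path
    return kwargs


def make_azure_backend(root_path: str):
    """Build the Azure Blob Storage backend from the current environment."""
    # Deferred import so the helper above stays usable without Pathway installed.
    import pathway as pw

    return pw.persistence.Backend.azure(**azure_backend_kwargs(os.environ, root_path))
```

Keeping the secrets out of the source code is a choice of this sketch, not a requirement of the API.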

classmethod filesystem(path)

Configure the filesystem backend.

  • Parameters
    path (str | PathLike[str]) – the path to the root directory in the file system, which will be used to store the persisted data.
  • Returns
    Class instance denoting the filesystem storage backend with root directory at path.
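
A minimal sketch of creating a filesystem backend. The helper names are hypothetical, and creating the directory in advance is an assumption of this sketch rather than a documented requirement.

```python
import os
from pathlib import Path


def prepare_root(path) -> Path:
    """Create the persistence root directory if needed and return its absolute path."""
    root = Path(os.fspath(path)).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    return root


def make_filesystem_backend(path):
    """Build a filesystem persistence backend rooted at `path`."""
    # Deferred import so prepare_root stays usable without Pathway installed.
    import pathway as pw

    return pw.persistence.Backend.filesystem(prepare_root(path))
```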

classmethod s3(root_path, bucket_settings)

Configure the S3 backend.

  • Parameters
    • root_path (str) – path to the root in the S3 storage, which will be used to store persisted data;
    • bucket_settings (AwsS3Settings) – the settings for S3 bucket connection in the same format as they are used by S3 connectors.
  • Returns
    Class instance denoting the S3 storage backend with root directory as root_path and connection settings given by bucket_settings.
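
A minimal sketch of building the S3 backend, assuming that pw.io.s3.AwsS3Settings accepts the bucket_name, access_key, secret_access_key and region keyword arguments, as it does for the S3 connectors. The environment variable names and helper functions are hypothetical.

```python
import os
from typing import Mapping


def s3_settings_kwargs(env: Mapping[str, str]) -> dict:
    """Translate an env-like mapping into keyword arguments for AwsS3Settings."""
    # Hypothetical variable names chosen for this sketch.
    kwargs = {
        "bucket_name": env["S3_BUCKET"],
        "access_key": env["S3_ACCESS_KEY"],
        "secret_access_key": env["S3_SECRET_ACCESS_KEY"],
    }
    # The region is passed only when it is set explicitly.
    if env.get("S3_REGION"):
        kwargs["region"] = env["S3_REGION"]
    return kwargs


def make_s3_backend(root_path: str):
    """Build the S3 persistence backend from the current environment."""
    # Deferred import so the helper above stays usable without Pathway installed.
    import pathway as pw

    settings = pw.io.s3.AwsS3Settings(**s3_settings_kwargs(os.environ))
    return pw.persistence.Backend.s3(root_path=root_path, bucket_settings=settings)
```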

class Config(backend, *, snapshot_interval_ms=0, snapshot_access=<pathway.engine.SnapshotAccess object>, persistence_mode=<pathway.engine.PersistenceMode object>, continue_after_replay=True)

Configure data persistence. When persistence is enabled, pass an instance of this class as a parameter to pw.run.

  • Parameters
    • backend (Backend) – persistence backend configuration;
    • snapshot_interval_ms (int) – the desired duration between snapshot updates in milliseconds.
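
A minimal sketch of passing a Config to pw.run, assuming the keyword argument of pw.run is named persistence_config. The helper names and the 5-second default are hypothetical.

```python
def seconds_to_ms(seconds: float) -> int:
    """Convert a snapshot interval in seconds to whole milliseconds."""
    if seconds < 0:
        raise ValueError("snapshot interval must be non-negative")
    return int(round(seconds * 1000))


def run_with_persistence(state_dir: str, snapshot_every_s: float = 5.0) -> None:
    """Run the already defined dataflow with filesystem persistence enabled."""
    # Deferred import so seconds_to_ms stays usable without Pathway installed.
    import pathway as pw

    backend = pw.persistence.Backend.filesystem(state_dir)
    config = pw.persistence.Config(
        backend,
        snapshot_interval_ms=seconds_to_ms(snapshot_every_s),
    )
    # Connectors, transformations and outputs must be defined before this call.
    pw.run(persistence_config=config)
```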

classmethod simple_config(backend, snapshot_interval_ms=0, snapshot_access=api.SnapshotAccess.FULL, persistence_mode=api.PersistenceMode.PERSISTING, continue_after_replay=True)

Construct config from a single instance of the Backend class, using this backend to persist metadata and snapshot.

Note that this method is deprecated and is kept for backward compatibility only. Please use the pw.persistence.Config constructor instead.

  • Parameters
    • backend (Backend) – storage backend settings;
    • snapshot_interval_ms – the desired freshness of the persisted snapshot in milliseconds. A larger value lets the snapshot fall further behind the computation but requires fewer computational resources;
    • persistence_mode – one of the following values. api.PersistenceMode.PERSISTING is the default and means that all data will be persisted: when this value is specified or the parameter is omitted, and the configuration is passed to pw.run, no additional actions are required to persist the state of your program. api.PersistenceMode.UDF_CACHING means that only user-defined function (UDF) calls will be cached: the cache stores the mapping from function input parameters to their results, so if a function is called again with the same inputs, the cached result is returned.
  • Returns
    Persistence config.
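
The input-to-result mapping behind UDF caching can be illustrated in plain Python. This is a conceptual sketch only, not Pathway's implementation: the disk_cached helper and the JSON file format are hypothetical, and both arguments and results must be JSON-serializable.

```python
import json
from pathlib import Path
from typing import Callable


def disk_cached(func: Callable, cache_file: Path) -> Callable:
    """Wrap `func` so results are keyed by its arguments and stored in a JSON file."""
    # Load results left behind by a previous run, if any.
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    def wrapper(*args):
        key = json.dumps(args)  # the serialized arguments act as the cache key
        if key not in cache:
            cache[key] = func(*args)
            cache_file.write_text(json.dumps(cache))
        return cache[key]

    return wrapper
```

Wrapping the same function again with the same cache file simulates a restart: calls with previously seen inputs return the stored result without invoking the function.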

Note:

api.PersistenceMode.UDF_CACHING currently provides persistence guarantees only when the filesystem is used as the backend for persistent storage. If another backend is used, a temporary directory is created for writing the cache, and persistence guarantees are not provided.

By default, api.PersistenceMode.UDF_CACHING does not persist data from input sources. This means that if the program restarts, it will re-read all input streams from the beginning. However, this behavior can be overridden by assigning names to specific input sources. If an input connector is given a name through its name parameter, the input stream for this source will also be persisted. Upon restart, the program will resume reading from the point where it previously stopped.
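
A minimal sketch of combining UDF caching with a named input source. It assumes that pw.PersistenceMode is the public alias of the api.PersistenceMode shown above and that pw.io.csv.read accepts a name argument. The schema, the source name "documents" and the function names are hypothetical.

```python
def build_udf_caching_config(cache_dir: str):
    """Return a persistence config that caches only UDF calls."""
    # Deferred import so this module can be loaded without Pathway installed.
    import pathway as pw

    # The filesystem backend is used because other backends give no guarantees in this mode.
    backend = pw.persistence.Backend.filesystem(cache_dir)
    return pw.persistence.Config(
        backend,
        persistence_mode=pw.PersistenceMode.UDF_CACHING,
    )


def read_named_source(input_dir: str):
    """Read CSV files as a stream whose position is persisted because it has a name."""
    import pathway as pw

    class InputSchema(pw.Schema):
        text: str

    # Naming the source opts it into persistence, so reading resumes after a restart.
    return pw.io.csv.read(
        input_dir,
        schema=InputSchema,
        mode="streaming",
        name="documents",
    )
```

Sources left without a name are re-read from the beginning after a restart, as described above.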