pathway.persistence package
class Backend(engine_data_storage, fs_path=None)
The settings of a backend, which is used to persist the computation state.
classmethod azure(root_path, account, password, container)
Configure the Azure Blob Storage backend.
- Parameters
  - root_path (str) – path to the root in the Azure Blob Storage container, which will be used to store persisted data;
  - account (str) – account name for Azure Blob Storage;
  - password (str) – password for the specified account;
  - container (str) – container name to store the data in.
- Returns
  Class instance denoting the Azure Blob Storage backend with root directory as root_path and connection settings given by the extra parameters.
classmethod filesystem(path)
Configure the filesystem backend.
- Parameters
  - path (str | PathLike[str]) – the path to the root directory in the file system, which will be used to store the persisted data.
- Returns
  Class instance denoting the filesystem storage backend with root directory at path.
classmethod s3(root_path, bucket_settings)
Configure the S3 backend.
- Parameters
  - root_path (str) – path to the root in the S3 storage, which will be used to store persisted data;
  - bucket_settings (AwsS3Settings) – the settings for S3 bucket connection in the same format as they are used by S3 connectors.
- Returns
  Class instance denoting the S3 storage backend with root directory as root_path and connection settings given by bucket_settings.
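As an illustration, an S3 backend might be built as in the sketch below. The helper name make_s3_backend, the bucket name, the region, and the environment variable names are placeholders, and the keyword arguments of pw.io.s3.AwsS3Settings are assumed to be the same ones accepted by the S3 connectors.

```python
import os


def make_s3_backend(root_path: str):
    """Build an S3 persistence backend (sketch; all values are placeholders)."""
    # Imported lazily so the helper can be defined without Pathway installed.
    import pathway as pw

    # Assumption: AwsS3Settings accepts these keyword arguments, as in the
    # S3 connectors. Credentials come from hypothetical environment variables.
    bucket_settings = pw.io.s3.AwsS3Settings(
        bucket_name="my-bucket",  # placeholder bucket name
        region="eu-central-1",  # placeholder region
        access_key=os.environ["AWS_ACCESS_KEY"],
        secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    # root_path is the prefix inside the bucket where persisted data is stored.
    return pw.persistence.Backend.s3(root_path, bucket_settings)
```

Reading the credentials from the environment keeps them out of the source code; any other secret store would serve equally well.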
class Config(backend, *, snapshot_interval_ms=0, snapshot_access=<pathway.engine.SnapshotAccess object>, persistence_mode=<pathway.engine.PersistenceMode object>, continue_after_replay=True)
Configure the data persistence. An instance of this class should be passed as a parameter to pw.run if persistence is enabled.
- Parameters
  - backend (Backend) – persistence backend configuration;
  - snapshot_interval_ms (int) – the desired duration between snapshot updates in milliseconds.
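A minimal sketch of how a filesystem backend and a Config instance could be combined and handed to pw.run. The helper names, the directory, and the 5000 ms interval are illustrative only, and the persistence_config keyword of pw.run is an assumption, since the text above does not name the parameter.

```python
def make_persistence_config(storage_dir: str, snapshot_interval_ms: int = 5000):
    """Return a pw.persistence.Config that persists to a local directory."""
    # Imported lazily so the helper can be defined without Pathway installed.
    import pathway as pw

    backend = pw.persistence.Backend.filesystem(storage_dir)
    return pw.persistence.Config(
        backend,
        snapshot_interval_ms=snapshot_interval_ms,
    )


def run_with_persistence(storage_dir: str) -> None:
    """Run an already defined dataflow with persistence enabled."""
    import pathway as pw

    config = make_persistence_config(storage_dir)
    # Assumption: pw.run takes the config through this keyword argument.
    pw.run(persistence_config=config)
```

The dataflow itself (inputs, transformations, outputs) has to be defined before run_with_persistence is called.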
classmethod simple_config(backend, snapshot_interval_ms=0, snapshot_access=api.SnapshotAccess.FULL, persistence_mode=api.PersistenceMode.PERSISTING, continue_after_replay=True)
Construct config from a single instance of the Backend class, using this backend to persist metadata and snapshot.
Note that this method is deprecated and is kept for backward compatibility only. Please use the pw.persistence.Config constructor instead.
- Parameters
  - backend (Backend) – storage backend settings;
  - snapshot_interval_ms – the desired freshness of the persisted snapshot in milliseconds. The greater the value, the further the snapshot may fall behind, and the fewer computational resources are required;
  - persistence_mode – can be set to one of the following values. api.PersistenceMode.PERSISTING is the default value and means that all data will be persisted. When this value is specified or the parameter is omitted, and the configuration is passed to pw.run, no additional actions are required to persist the state of your program. Alternatively, you can use api.PersistenceMode.UDF_CACHING, meaning that only user-defined function (UDF) calls will be cached. The cache stores the mapping from function input parameters to their results, so if a function is called again with the same inputs, the cached result is returned.
- Returns
  Persistence config.
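The idea behind UDF caching can be illustrated with a small pure-Python sketch of a persistent input-to-result cache. This is not Pathway's implementation: the cached_call helper, the JSON file layout, and the key format are hypothetical and only show the mapping from function inputs to stored results.

```python
import json
import os
import tempfile


def cached_call(cache_path, func, *args):
    """Call func(*args), reusing a result stored in cache_path if present."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    # The function name and its inputs together form the cache key.
    key = json.dumps([func.__name__, list(args)])
    if key not in cache:
        # Cache miss: compute the result and write it back to disk.
        cache[key] = func(*args)
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return cache[key]


calls = []


def slow_square(x):
    calls.append(x)  # record every real invocation
    return x * x


cache_file = os.path.join(tempfile.mkdtemp(), "udf_cache.json")
first = cached_call(cache_file, slow_square, 7)
second = cached_call(cache_file, slow_square, 7)  # served from the cache
```

Because the cache lives in a file, the second call would be answered from it even after a process restart, which is the property that makes caching useful for expensive or non-deterministic UDFs.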
Note:
api.PersistenceMode.UDF_CACHING currently works in one of two ways. If the filesystem is used as the backend for persistent storage, the cache is written there. If another backend is used, a temporary directory is created for writing the cache; in that case, persistence guarantees are not provided.
By default, api.PersistenceMode.UDF_CACHING does not persist data from input sources. This means that if the program restarts, it will re-read all input streams from the beginning. However, this behavior can be overridden by assigning names to specific input sources. If an input connector is given a name through its name parameter, the input stream for this source will also be persisted. Upon restart, the program will resume reading from the point where it previously stopped.
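The sketch below shows how UDF caching and a named input source might be set up together. It rests on several assumptions: that the mode is reachable as pw.PersistenceMode.UDF_CACHING (the text above refers to it as api.PersistenceMode.UDF_CACHING), that pw.io.csv.read accepts the name parameter mentioned above, and that the schema, the directories, and the source name are placeholders.

```python
def make_udf_caching_config(cache_dir: str):
    """Return a config that caches only UDF calls (sketch)."""
    # Imported lazily so the helper can be defined without Pathway installed.
    import pathway as pw

    backend = pw.persistence.Backend.filesystem(cache_dir)
    # Assumption: the enum is exported at the top level of the package.
    return pw.persistence.Config(
        backend,
        persistence_mode=pw.PersistenceMode.UDF_CACHING,
    )


def read_named_source(input_dir: str):
    """Read CSV files as a named source so its input stream is persisted too."""
    import pathway as pw

    class InputSchema(pw.Schema):
        value: int  # placeholder column

    # Assumption: the connector exposes a name parameter, as described above.
    return pw.io.csv.read(
        input_dir,
        schema=InputSchema,
        mode="streaming",
        name="numbers_source",  # placeholder source name
    )
```

Sources read without a name would, per the note above, be re-read from the beginning after a restart.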