pw.io.sqlite

Pathway provides both Input and Output connectors for SQLite.

Storage Classes

SQLite only has five native storage classes — NULL, INTEGER, REAL, TEXT and BLOB (see Datatypes In SQLite). Pathway has many more value variants, so types that lack a native mapping are stored as TEXT using the exact JSON encoding produced by pathway.io.jsonlines.write(). The same encoding is understood by the input connector, so a write / read pair round-trips every supported Pathway type losslessly.

Type Conversion (Input Connector)

The table below describes how SQLite column values are parsed into Pathway values, and what type to declare in your pw.Schema to receive them correctly.

SQLite storage parsed by the input connector

Pathway schema typeSQLite storage classEncoding / notes
intINTEGERAs-is.
floatREAL or INTEGERAs-is for REAL; INTEGER values are implicitly widened to f64 so integer-valued literals in a float column still parse.
boolINTEGER or TEXT0 / 1 in INTEGER; or PostgreSQL-style literals in TEXTtrue / false, yes / no, on / off, t / f, y / n, 1 / 0 (case-insensitive, surrounding whitespace ignored).
strTEXTUTF-8 text.
bytesBLOBRaw bytes.
pw.DateTimeNaiveTEXT%Y-%m-%dT%H:%M:%S%.f (ISO-8601, written by Pathway), or %Y-%m-%d %H:%M:%S%.f (SQL-92 with space separator, as produced by SQLite’s CURRENT_TIMESTAMP / datetime()). Fractional seconds are optional in both forms.
pw.DateTimeUtcTEXT%Y-%m-%dT%H:%M:%S%.f%z or %Y-%m-%d %H:%M:%S%.f%z — e.g. 2026-01-15T10:30:00+0000 or 2026-01-15 10:30:00+0000.
pw.DurationINTEGERNanoseconds.
pw.JsonTEXTA JSON document stored verbatim.
pw.PointerTEXTPathway’s base32 pointer encoding, e.g. ^Z5QKEQ….
tuple or list (generic)TEXTJSON array — each element is encoded as it would be in pathway.io.jsonlines.write() (e.g. bytes as base64, nested tuples as nested arrays).
np.ndarrayTEXTJSON object with a shape integer array and a flat elements array, row-major. Only int and float element types are supported.
pw.PyObjectWrapperTEXTBase64-encoded bincode payload produced by the output connector.
Any nullable columnNULL or any of the aboveNULL is read as None; non-null values follow the mapping above.

Type Conversion (Output Connector)

Pathway values are written back into the storage classes the input connector accepts — all types round-trip losslessly when the destination column has the encoding shown below.

Pathway types written by the output connector

Pathway typeSQLite storage classEncoding
boolINTEGER1 for True, 0 for False.
intINTEGERAs-is.
floatREALAs-is.
strTEXTUTF-8.
bytesBLOBRaw bytes.
pw.DateTimeNaiveTEXT%Y-%m-%dT%H:%M:%S.%9f — nanosecond precision.
pw.DateTimeUtcTEXT%Y-%m-%dT%H:%M:%S.%9f%z.
pw.DurationINTEGERNanoseconds.
pw.JsonTEXTSerialized JSON document.
pw.PointerTEXTPathway’s base32 pointer encoding.
tuple or list (generic)TEXTJSON array; leaf values encoded per the jsonlines rules.
np.ndarrayTEXTJSON object with a shape integer array and a flat elements array.
pw.PyObjectWrapperTEXTBase64-encoded bincode payload.
None (in optional columns)NULL

Output Connector: Initialization and Schema Checks

The output connector uses init_mode to decide what state the destination table should be in before the first write:

  • "default" — the table must already exist. The writer verifies at construction time that it is present and carries every column it will INSERT into (the columns of the Pathway table, plus time / diff in stream_of_changes mode). A ValueError is raised if the table is missing or its columns do not match.
  • "create_if_not_exists" — equivalent to a CREATE TABLE IF NOT EXISTS with columns derived from the Pathway table. If the SQLite table already exists, the same compatibility check as "default" runs.
  • "replace" — drops the existing SQLite table first, then recreates it from the Pathway table.

For every mode, the path must point at a valid SQLite 3 database (an empty or missing file is accepted — SQLite treats it as a new empty database that the writer may populate). Pointing the writer at a VIEW, index, or trigger is rejected at construction: SQLite does not accept direct INSERT into these objects.

Generated columns. Columns declared as GENERATED ALWAYS AS (...) (stored or virtual) cannot be written to. If the Pathway table has a column that matches a generated column in the SQLite table, the writer raises ValueError and asks you to drop that column from the Pathway table — the reader still returns its computed value on SELECT.

NOT NULL columns. When the destination SQLite table has an extra NOT NULL column without a DEFAULT that the Pathway table does not have, the writer raises ValueError naming the missing column(s). The INTEGER PRIMARY KEY rowid alias is exempt (SQLite auto-assigns it), but only for a single-column INTEGER PRIMARY KEY on a regular (rowid) table — WITHOUT ROWID tables and composite primary keys have no rowid alias, so every NOT NULL column there must be supplied. The mirror case is also caught: if a column is optional in the Pathway table but the matching destination column is NOT NULL and not the rowid alias, the writer rejects it — relax one side or the other.

Output Connector: Snapshot Mode

When output_table_type="snapshot", primary_key must list one or more columns of table that identify each row in the destination. It is required in snapshot mode and not allowed otherwise.

Primary-key columns must be non-nullable: SQLite’s INTEGER PRIMARY KEY silently replaces a NULL value with an auto-assigned rowid, which can collide with later UPSERTs and cause silent data loss. A ValueError is raised at write() time if any primary-key column is declared nullable.

When init_mode="default" or "create_if_not_exists" is used and the destination table already exists, it must carry a full (non-partial) PRIMARY KEY or UNIQUE constraint whose columns equal primary_key (compared case-insensitively, per SQLite’s identifier rules). Partial indexes (CREATE UNIQUE INDEX ... WHERE ...) and functional / expression-based unique indexes (CREATE UNIQUE INDEX ... ON t(expr)) are not accepted — SQLite refuses ON CONFLICT against them, so the writer rejects them at construction.

read(path, table_name, schema, *, autocommit_duration_ms=1500, name=None, max_backlog_size=None, debug_data=None)

sourceReads a table or view from a SQLite database. Both rowid tables and WITHOUT ROWID tables / views are supported — the latter require a primary key declared in the Pathway schema (see below).

The reader polls the database for changes and tracks each row by an identity so it can emit insertions, updates, and deletions. That identity is chosen from the Pathway schema passed as schema:

  • If the Pathway schema declares one or more columns with pw.column_definition(primary_key=True), those columns form the identity. The SQLite table must carry a matching PRIMARY KEY or UNIQUE constraint; otherwise the reader would silently conflate rows that share a key value, so pw.io.sqlite.read rejects the setup up-front. This mode is required for SQLite objects that don’t expose an implicit row id — e.g. WITHOUT ROWID tables and views; for views, which can’t carry constraints, the reader trusts the user’s declaration.
  • Otherwise, SQLite’s implicit _rowid_ column is used. pw.io.sqlite.read raises ValueError at call time if the target object has neither a primary key in the Pathway schema nor _rowid_. The same error fires when the target table has a user-defined column named rowid, _rowid_, or oid (case-insensitive — these are SQLite’s rowid aliases): the user column shadows the implicit alias, so the reader cannot fetch the integer rowid identity. Declare a primary key in the Pathway schema to read such tables.

The column names of the Pathway schema are verified against the target SQLite object’s columns (via PRAGMA table_xinfo, which exposes generated columns too) at connector construction; if the Pathway schema declares a column the SQLite table does not carry, the reader refuses to start and names the offending column(s) in the error. Identifier comparison is ASCII case-insensitive, matching SQLite’s own rules — e.g. declaring ID in the Pathway schema matches a table column named id. The path must point at a valid SQLite 3 database — the connector runs PRAGMA schema_version up-front and surfaces a clear error if the file is non-SQLite or encrypted with a key this connection doesn’t have. Empty files are NOT rejected here, because SQLite treats them as brand-new empty databases; in that case the reader raises ValueError because table_name does not exist in the (empty) database.

Datetime values in TEXT columns are accepted in either the ISO-8601 form the writer emits (YYYY-MM-DDTHH:MM:SS, optionally followed by .-prefixed fractional seconds, with a T separator) or the SQL-92 form SQLite itself produces via CURRENT_TIMESTAMP / datetime() (YYYY-MM-DD HH:MM:SS, optionally followed by .-prefixed fractional seconds, with a space separator). The reader normalizes the separator before parsing so pre-existing SQLite tables round-trip without a schema change.

Duplicate primary-key values within a single poll. A matching UNIQUE constraint does not fully rule these out: SQLite permits multiple NULL values in a UNIQUE column (its longstanding historical behavior), and rowid-less primary-key declarations on regular tables share the same leniency. When the reader sees two or more rows with identical primary-key values in one poll, the first row is tracked in the usual way and each subsequent duplicate is forwarded downstream as a per-row error: the primary-key columns on that event are replaced with an error value (the same marker that pw.fill_error and pw.global_error_log surface) naming the duplication, and the row is not added to the reader’s tracked snapshot. This way a single NULL (or any otherwise-legal repeat) still reaches the table while genuine collisions surface as visible parse errors without aborting the pipeline or dropping the rest of the batch.

Persistence is not supported. SQLite has no change-log history to replay, so a pipeline that uses pw.io.sqlite.read cannot be resumed from a Pathway snapshot. Enabling pw.persistence.Config against such a pipeline raises ValueError at startup.

  • Parameters
    • path (PathLike | str) – Path to the database file. If the path resolves to an existing directory (directly or via a symlink), a ValueError is raised at call time.
    • table_name (str) – Name of the table in the database to be read. SQLite resolves identifiers case-insensitively, so a mixed-case table_name (e.g. "Users") matches a table created as users.
    • schema (type[Schema]) – Pathway schema. Optionally annotate one or more columns with pw.column_definition(primary_key=True) to drive row-identity tracking (see above).
    • autocommit_duration_ms (int | None) – The maximum time between two commits. Every autocommit_duration_ms milliseconds, the updates received by the connector are committed and pushed into Pathway’s computation graph.
    • name (str | None) – A unique name for the connector. If provided, this name will be used in logs and monitoring dashboards.
    • max_backlog_size (int | None) – Limit on the number of entries read from the input source and kept in processing at any moment. Reading pauses when the limit is reached and resumes as processing of some entries completes. Useful with large sources that emit an initial burst of data to avoid memory spikes.
  • Returns
    Table – The table read.

write(table, path, table_name, *, max_batch_size=None, init_mode='default', output_table_type='stream_of_changes', primary_key=None, name=None)

sourceWrites table to a table in a SQLite database file. Two types of output tables are supported: stream of changes and snapshot.

When using stream of changes, the output table contains a log of every change seen in the Pathway table. Two extra INTEGER columns, time and diff, are appended: time is the minibatch timestamp and diff is 1 for an insertion or -1 for a deletion.

When using snapshot, the output table holds the current state of the Pathway table. Insertions are emitted as INSERT ... ON CONFLICT (primary_key) DO UPDATE SET ... and deletions as DELETE ... WHERE primary_key = ?, so the destination table always mirrors the logical contents of the Pathway table.

Values are encoded using the same storage-class mapping that pathway.io.sqlite.read() expects, so a write / read pair is a lossless round-trip for every supported Pathway type.

  • Parameters
    • table (Table) – The table to write. Its column names must be unique under SQLite’s ASCII-case-insensitive identifier rules — a pair like A / a is rejected at write() time because CREATE TABLE would treat them as the same destination column.
    • path (PathLike | str) – Path to the SQLite database file. The file is created if it does not exist. If the path resolves to an existing directory (directly or via a symlink), a ValueError is raised at call time.
    • table_name (str) – Name of the destination table. SQLite resolves identifiers case-insensitively, so a mixed-case name (e.g. "Users") targets a table created as users.
    • max_batch_size (int | None) – Optional upper bound on the number of rows buffered between flushes. Each batch is committed inside a single SQLite transaction.
    • init_mode (Literal['default', 'create_if_not_exists', 'replace']) – Controls how the destination SQLite table is initialized. "default" requires the SQLite table to already exist with columns matching the Pathway table; "create_if_not_exists" creates it if missing; "replace" drops and recreates it. See Output Connector: Initialization and Schema Checks above for the full set of compatibility checks.
    • output_table_type (Literal['stream_of_changes', 'snapshot']) – Defines how the output table manages its data. "stream_of_changes" (the default) appends every change with its time and diff metadata. "snapshot" maintains the current state of the table: +1 events become an UPSERT and -1 events become a DELETE.
    • primary_key (list[ColumnReference] | None) – One or more columns of table that form the primary key in the destination SQLite table. Required for snapshot mode and forbidden otherwise. See Output Connector: Snapshot Mode above for the rules on nullability and the matching destination constraint.
    • name (str | None) – A unique name for the connector. If provided, this name will be used in logs and monitoring dashboards.

Examples:

Stream of changes. Every event from the Pathway table is appended to the destination table together with its time and diff metadata, so the result is a complete log of insertions and deletions. The destination table is created on demand when init_mode allows it:

import pathway as pw
t = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | cat
8   | Alice | cat
''')
pw.io.sqlite.write(
    t,
    "pets.db",
    "pets",
    init_mode="create_if_not_exists",
)

The resulting pets table has the original columns plus the two INTEGER columns time and diff automatically added by the connector.

Snapshot. The destination table is kept in sync with the current state of the Pathway table: every +1 event UPSERTs on the primary key and every -1 event issues a DELETE against the matching row. A primary key must be supplied via primary_key:

pw.io.sqlite.write(
    t,
    "pets.db",
    "pets_snapshot",
    output_table_type="snapshot",
    primary_key=[t.owner, t.pet],
    init_mode="replace",
)

Here (owner, pet) is the primary key, so at any point in time the pets_snapshot table contains one row per live (owner, pet) pair — no history, no time / diff columns.