pw.Table

The Live Data Framework is organized around working with data tables. This page contains the reference for the Live Data Framework Table class.

class Table()

Collection of named columns over identical universes.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
isinstance(t1, pw.Table)

property C: ColumnNamespace

Returns the namespace of all the columns of a joinable. Allows accessing columns whose names would otherwise clash with reserved method names.

import pathway as pw
tab = pw.debug.table_from_markdown('''
age | owner | pet | filter
10  | Alice | dog | True
9   | Bob   | dog | True
8   | Alice | cat | False
7   | Bob   | dog | True
''')
isinstance(tab.C.age, pw.ColumnReference)

pw.debug.compute_and_print(tab.filter(tab.C.filter), include_id=False)

add_update_timestamp_utc(refresh_rate=Timedelta('0 days 00:00:01'), update_timestamp_column_name='updated_timestamp_utc')

Adds a column with the UTC timestamp of the last row update.

Parameters
- refresh_rate (pw.Duration, optional) – The interval at which the UTC timestamp is refreshed. Defaults to 1 second.
- update_timestamp_column_name (str, optional) – The name of the column to store the update timestamp. Defaults to “updated_timestamp_utc”.

Returns
pw.Table – A new table with an additional column containing the UTC timestamp of the last update for each row. The id column is preserved.

assert_append_only()

Sets the append_only property of all columns from a table to True.

Sometimes the Pathway Live Data Framework can’t automatically deduce that a table is append-only. If you know that the table is append-only (contains only insertions), you can tell the Pathway Live Data Framework about it by using this method. At runtime the Pathway Live Data Framework will check if the table is really append-only and exit with an error otherwise.

Returns
Table – A table with the same columns as the original one but with append_only property of the columns set to True.

Example:

import pathway as pw
t = pw.debug.table_from_markdown(
    '''
    a | b | __time__ | __diff__
    1 | 2 |    2     |    1
    3 | 4 |    2     |    1
    5 | 6 |    4     |    1
    3 | 4 |    4     |   -1
    3 | 5 |    4     |    1
    ''',
    id_from=["a"],
) # t is not append only due to the update (row with a=3)
t.is_append_only

t_filtered = t.filter(pw.this.a != 3)
t_append_only = t_filtered.assert_append_only()
t_append_only.is_append_only

await_futures()

Waits for the results of asynchronous computation.

It strips the Future wrapper from table columns where applicable. In practice, it filters out the Pending values and produces a column with a data type that was the argument of Future.

Columns of Future data type are produced by fully asynchronous UDFs. Columns of this type can be propagated further, but can’t be used in most expressions (e.g. arithmetic operations). You can wait for their results using this method and later use the results in expressions you want.

Example:

import pathway as pw
import asyncio

t = pw.debug.table_from_markdown(
    '''
    a | b
    1 | 2
    3 | 4
    5 | 6
'''
)

@pw.udf(executor=pw.udfs.fully_async_executor())
async def long_running_async_function(a: int, b: int) -> int:
    c = a * b
    await asyncio.sleep(0.1 * c)
    return c


result = t.with_columns(res=long_running_async_function(pw.this.a, pw.this.b))
print(result.schema)


awaited_result = result.await_futures()
print(awaited_result.schema)

pw.debug.compute_and_print(awaited_result, include_id=False)

buffer(time_column, threshold)

Buffers the values until the condition time_column <= max(time_column) - threshold is met.

This is a stateful operator. It stores the entries if their time_column > max(time_column) - threshold. Otherwise the entries can pass immediately. Once the current time (defined as max over all time_column values so far) advances and some of the stored entries start to satisfy the condition, they are sent for further processing.

Parameters
- time_column (ColumnExpression) – ColumnExpression that specifies the event time.
- threshold (Union[int, float, timedelta]) – value used to determine which entries are old enough to be sent for further processing. Should match the type of the time_column (int -> int, float -> float, datetime -> timedelta).

Example:

import pathway as pw
t = pw.debug.table_from_markdown(
    '''
    t | v | __time__
    1 | 1 |     2
    2 | 2 |     4
    5 | 3 |     6
    2 | 4 |     8
    7 | 5 |    10
'''
)
res = t.buffer(pw.this.t, 3)
pw.debug.compute_and_print_update_stream(res)

The values of processing time for rows with event time 5, 7 are equal to 18446744073709551614 because there’s no more input and they are released only at the end of the processing. 18446744073709551614 is the maximum possible time.
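The release rule can be traced by hand with a small plain-Python model. This is only a sketch of the semantics described above, not Pathway's implementation: `simulate_buffer` and its `(event_time, value)` row format are hypothetical names used for illustration, each input row is assumed to arrive in its own batch, and the final flush mirrors the end-of-input release mentioned above.

```python
def simulate_buffer(rows, threshold):
    """Model of buffer(): hold rows until event_time <= max(event_time) - threshold.

    rows: list of (event_time, value), one row per arrival step (assumption).
    Returns (release_step, event_time, value) tuples; release_step is "end"
    for rows that are only flushed when the input finishes.
    """
    max_time = None
    held = []
    released = []
    for step, (event_time, value) in enumerate(rows):
        # The "current time" is the maximum event time seen so far.
        max_time = event_time if max_time is None else max(max_time, event_time)
        held.append((event_time, value))
        cutoff = max_time - threshold
        still_held = []
        for held_time, held_value in held:
            if held_time <= cutoff:
                released.append((step, held_time, held_value))
            else:
                still_held.append((held_time, held_value))
        held = still_held
    # No more input: everything still buffered is released at the end.
    released.extend(("end", t, v) for t, v in held)
    return released


rows = [(1, 1), (2, 2), (5, 3), (2, 4), (7, 5)]
released = simulate_buffer(rows, threshold=3)
```

With the rows from the example, the entries with event time 1 and 2 leave the buffer once the row with event time 5 arrives, the late row with event time 2 and value 4 passes immediately, and the rows with event time 5 and 7 wait for the end of input.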

cast_to_types(**kwargs)

Casts columns to types.

concat(*others)

Concatenates self with every other ∊ others.

Semantics:

result.columns == self.columns == other.columns
result.id == self.id ∪ other.id

If self.id and other.id collide, an exception is thrown.

Requires:

other.columns == self.columns
self.id disjoint with other.id

Parameters
others – the other tables.
Returns
Table – The concatenated table. Id’s of rows from original tables are preserved.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.table_from_markdown('''
   | age | owner | pet
11 | 11  | Alice | 30
12 | 12  | Tom   | 40
''')
pw.universes.promise_are_pairwise_disjoint(t1, t2)
t3 = t1.concat(t2)
pw.debug.compute_and_print(t3, include_id=False)
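The id requirement can be pictured with tables modelled as plain dictionaries keyed by row id. This is a hedged sketch of the semantics listed above, not Pathway code: `concat_by_id` is a hypothetical helper, and raising `ValueError` is an assumption, since the text only says that an exception is thrown.

```python
def concat_by_id(first, *others):
    """Concatenate dict-based tables (row id -> row) while preserving ids.

    Raises ValueError (an assumed exception type) when ids collide.
    """
    result = dict(first)
    for other in others:
        collisions = result.keys() & other.keys()
        if collisions:
            raise ValueError(f"colliding ids: {sorted(collisions)}")
        result.update(other)
    return result


t1 = {1: {"age": 10, "owner": "Alice"}, 2: {"age": 9, "owner": "Bob"}}
t2 = {11: {"age": 11, "owner": "Alice"}}
t3 = concat_by_id(t1, t2)  # ids 1, 2 and 11 are all preserved
```

In this model, reindexing every row with a fresh id before merging would correspond to the behaviour described for concat_reindex.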

concat_reindex(*tables)

Concatenate contents of several tables.

This is similar to PySpark union. All tables must have the same schema. Each row is reindexed.

Parameters
tables (Table) – List of tables to concatenate. All tables must have the same schema.
Returns
Table – The concatenated table. It will have new, synthetic ids.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | pet
1 | Dog
7 | Cat
''')
t2 = pw.debug.table_from_markdown('''
  | pet
1 | Manul
8 | Octopus
''')
t3 = t1.concat_reindex(t2)
pw.debug.compute_and_print(t3, include_id=False)

copy()

Returns a copy of a table.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
t2 = t1.copy()
pw.debug.compute_and_print(t2, include_id=False)

t1 is t2

deduplicate(*, value, instance=None, acceptor, name=None)

Deduplicates rows in self on value column using acceptor function.

It keeps rows which were accepted by the acceptor function. The acceptor operates on two arguments: the CURRENT value and the PREVIOUS value.

Parameters
- value (Union[ColumnExpression, None, int, float, str, bytes, bool, Pointer, datetime, timedelta, ndarray, Json, dict[str, Any], tuple[Any, ...], Error, Pending]) – column expression used for deduplication.
- instance (ColumnExpression | None) – Grouping column. For rows with different values in this column, deduplication will be performed separately. Defaults to None.
- acceptor (Callable[[TypeVar(T, bound= Union[None, int, float, str, bytes, bool, Pointer, datetime, timedelta, ndarray, Json, dict[str, Any], tuple[Any, ...], Error, Pending]), TypeVar(T, bound= Union[None, int, float, str, bytes, bool, Pointer, datetime, timedelta, ndarray, Json, dict[str, Any], tuple[Any, ...], Error, Pending])], bool]) – callback telling whether two values are different.
- name (str | None) – An identifier, under which the state of the table will be persisted or None, if there is no need to persist the state of this table. When a program restarts, it restores the state for all input tables according to what was saved for their name. This way it’s possible to configure the start of computations from the moment they were terminated last time.
Returns
Table – the result of deduplication.

Example:

import pathway as pw
table = pw.debug.table_from_markdown(
    '''
    val | __time__
     1  |     2
     2  |     4
     3  |     6
     4  |     8
'''
)

def acceptor(new_value, old_value) -> bool:
    return new_value >= old_value + 2


result = table.deduplicate(value=pw.this.val, acceptor=acceptor)
pw.debug.compute_and_print_update_stream(result, include_id=False)


table = pw.debug.table_from_markdown(
    '''
    val | instance | __time__
     1  |     1    |     2
     2  |     1    |     4
     3  |     2    |     6
     4  |     1    |     8
     4  |     2    |     8
     5  |     1    |    10
'''
)

def acceptor(new_value, old_value) -> bool:
    return new_value >= old_value + 2


result = table.deduplicate(
    value=pw.this.val, instance=pw.this.instance, acceptor=acceptor
)
pw.debug.compute_and_print_update_stream(result, include_id=False)
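The acceptor logic from the second example can be followed with a plain-Python model. It is a sketch under assumptions rather than Pathway's implementation: `simulate_deduplicate` is a hypothetical helper, rows are `(value, instance)` pairs processed in arrival order, and the first value seen for an instance is assumed to be accepted without calling the acceptor.

```python
def simulate_deduplicate(rows, acceptor):
    """Keep a row only if acceptor(new_value, last_accepted_value) is True.

    rows: list of (value, instance) in arrival order.
    The last accepted value is tracked separately for each instance.
    """
    last_accepted = {}
    kept = []
    for value, instance in rows:
        # Assumption: the first value of an instance is always accepted.
        if instance not in last_accepted or acceptor(value, last_accepted[instance]):
            last_accepted[instance] = value
            kept.append((value, instance))
    return kept


def acceptor(new_value, old_value) -> bool:
    return new_value >= old_value + 2


rows = [(1, 1), (2, 1), (3, 2), (4, 1), (4, 2), (5, 1)]
kept = simulate_deduplicate(rows, acceptor)
```

Here instance 1 accepts 1 and then 4 (since 4 >= 1 + 2), while instance 2 accepts only 3; every other row is rejected by the acceptor.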

diff(timestamp, *values, instance=None)

Compute the difference between the values in the values columns and the previous values according to the order defined by the column timestamp.

Parameters
- timestamp (pw.ColumnReference[int | float | datetime | str | bytes]) – The column reference to the timestamp column on which the order is computed.
- *values (pw.ColumnReference[int | float | datetime]) – Variable-length argument representing the column references to the values columns.
- instance (pw.ColumnReference) – Can be used to group the values. The difference is only computed between rows with the same instance value.
Returns
Table – A new table where each column is replaced with a new column containing the difference and whose name is the concatenation of diff_ and the former name.
Raises
ValueError – If the columns are not ColumnReference.

NOTE: The difference for the “first” row (the row with the lowest value in the timestamp column) is None.

Example:

import pathway as pw
table = pw.debug.table_from_markdown('''
timestamp | values
1         | 1
2         | 2
3         | 4
4         | 7
5         | 11
6         | 16
''')
table += table.diff(pw.this.timestamp, pw.this.values)
pw.debug.compute_and_print(table, include_id=False)

table = pw.debug.table_from_markdown(
    '''
timestamp | instance | values
1         | 0        | 1
2         | 1        | 2
3         | 1        | 4
3         | 0        | 7
6         | 1        | 11
6         | 0        | 16
'''
)
table += table.diff(pw.this.timestamp, pw.this.values, instance=pw.this.instance)
pw.debug.compute_and_print(table, include_id=False)
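The per-instance differences in the second example can be reproduced with a short plain-Python model. It is a sketch of the documented semantics only: `simulate_diff` and the `(timestamp, instance, value)` tuple layout are hypothetical names chosen for illustration.

```python
def simulate_diff(rows):
    """Difference between each value and the previous value of the same instance.

    rows: list of (timestamp, instance, value).
    Returns (timestamp, instance, value, diff) ordered by timestamp;
    the first row of every instance gets diff None.
    """
    previous = {}
    result = []
    for timestamp, instance, value in sorted(rows, key=lambda row: row[0]):
        delta = value - previous[instance] if instance in previous else None
        result.append((timestamp, instance, value, delta))
        previous[instance] = value
    return result


rows = [(1, 0, 1), (2, 1, 2), (3, 1, 4), (3, 0, 7), (6, 1, 11), (6, 0, 16)]
diffs = [row[3] for row in simulate_diff(rows)]
```

For instance 0 the values 1, 7, 16 give differences None, 6, 9, and for instance 1 the values 2, 4, 11 give None, 2, 7.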

difference(other)

Restrict self universe to keys not appearing in the other table.

Parameters
other (Table) – table with ids to remove from self.
Returns
Table – table with restricted universe, with the same set of columns

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age  | owner  | pet
1 | 10   | Alice  | 1
2 | 9    | Bob    | 1
3 | 8    | Alice  | 2
''')
t2 = pw.debug.table_from_markdown('''
  | cost
2 | 100
3 | 200
4 | 300
''')
t3 = t1.difference(t2)
pw.debug.compute_and_print(t3, include_id=False)

empty()

Creates an empty table with a schema specified by kwargs.

Parameters
kwargs (DType) – Dict whose keys are column names and values are column types.
Returns
Table – Created empty table.

Example:

import pathway as pw
t1 = pw.Table.empty(age=float, pet=float)
pw.debug.compute_and_print(t1, include_id=False)

filter(filter_expression)

Filter a table according to filter_expression condition.

Parameters
filter_expression (ColumnExpression) – ColumnExpression that specifies the filtering condition.
Returns
Table – Result has the same schema as self and its ids are a subset of self.id.

Example:

import pathway as pw
vertices = pw.debug.table_from_markdown('''
label outdegree
    1         3
    7         0
''')
filtered = vertices.filter(vertices.outdegree == 0)
pw.debug.compute_and_print(filtered, include_id=False)

filter_out_results_of_forgetting(ensure_consistency=False)

Remove all row-deletion events from the table that were produced by the forget method.

This method has an effect only if forget was previously called with mark_forgetting_records parameter set to True. Only the deletions that are triggered by forgetting will be removed.

Parameters
- ensure_consistency (bool) – When enabled, the Pathway Live Data Framework keeps track of the latest value for each key. This ensures that when entries emitted by forgetting are removed, the sequence of remaining additions and deletions stays consistent. For example, if an entry is removed due to forgetting and another entry with the same key appears afterward, the stream would normally have two additions for the same key, which is inconsistent. With the flag enabled, the Pathway Live Data Framework tracks the state of each key. It will emit a deletion before the second addition, guaranteeing that the stream remains consistent. Note that this feature uses additional memory to store the current snapshot of the table. If your data and use case guarantee that such inconsistencies won’t occur, you can leave this check disabled.

Note:

Using forget with mark_forgetting_records set to True, immediately followed by filter_out_results_of_forgetting, is effectively a no-op.

The first call produces a table that temporarily contains both original and “forgotten” records; each forgotten record appears as an event with the diff equal to -1. The second call removes those deletion events and restores the table to its original state.

The method is, however, useful when you perform intermediate computations between these two calls. For example, you can call forget with a certain time window to limit the scope of processing, effectively creating a bounded window of data. Within that window, you can perform computations that benefit from this limited dataset. After those computations, calling filter_out_results_of_forgetting removes all deletion events and restores the table to a consistent state in which the previous forgetting operation is undone, and you have the complete set of rows, no longer limited to the forgetting window.

This approach lets you compute metrics inside a bounded window and then continue processing the entire data stream without carrying forward deletions for old records. Downstream consumers will receive fewer events because only insertions are propagated further.
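The effect of ensure_consistency can be illustrated with a plain-Python model of an update stream. This is a sketch under stated assumptions, not Pathway's implementation: `filter_forgetting` is a hypothetical helper, and each event is modelled as a `(key, value, diff, from_forgetting)` tuple, where the last field stands in for the internal mark set by forget.

```python
def filter_forgetting(events, ensure_consistency=False):
    """Drop deletions produced by forgetting from a stream of events.

    events: list of (key, value, diff, from_forgetting) tuples.
    With ensure_consistency=True a snapshot of the latest value per key is
    kept, and a deletion is emitted before a repeated addition for a key.
    Returns a list of (key, value, diff) tuples.
    """
    snapshot = {}
    output = []
    for key, value, diff, from_forgetting in events:
        if diff == -1 and from_forgetting:
            continue  # deletion triggered by forgetting: filtered out
        if ensure_consistency:
            if diff == 1:
                if key in snapshot:
                    # Retract the stale entry so the key has a single addition.
                    output.append((key, snapshot[key], -1))
                snapshot[key] = value
            else:
                snapshot.pop(key, None)
        output.append((key, value, diff))
    return output


events = [("a", 1, 1, False), ("a", 1, -1, True), ("a", 2, 1, False)]
plain = filter_forgetting(events)
consistent = filter_forgetting(events, ensure_consistency=True)
```

Without the flag the key "a" ends up with two additions in a row; with the flag the model inserts a deletion of the old value before the second addition, at the cost of keeping the snapshot in memory.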

flatten(to_flatten, *, origin_id=None)

Performs a flatmap operation on a column or expression given as the first argument. The data type of this column or expression has to be iterable or a Json array. Other columns of the table are duplicated as many times as the length of the iterable.

It is possible to get ids of source rows by passing origin_id argument, which is a new name of the column with the source ids.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | pet  |  age
1 | Dog  |   2
7 | Cat  |   5
''')
t2 = t1.flatten(t1.pet)
pw.debug.compute_and_print(t2, include_id=False)

forget(time_column, threshold, mark_forgetting_records=False)

Remove old entries when they start to satisfy time_column <= max(time_column) - threshold.

This operator is useful for removing old entries from the stateful operators downstream (like joins, groupbys etc.). It stores the entries and when the current time (defined as max over all time_column values so far) reaches their time plus threshold, a deletion of entries is emitted.

Parameters
- time_column (ColumnExpression) – ColumnExpression that specifies the event time.
- threshold (Union[int, float, timedelta]) – value used to determine which entries are old enough to be removed. Should match the type of the time_column (int -> int, float -> float, datetime -> timedelta).
- mark_forgetting_records (bool) – If set to True, Pathway Live Data Framework marks records corresponding to the deletion of expired entries in a special way, without changing their visible representation. This flag is useful when combined with filter_out_results_of_forgetting, which can later remove those marked deletion records. In other words, it allows you to revert the effects of forgetting at a later stage.

Example:

import pathway as pw
t = pw.debug.table_from_markdown(
    '''
    t | v | __time__
    1 | 1 |     2
    2 | 1 |     2
    4 | 2 |     4
    3 | 3 |     6
'''
)
t_with_forgetting = t.forget(pw.this.t, 3)
s = pw.debug.table_from_markdown(
    '''
  v | a |  __time__
  1 | 1 |      2
  2 | 2 |      4
  1 | 3 |      8
'''
)
res = t_with_forgetting.join(s, pw.left.v == pw.right.v).select(
    pw.left.t, pw.left.v, pw.right.a
)
pw.debug.compute_and_print_update_stream(res)

The entry t=1,v=1 is forgotten at the processing time 6. It gets removed from the join. When a new entry with the join key equal to 1 arrives at the processing time 8, it only gets joined with the t=2,v=1 entry because the other entry was already removed.

The removal of t=1,v=1 entry resulted in the retraction of all its results from a join (only t=1,v=1,a=1 in this case). If you would like to filter out retractions, you can do to_stream().filter(pw.this.is_upsert) on the result of a join.

For cases where you don’t need to permanently forget data across the entire pipeline, but only want to temporarily limit the dataset to a specific time window for a computation, and then return to processing the full data stream, you can use the parameter mark_forgetting_records set to True to achieve this.

For example:

t_with_forgetting = t.forget(pw.this.t, 3, mark_forgetting_records=True)
# Your computation on t_with_forgetting, bounded to 3 time units
t = t_with_forgetting.filter_out_results_of_forgetting()

This way, your table will be temporarily windowed, computations can be applied, and then the stream will return to its normal state.

from_columns(**kwargs)

Build a table from columns.

All columns must have the same ids. Columns’ names must be pairwise distinct.

Parameters
- args (ColumnReference) – List of columns.
- kwargs (ColumnReference) – Columns with their new names.
Returns
Table – Created table.

Example:

import pathway as pw
t1 = pw.Table.empty(age=float, pet=float)
t2 = pw.Table.empty(foo=float, bar=float).with_universe_of(t1)
t3 = pw.Table.from_columns(t1.pet, qux=t2.foo)
pw.debug.compute_and_print(t3, include_id=False)

from_streams(deletion_stream)

Converts streams of changes (updates and deletions) into a table.

This method reconstructs the current state of the table from such streams by applying the updates and deletions in order. It is a stateful operation: the operator keeps track of the latest value for each id. If there are multiple events for a single id in a single batch in the input streams, the order of applying the actions is not specified.

Parameters
- self – A stream with updates (insertions or modifications).
- deletion_stream (Table) – A stream with deletions. Only ids in this stream are important. The columns don’t have to be compatible with the updates stream.
Returns
Table – A table with the same columns as the updates stream, representing the current state.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
id | pet | age | __time__
 1 | cat |  3  |     2
 2 | dog | 11  |     2
 1 | cat | 4   |     4
'''
)
t2 = pw.debug.table_from_markdown(
    '''
id | pet | __time__
 2 | dog |     4
'''
)
t3 = pw.Table.from_streams(t1, t2)
pw.debug.compute_and_print_update_stream(t3, include_id=False)
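The reconstruction of the current state can be pictured as replaying both streams in time order. The following plain-Python model is a sketch of that idea, not Pathway's operator: `replay_streams` and its tuple formats are hypothetical, and the order chosen for events that share a timestamp (updates before deletions) is an arbitrary choice, since the text above leaves that order unspecified.

```python
def replay_streams(updates, deletions):
    """Rebuild the latest state per id from an update and a deletion stream.

    updates: list of (time, row_id, row); deletions: list of (time, row_id).
    Returns a dict mapping row_id to its latest row.
    """
    events = [(time, row_id, row) for time, row_id, row in updates]
    events += [(time, row_id, None) for time, row_id in deletions]
    # Stable sort: within one timestamp, updates stay ahead of deletions.
    events.sort(key=lambda event: event[0])
    state = {}
    for _time, row_id, row in events:
        if row is None:
            state.pop(row_id, None)  # deletion: only the id matters
        else:
            state[row_id] = row  # insertion or modification
    return state


updates = [(2, 1, ("cat", 3)), (2, 2, ("dog", 11)), (4, 1, ("cat", 4))]
deletions = [(4, 2)]
state = replay_streams(updates, deletions)
```

With the data from the example, the row with id 1 is updated to age 4 and the row with id 2 is deleted, so only the cat remains.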

groupby(*args, id=None, sort_by=None, instance=None)

Groups table by columns from args.

NOTE: Usually followed by .reduce() that aggregates the result and returns a table.

Parameters
- args (ColumnReference) – columns to group by.
- id (ColumnReference | None) – if provided, is the column used to set id’s of the rows of the result
- sort_by (ColumnReference | None) – if provided, column values are used as sorting keys for particular reducers
- instance (ColumnReference | None) – optional argument describing partitioning of the data into separate instances
Returns
GroupedTable – Groupby object.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
t2 = t1.groupby(t1.pet, t1.owner).reduce(t1.owner, t1.pet, ageagg=pw.reducers.sum(t1.age))
pw.debug.compute_and_print(t2, include_id=False)

property id: ColumnReference

Get reference to pseudocolumn containing id’s of a table.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
t2 = t1.select(ids = t1.id)
t2.typehints()['ids']

pw.debug.compute_and_print(t2.select(test=t2.id == t2.ids), include_id=False)

ignore_late(time_column, threshold)

Filter out entries that satisfy time_column <= max(time_column) - threshold.

In contrast to forget, this operator doesn’t store the entries. It just checks whether each entry satisfies the filtering-out condition: entries that do are dropped, and all other entries pass through immediately. The only value stored by this operator is the current time (defined as max over all time_column values so far).

Please note that if the table is non-append-only and there’s a difference in processing time between an insertion and a deletion for some key, the insertion may pass through but the deletion may be filtered out. It’ll happen if the max value in time_column advanced between the insertion and deletion and the insertion didn’t satisfy the filtering-out criterion but the deletion did.

Parameters
- time_column (ColumnExpression) – ColumnExpression that specifies the event time.
- threshold (Union[int, float, timedelta]) – value used to determine which entries should be filtered out. Should match the type of the time_column (int -> int, float -> float, datetime -> timedelta).

Example:

import pathway as pw
t = pw.debug.table_from_markdown(
    '''
    t | v | __time__
    1 | 1 |     2
    2 | 2 |     4
    5 | 3 |     6
    2 | 4 |     8
    7 | 5 |    10
'''
)
res = t.ignore_late(pw.this.t, 3)
pw.debug.compute_and_print_update_stream(res)
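Because the operator only remembers the current maximum of time_column, its behaviour is easy to model in plain Python. The sketch below is an illustration of the documented condition, not Pathway's implementation; `simulate_ignore_late` is a hypothetical helper, and rows are `(event_time, value)` pairs assumed to arrive one at a time.

```python
def simulate_ignore_late(rows, threshold):
    """Drop rows that satisfy event_time <= max(event_time) - threshold.

    rows: list of (event_time, value) in arrival order.
    Only the running maximum is stored; rows themselves are never buffered.
    """
    max_time = None
    passed = []
    for event_time, value in rows:
        max_time = event_time if max_time is None else max(max_time, event_time)
        if event_time <= max_time - threshold:
            continue  # too late: filtered out
        passed.append((event_time, value))
    return passed


rows = [(1, 1), (2, 2), (5, 3), (2, 4), (7, 5)]
passed = simulate_ignore_late(rows, threshold=3)
```

With the rows from the example, only the row with event time 2 and value 4 is dropped: it arrives after the maximum has reached 5, and 2 <= 5 - 3.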

inactivity_detection(allowed_inactivity_period, refresh_rate=Timedelta('0 days 00:00:01'), instance=None)

Monitors additions to an append-only table to detect inactivity periods and identify when activity resumes, optionally with an instance argument.

This function periodically checks for table additions according to the provided refresh rate. It is limited to append-only tables since the function is mostly intended to monitor input data streams. Inactivity periods that exceed the specified threshold are reported. The output table lists the inactivity periods with the UTC timestamp of the last detected activity before the threshold was exceeded and the UTC timestamp of the first detected activity that ends the inactivity period, or None if the inactivity period has not yet ended.

Note: the inactivity period limits may differ from the actual values when the refresh rate is lower than the table update rate. It is also assumed that the system latency is negligible compared to the specified threshold. When used with instance, an inactivity period since the stream start (i.e. no incoming data) is reported with a None value in the instance column.

Parameters
- allowed_inactivity_period (pw.Duration) – maximum allowed inactivity duration. If no activity occurs within this duration, an inactivity period is flagged.
- refresh_rate (pw.Duration, optional) – frequency with which table activities are checked to detect an inactivity period. Defaults to 1 second.
- instance (pw.ColumnExpression | None, optional) – group column to detect inactivity periods separately. Defaults to None.
Returns
Table – inactivity periods table with inactivity_timestamp_utc and resumed_activity_timestamp_utc columns, optionally instance column.
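The shape of the output can be sketched with a plain-Python model that scans a list of activity timestamps for gaps. This is only an illustration of the description above: `find_inactivity_periods` is a hypothetical helper, timestamps are plain numbers of seconds, and the model ignores the refresh rate and the instance argument entirely.

```python
def find_inactivity_periods(activity_times, allowed_inactivity, now):
    """Report gaps between activity timestamps that exceed allowed_inactivity.

    activity_times: sorted timestamps of table additions (seconds).
    now: the current time, used to detect a period that has not ended yet.
    Returns (inactivity_start, resumed_activity) pairs; resumed_activity is
    None for a period that is still ongoing.
    """
    periods = []
    for previous, current in zip(activity_times, activity_times[1:]):
        if current - previous > allowed_inactivity:
            periods.append((previous, current))
    if activity_times and now - activity_times[-1] > allowed_inactivity:
        periods.append((activity_times[-1], None))
    return periods


periods = find_inactivity_periods([0, 1, 2, 10, 11], allowed_inactivity=5, now=20)
```

In this run the gap between 2 and 10 is reported as a finished inactivity period, and the silence after 11 is reported with None because no activity has resumed by the time 20.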

interpolate(timestamp, *values, mode=InterpolateMode.LINEAR)

Interpolates missing values in a column using the previous and next values based on a timestamp column.

Parameters
- timestamp (ColumnReference) – Reference to the column containing timestamps.
- *values (ColumnReference) – References to the columns containing values to be interpolated.
- mode (InterpolateMode, optional) – The interpolation mode. Currently, only InterpolateMode.LINEAR is supported. Default is InterpolateMode.LINEAR.
Returns
Table – A new table with the interpolated values.
Raises
ValueError – If the columns are not ColumnReference or if the interpolation mode is not supported.

NOTE: The interpolation is performed based on linear interpolation between the previous and next values. If a value is missing at the beginning or end of the column, no interpolation is performed.

Example:

import pathway as pw
table = pw.debug.table_from_markdown('''
timestamp | values_a | values_b
1         | 1        | 10
2         |          |
3         | 3        |
4         |          |
5         |          |
6         | 6        | 60
''')
table = table.interpolate(pw.this.timestamp, pw.this.values_a, pw.this.values_b)
pw.debug.compute_and_print(table, include_id=False)
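The arithmetic behind the linear mode can be checked with a small plain-Python model. It is a sketch, not Pathway's implementation: `linear_interpolate` is a hypothetical helper, missing values are represented by None, and leaving leading or trailing gaps as None is an assumption about what "no interpolation is performed" means in practice.

```python
def linear_interpolate(timestamps, values):
    """Fill None entries by linear interpolation between known neighbours.

    timestamps and values are parallel lists; gaps without a known value
    on both sides are left as None (assumption).
    """
    known = [(t, v) for t, v in zip(timestamps, values) if v is not None]
    result = list(values)
    for index, (t, v) in enumerate(zip(timestamps, values)):
        if v is not None:
            continue
        before = [point for point in known if point[0] < t]
        after = [point for point in known if point[0] > t]
        if not before or not after:
            continue  # gap at the beginning or end: nothing to interpolate
        t0, v0 = max(before, key=lambda point: point[0])
        t1, v1 = min(after, key=lambda point: point[0])
        result[index] = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return result


timestamps = [1, 2, 3, 4, 5, 6]
values_a = linear_interpolate(timestamps, [1, None, 3, None, None, 6])
values_b = linear_interpolate(timestamps, [10, None, None, None, None, 60])
```

For the columns from the example, values_a is filled to 1, 2, 3, 4, 5, 6 and values_b to 10, 20, 30, 40, 50, 60, with the interpolated entries produced as floats.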

intersect(*tables)

Restrict self universe to keys appearing in all of the tables.

Parameters
tables (Table) – tables whose keys are used to restrict the universe.
Returns
Table – table with restricted universe, with the same set of columns

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age  | owner  | pet
1 | 10   | Alice  | 1
2 | 9    | Bob    | 1
3 | 8    | Alice  | 2
''')
t2 = pw.debug.table_from_markdown('''
  | cost
2 | 100
3 | 200
4 | 300
''')
t3 = t1.intersect(t2)
pw.debug.compute_and_print(t3, include_id=False)
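Restricting a universe to the keys shared with several tables can be pictured with dictionaries keyed by row id. The sketch below is a plain-Python model of that idea, not Pathway code; `intersect_by_id` is a hypothetical helper.

```python
def intersect_by_id(table, *others):
    """Keep only the rows of `table` whose id appears in every other table.

    All tables are dicts mapping row id -> row; columns come from `table` only.
    """
    keep = set(table)
    for other in others:
        keep.intersection_update(other)  # iterating a dict yields its ids
    return {row_id: row for row_id, row in table.items() if row_id in keep}


t1 = {1: {"age": 10}, 2: {"age": 9}, 3: {"age": 8}}
t2 = {2: {"cost": 100}, 3: {"cost": 200}, 4: {"cost": 300}}
t3 = intersect_by_id(t1, t2)
```

The result keeps ids 2 and 3 with the columns of t1; filtering on `row_id not in other` instead would model the complementary behaviour described for difference.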

ix(expression, *, optional=False, context=None, allow_misses=False)

Reindexes the table using expression values as keys. Uses keys from context, or tries to infer proper context from the expression. If optional is True, then None in expression values results in None values in the result columns. Missing values in table keys result in a RuntimeError. If allow_misses is set to True, they result in a None value in the output.

Context can be anything that allows for select or reduce, or the pathway.this construct (the latter results in returning a delayed operation, and should be only used when using ix inside a join().select() or groupby().reduce() sequence).

Returns
Reindexed table with the same set of columns.

Example:

import pathway as pw
t_animals = pw.debug.table_from_markdown('''
  | epithet    | genus
1 | upupa      | epops
2 | acherontia | atropos
3 | bubo       | scandiacus
4 | dynastes   | hercules
''')
t_birds = pw.debug.table_from_markdown('''
  | desc
2 | hoopoe
4 | owl
''')
ret = t_birds.select(t_birds.desc, latin=t_animals.ix(t_birds.id).genus)
pw.debug.compute_and_print(ret, include_id=False)

ix_ref(*args, optional=False, context=None, instance=None, allow_misses=False)

Reindexes the table using expressions as primary keys. Uses keys from context, or tries to infer proper context from the expression. If optional is True, then None in expression values results in None values in the result columns. Missing values in table keys result in a RuntimeError. If allow_misses is set to True, they result in a None value in the output.

Parameters
args (Union[ColumnExpression, None, int, float, str, bytes, bool, Pointer, datetime, timedelta, ndarray, Json, dict[str, Any], tuple[Any, ...], Error, Pending]) – Column references.
Returns
Row – indexed row.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
name   | pet
Alice  | dog
Bob    | cat
Carole | cat
David  | dog
''')
t2 = t1.with_id_from(pw.this.name)
t2 = t2.select(*pw.this, new_value=pw.this.ix_ref("Alice").pet)
pw.debug.compute_and_print(t2, include_id=False)

Tables obtained by a groupby/reduce scheme always have primary keys:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
name   | pet
Alice  | dog
Bob    | cat
Carole | cat
David  | cat
''')
t2 = t1.groupby(pw.this.pet).reduce(pw.this.pet, count=pw.reducers.count())
t3 = t1.select(*pw.this, new_value=t2.ix_ref(t1.pet).count)
pw.debug.compute_and_print(t3, include_id=False)

Single-row tables can be accessed via ix_ref():

import pathway as pw
t1 = pw.debug.table_from_markdown('''
name   | pet
Alice  | dog
Bob    | cat
Carole | cat
David  | cat
''')
t2 = t1.reduce(count=pw.reducers.count())
t3 = t1.select(*pw.this, new_value=t2.ix_ref(context=t1).count)
pw.debug.compute_and_print(t3, include_id=False)

join(other, *on, id=None, how=JoinMode.INNER, left_instance=None, right_instance=None, left_exactly_once=False, right_exactly_once=False)

Join self with other using the given join expression.

Parameters
- other (Joinable) – the right side of the join, Table or JoinResult.
- on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
- id (ColumnReference | None) – optional argument for id of result, can be only self.id or other.id
- how (JoinMode) – by default, inner join is performed. Possible values are JoinMode.{INNER,LEFT,RIGHT,OUTER}, corresponding to inner, left, right and outer joins respectively.
- left_instance/right_instance – optional arguments describing partitioning of the data into separate instances
- left_exactly_once (bool) – if you can guarantee that each row on the left side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.
- right_exactly_once (bool) – if you can guarantee that each row on the right side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.
Returns
JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.table_from_markdown('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(
    t2, t1.pet == t2.pet, t1.owner == t2.owner, how=pw.JoinMode.INNER
).select(age=t1.age, owner_name=t2.owner, size=t2.size)
pw.debug.compute_and_print(t3, include_id = False)

join_inner(other, *on, id=None, left_instance=None, right_instance=None, left_exactly_once=False, right_exactly_once=False)

Inner-joins two tables or join results.

Parameters
- other (Joinable) – the right side of the join, Table or JoinResult.
- on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
- id (ColumnReference | None) – optional argument for id of result, can be only self.id or other.id
- left_instance/right_instance – optional arguments describing partitioning of the data into separate instances
- left_exactly_once (bool) – if you can guarantee that each row on the left side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.
- right_exactly_once (bool) – if you can guarantee that each row on the right side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.
Returns
JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.table_from_markdown('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join_inner(t2, t1.pet == t2.pet, t1.owner == t2.owner).select(
    age=t1.age, owner_name=t2.owner, size=t2.size
)
pw.debug.compute_and_print(t3, include_id = False)

join_left(other, *on, id=None, left_instance=None, right_instance=None, left_exactly_once=False, right_exactly_once=False)

Left-joins two tables or join results.

Parameters
- other (Joinable) – the right side of the join, Table or JoinResult.
- *on (ColumnExpression) – Columns to join, syntax self.col1 == other.col2
- id (ColumnReference | None) – optional id column of the result
- left_instance/right_instance – optional arguments describing partitioning of the data into separate instances
- left_exactly_once (bool) – if you can guarantee that each row on the left side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.
- right_exactly_once (bool) – if you can guarantee that each row on the right side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

- for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
- rows from the right side that were not matched with the left side are skipped
- for rows that were matched the behavior is the same as that of an inner join.

Returns
JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | a  | b
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | c  | d
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_left(t2, t1.a == t2.c
).select(t1.a, t2_c=t2.c, s=pw.require(t1.b + t2.d, t2.id)),
include_id=False)

join_outer(other, *on, id=None, left_instance=None, right_instance=None, left_exactly_once=False, right_exactly_once=False)

Outer-joins two tables or join results.

Parameters
- other (Joinable) – the right side of the join, Table or JoinResult.
- *on (ColumnExpression) – Columns to join, syntax self.col1 == other.col2
- id (ColumnReference | None) – optional id column of the result
- left_instance/right_instance – optional arguments describing partitioning of the data into separate instances
- left_exactly_once (bool) – if you can guarantee that each row on the left side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.
- right_exactly_once (bool) – if you can guarantee that each row on the right side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.

Behavior:

- for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
- for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
- for rows that were matched the behavior is the same as that of an inner join.

Returns
JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.
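
The behavior rules of the left, right and outer joins differ only in which unmatched rows are kept and padded with None. A minimal plain-Python sketch of those rules follows. The helper join_rows is hypothetical and is not part of Pathway; it works on lists of (key, value) pairs rather than on tables.

```python
def join_rows(left, right, how="outer"):
    """Join two lists of (key, value) pairs.

    how is one of "inner", "left", "right", "outer" (hypothetical helper).
    Returns a list of (left_value, right_value) pairs.
    """
    result = []
    matched_right = set()
    for lkey, lval in left:
        partners = [(i, rval) for i, (rkey, rval) in enumerate(right) if rkey == lkey]
        for i, rval in partners:
            matched_right.add(i)
            result.append((lval, rval))      # matched: same as an inner join
        if not partners and how in ("left", "outer"):
            result.append((lval, None))      # unmatched left row, right side is None
    if how in ("right", "outer"):
        for i, (_, rval) in enumerate(right):
            if i not in matched_right:
                result.append((None, rval))  # unmatched right row, left side is None
    return result

t1 = [(11, 111), (12, 112), (13, 113), (13, 114)]  # (a, b) pairs
t2 = [(11, 211), (12, 212), (14, 213), (14, 214)]  # (c, d) pairs
outer = join_rows(t1, t2, how="outer")
left_only = join_rows(t1, t2, how="left")
right_only = join_rows(t1, t2, how="right")
```

With how="left" the two rows coming only from t2 disappear, and with how="right" the two rows coming only from t1 disappear.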

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | a  | b
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | c  | d
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_outer(t2, t1.a == t2.c
).select(t1.a, t2_c=t2.c, s=pw.require(t1.b + t2.d, t1.id, t2.id)),
include_id=False)

join_right(other, *on, id=None, left_instance=None, right_instance=None, left_exactly_once=False, right_exactly_once=False)

Right-joins two tables or join results.

Parameters
- other (Joinable) – the right side of the join, Table or JoinResult.
- *on (ColumnExpression) – Columns to join, syntax self.col1 == other.col2
- id (ColumnReference | None) – optional id column of the result
- left_instance/right_instance – optional arguments describing partitioning of the data into separate instances
- left_exactly_once (bool) – if you can guarantee that each row on the left side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.
- right_exactly_once (bool) – if you can guarantee that each row on the right side of the join will be joined at most once, then you can set this parameter to True. Then each row after getting a match is removed from the join state. As a result, less memory is needed. Works only for append-only tables.

Behavior:

- rows from the left side that were not matched with the right side are skipped
- for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
- for rows that were matched the behavior is the same as that of an inner join.

Returns
JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | a  | b
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | c  | d
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_right(t2, t1.a == t2.c
).select(t1.a, t2_c=t2.c, s=pw.require(pw.coalesce(t1.b,0) + t2.d,t1.id)),
include_id=False)

Returns
OuterJoinResult object

plot(plotting_function, sorting_col=None)

Plots the contents of the table visually, e.g. in Jupyter. If the table depends only on bounded data sources, the plot is generated right away. Otherwise (in a streaming scenario), the plot updates automatically after pw.run() is called.

Parameters
- self (pw.Table) – a table serving as a source of data
- plotting_function (Callable[[ColumnDataSource], Plot]) – function for creating plot from ColumnDataSource
Returns
pn.Column – visualization which can be displayed immediately or passed as a dashboard widget

Example:

import pathway as pw
import pandas as pd
from bokeh.plotting import figure
def func(source):
    plot = figure(height=400, width=400, title="CPU usage over time")
    plot.scatter('a', 'b', source=source, line_width=3, line_alpha=0.6)
    return plot
viz = pw.debug.table_from_pandas(pd.DataFrame({"a":[1,2,3],"b":[3,1,2]})).plot(func)
type(viz)

pointer_from(*args, optional=False, instance=None)

Pseudo-random hash of its argument. Produces pointer types. Applied column-wise.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
   age  owner  pet
1   10  Alice  dog
2    9    Bob  dog
3    8  Alice  cat
4    7    Bob  dog''')
g = t1.groupby(t1.owner).reduce(refcol = t1.pointer_from(t1.owner)) # g.id == g.refcol
pw.debug.compute_and_print(g.select(test = (g.id == g.refcol)), include_id=False)

promise_universe_is_equal_to(other)

Asserts to the Pathway Live Data Framework that the universe of self is equal to the universe of other.

Semantics: Used in situations where the Pathway Live Data Framework cannot deduce that two universes are equal.

Returns
self

NOTE: The assertion works in place.

Example:

import pathway as pw
import pytest
t1 = pw.debug.table_from_markdown(
    '''
  | age | owner | pet
1 | 8   | Alice | cat
2 | 9   | Bob   | dog
3 | 15  | Alice | tortoise
4 | 99  | Bob   | seahorse
'''
).filter(pw.this.age<30)
t2 = pw.debug.table_from_markdown(
    '''
  | age | owner
1 | 11  | Alice
2 | 12  | Tom
3 | 7   | Eve
'''
)
t3 = t2.filter(pw.this.age > 10)
with pytest.raises(ValueError):
    t1.update_cells(t3)
t1 = t1.promise_universe_is_equal_to(t2)
result = t1.update_cells(t3)
pw.debug.compute_and_print(result, include_id=False)

promise_universe_is_subset_of(other)

Asserts to the Pathway Live Data Framework that the universe of self is a subset of the universe of other.

Semantics: Used in situations where the Pathway Live Data Framework cannot deduce one universe being a subset of another.

Returns
self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 30
''').promise_universe_is_subset_of(t1)
t3 = t1 << t2
pw.debug.compute_and_print(t3, include_id=False)

promise_universes_are_disjoint(other)

Asserts to the Pathway Live Data Framework that the universe of self is disjoint from the universe of other.

Semantics: Used in situations where the Pathway Live Data Framework cannot deduce universes are disjoint.

Returns
self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.table_from_markdown('''
   | age | owner | pet
11 | 11  | Alice | 30
12 | 12  | Tom   | 40
''').promise_universes_are_disjoint(t1)
t3 = t1.concat(t2)
pw.debug.compute_and_print(t3, include_id=False)

reduce(*args, **kwargs)

Reduce a table to a single row.

Equivalent to self.groupby().reduce(*args, **kwargs).

Parameters
- args (ColumnReference) – reducer to reduce the table with
- kwargs (ColumnExpression) – reducer to reduce the table with. Its key is the new name of a column.
Returns
Table – Reduced table.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
t2 = t1.reduce(ageagg=pw.reducers.argmin(t1.age))
pw.debug.compute_and_print(t2, include_id=False)

t3 = t2.select(t1.ix(t2.ageagg).age, t1.ix(t2.ageagg).pet)
pw.debug.compute_and_print(t3, include_id=False)

remove_errors()

Filters out rows that contain errors.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
    a | b
    3 | 3
    4 | 0
    5 | 5
    6 | 2
'''
)
t2 = t1.with_columns(x=pw.this.a // pw.this.b)
res = t2.remove_errors()
pw.debug.compute_and_print(res, include_id=False, terminate_on_error=False)

rename(names_mapping=None, **kwargs)

Rename columns according to either a dictionary or kwargs.

If a mapping is provided using a dictionary, rename_by_dict will be used. Otherwise, rename_columns will be used with kwargs. Columns not named in the mapping or in kwargs are not changed. New name of a column must not be id.

Parameters
- names_mapping (dict[str | ColumnReference, str] | None) – mapping from old column names to new names.
- kwargs (ColumnExpression) – columns to rename, given as new_name=old_column; used only when names_mapping is None.
Returns
Table – self with columns renamed.
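
The two calling conventions point in opposite directions: the dictionary maps old names to new names, while kwargs are written as new_name=old_column. The plain-Python sketch below models that dispatch on a list of column names. The function is a hypothetical stand-in, not Pathway code, and raising ValueError for a new name of id is an assumption made for illustration.

```python
def rename(columns, names_mapping=None, **kwargs):
    """Toy model of the dispatch: columns is a list of names (hypothetical helper)."""
    if names_mapping is not None:
        # Dictionary form (rename_by_dict): old name -> new name.
        mapping = dict(names_mapping)
    else:
        # Keyword form (rename_columns): new_name=old_name.
        mapping = {old: new for new, old in kwargs.items()}
    if "id" in mapping.values():
        # Assumed error type; the reference only states that id is not allowed.
        raise ValueError("new name of a column must not be id")
    # Columns absent from the mapping keep their names.
    return [mapping.get(name, name) for name in columns]

columns = ["age", "owner", "pet"]
by_dict = rename(columns, {"age": "years_old", "pet": "animal"})
by_kwargs = rename(columns, years_old="age", animal="pet")
```

Both calls yield the same column names, with owner left unchanged.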

rename_by_dict(names_mapping)

Rename columns according to a dictionary.

Columns not in keys(names_mapping) are not changed. New name of a column must not be id.

Parameters
names_mapping (dict[str | ColumnReference, str]) – mapping from old column names to new names.
Returns
Table – self with columns renamed.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | 1
9   | Bob   | 1
8   | Alice | 2
''')
t2 = t1.rename_by_dict({"age": "years_old", t1.pet: "animal"})
pw.debug.compute_and_print(t2, include_id=False)

rename_columns(**kwargs)

Rename columns according to kwargs.

Columns not referenced in kwargs are not changed. New name of a column must not be id.

Parameters
kwargs (str | ColumnReference) – columns to rename, given as new_name=old_column.
Returns
Table – self with columns renamed.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | 1
9   | Bob   | 1
8   | Alice | 2
''')
t2 = t1.rename_columns(years_old=t1.age, animal=t1.pet)
pw.debug.compute_and_print(t2, include_id=False)

restrict(other)

Restrict self universe to keys appearing in other.

Parameters
other (TableLike) – table whose universe is used to restrict the universe of self.
Returns
Table – table with restricted universe, with the same set of columns

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
  | age  | owner  | pet
1 | 10   | Alice  | 1
2 | 9    | Bob    | 1
3 | 8    | Alice  | 2
'''
)
t2 = pw.debug.table_from_markdown(
    '''
  | cost
2 | 100
3 | 200
'''
)
t2.promise_universe_is_subset_of(t1)

t3 = t1.restrict(t2)
pw.debug.compute_and_print(t3, include_id=False)

property schema: type[pathway.internals.schema.Schema]

Get schema of the table.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
t1.schema

t1.typehints()['age']

select(*args, **kwargs)

Build a new table with columns specified by kwargs.

Output columns’ names are keys(kwargs). values(kwargs) can be raw values, boxed values, columns. Assigning to id reindexes the table.

Parameters
- args (ColumnReference) – Column references.
- kwargs (Any) – Column expressions with their new assigned names.
Returns
Table – Created table.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
pet
Dog
Cat
''')
t2 = t1.select(animal=t1.pet, desc="fluffy")
pw.debug.compute_and_print(t2, include_id=False)

show(*, snapshot=True, include_id=True, short_pointers=True, sorters=None, page_size=10, table_height=400)

Displays the table visually, e.g. in Jupyter. If the table depends only on bounded data sources, the table preview is generated right away. Otherwise (in a streaming scenario), the table updates automatically after pw.run() is called.

Parameters
- self (pw.Table) – a table to be displayed
- snapshot (bool, optional) – whether only current snapshot or all changes to the table should be displayed. Defaults to True.
- include_id (bool, optional) – whether to show ids of rows. Defaults to True.
- short_pointers (bool, optional) – whether to shorten printed ids. Defaults to True.
- sorters (list, optional) – a list of sorter definitions mapping where each item should declare the column to sort on and the direction to sort. Defaults to None.
- page_size (int, optional) – number of rows on each page. Defaults to 10.
- table_height (int, optional) – fixed height of the table widget. Defaults to 400.
Returns
pn.Column – visualization which can be displayed immediately or passed as a dashboard widget

Example:

import pathway as pw
import pandas as pd
table_viz = pw.debug.table_from_pandas(pd.DataFrame({"a":[1,2,3],"b":[3,1,2]})).show()
type(table_viz)

property slice: TableSlice

Creates a collection of references to self columns. Supports basic column manipulation methods.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
t1.slice.without("age")

sort(key, instance=None)

Sorts a table by the specified key.

Parameters
- self (pw.Table) – the table to be sorted.
- key (ColumnExpression[int | float | datetime | str | bytes]) – an expression to sort by.
- instance (ColumnExpression | None) – an expression with the instance. Rows are sorted within an instance; the prev and next columns only point to rows that have the same instance.
Returns
pw.Table – The sorted table. Contains two columns: prev and next, containing the pointers to the previous and next rows.
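
The result does not reorder rows; it attaches to each row pointers to its neighbors in sorted order. The plain-Python sketch below models that idea with row names standing in for pointers. The helper sort_pointers is hypothetical, and it ignores how Pathway breaks ties between equal keys.

```python
from collections import defaultdict

def sort_pointers(rows, key, instance=None):
    """rows: dict id -> record (dict). Returns dict id -> (prev_id, next_id)."""
    groups = defaultdict(list)
    for row_id, record in rows.items():
        # Without an instance column, all rows fall into one group.
        groups[record[instance] if instance else None].append(row_id)
    pointers = {}
    for ids in groups.values():
        ordered = sorted(ids, key=lambda i: rows[i][key])
        for pos, row_id in enumerate(ordered):
            prev_id = ordered[pos - 1] if pos > 0 else None
            next_id = ordered[pos + 1] if pos + 1 < len(ordered) else None
            pointers[row_id] = (prev_id, next_id)
    return pointers

rows = {
    "Alice": {"age": 25, "score": 80},
    "Bob": {"age": 20, "score": 90},
    "Charlie": {"age": 30, "score": 80},
}
by_age = sort_pointers(rows, key="age")
by_age_per_score = sort_pointers(rows, key="age", instance="score")
```

In by_age_per_score, Bob is alone in his instance (score 90), so both of his pointers are None.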

Example:

import pathway as pw
table = pw.debug.table_from_markdown('''
name     | age | score
Alice    | 25  | 80
Bob      | 20  | 90
Charlie  | 30  | 80
''')
table = table.with_id_from(pw.this.name)
table += table.sort(key=pw.this.age)
pw.debug.compute_and_print(table, include_id=True)

table = pw.debug.table_from_markdown('''
name     | age | score
Alice    | 25  | 80
Bob      | 20  | 90
Charlie  | 30  | 80
David    | 35  | 90
Eve      | 15  | 80
''')
table = table.with_id_from(pw.this.name)
table += table.sort(key=pw.this.age, instance=pw.this.score)
pw.debug.compute_and_print(table, include_id=True)

split(split_expression)

Split a table according to split_expression condition.

Parameters
split_expression (ColumnExpression) – ColumnExpression that specifies the split condition.
Returns
positive_table, negative_table – tuple of tables, with the same schemas as self and with ids that are subsets of self.id, and provably disjoint.

Example:

import pathway as pw
vertices = pw.debug.table_from_markdown('''
label outdegree
    1         3
    7         0
''')
positive, negative = vertices.split(vertices.outdegree == 0)
pw.debug.compute_and_print(positive, include_id=False)

pw.debug.compute_and_print(negative, include_id=False)

stream_to_table(is_upsert)

Converts a stream of changes (updates and deletions) into a table.

In the Pathway Live Data Framework, a stream is a sequence of row changes, where each row has an id and a boolean column (e.g., “is_upsert”) indicating whether the row is an update (True) or a deletion (False).

This method reconstructs the current state of the table from such a stream by applying the updates and deletions in order. It is a stateful operation: the operator keeps track of the latest value for each id. If there are multiple events for a single id in a single batch in a stream, the order of applying the actions is not specified. For deletions, only ids are important. The values in columns are ignored.

Parameters
is_upsert (ColumnExpression) – An expression that evaluates to a boolean value. True means the row is an upsert (insert or update), False means the row is a deletion.
Returns
Table – A table with the same columns as the original stream, representing the current state.
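
The stateful reconstruction described above can be modeled in plain Python as a dictionary keyed by id. This is a sketch for intuition only: replay_stream is a hypothetical helper, and it applies events within a batch in list order, whereas Pathway leaves that order unspecified.

```python
def replay_stream(batches):
    """batches: list of lists of (row_id, values, is_upsert); returns the final state."""
    state = {}
    for batch in batches:
        for row_id, values, is_upsert in batch:
            if is_upsert:
                state[row_id] = values    # insert, or overwrite with the latest value
            else:
                state.pop(row_id, None)   # deletion: only the id matters
    return state

batches = [
    [(1, ("cat", 3), True), (2, ("dog", 11), True)],   # time 2
    [(1, ("cat", 4), True), (2, ("dog", 0), False)],   # time 4
]
final_state = replay_stream(batches)
```

After both batches only id 1 remains, holding its latest value; the column values carried by the deletion of id 2 play no role.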

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
id | pet | age | is_upsert | __time__
 1 | cat |  3  |   True    |     2
 2 | dog | 11  |   True    |     2
 1 | cat | 4   |   True    |     4
 2 | dog | 0   |  False    |     4
'''
)
t2 = t1.stream_to_table(pw.this.is_upsert)
pw.debug.compute_and_print_update_stream(t2, include_id=False)

to_stream(upsert_column_name='is_upsert')

Converts a table to a stream of changes.

If in a given batch there is:

- an insert or an update for a given key, a row with True in the upsert_column_name column is produced;
- a delete for a given key, a row with False in the upsert_column_name column is produced.

The values in all other columns are kept. This is a stateless operation.

Parameters
upsert_column_name (str) – name of the boolean column that will be added to the table and contain information about the type of action.
Returns
Table – An append only table with an additional column informing about the action type.
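
The per-batch rule can be sketched in plain Python by grouping changes by time and key and checking whether the batch contains an insertion for that key. The function below is a hypothetical model written from the description above, not Pathway's implementation. In particular, using the retracted values for a deletion row is an assumption.

```python
from collections import defaultdict

def to_stream(changes, upsert_column_name="is_upsert"):
    """changes: list of (time, row_id, values, diff), values a dict, diff in {1, -1}."""
    per_batch = defaultdict(dict)   # (time, row_id) -> {diff: values}
    for time, row_id, values, diff in changes:
        per_batch[(time, row_id)][diff] = values
    stream = []
    for time, row_id in sorted(per_batch):
        by_diff = per_batch[(time, row_id)]
        # An insertion in the batch means insert or update; otherwise it is a delete.
        is_upsert = 1 in by_diff
        values = by_diff[1] if is_upsert else by_diff[-1]
        stream.append({"time": time, "id": row_id, **values, upsert_column_name: is_upsert})
    return stream

changes = [
    (2, 1, {"age": 10}, 1), (2, 2, {"age": 9}, 1),
    (4, 1, {"age": 10}, -1), (4, 1, {"age": 11}, 1),
    (4, 2, {"age": 9}, -1), (4, 2, {"age": 10}, 1),
    (6, 1, {"age": 11}, -1), (6, 1, {"age": 12}, 1),
    (6, 2, {"age": 10}, -1),
]
stream = to_stream(changes)
```

Each update collapses into a single row marked True, and only the final retraction of id 2 at time 6 is marked False.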

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
id | age | owner | pet | __time__ | __diff__
 1 | 10  | Alice | dog |     2    |     1
 2 | 9   | Bob   | cat |     2    |     1
 1 | 10  | Alice | dog |     4    |    -1
 1 | 11  | Alice | dog |     4    |     1
 2 | 9   | Bob   | cat |     4    |    -1
 2 | 10  | Bob   | cat |     4    |     1
 1 | 11  | Alice | dog |     6    |    -1
 1 | 12  | Alice | dog |     6    |     1
 2 | 10  | Bob   | cat |     6    |    -1
''')
t2 = t1.to_stream()
pw.debug.compute_and_print_update_stream(t2, include_id=False)

typehints()

Return the types of the columns as a dictionary.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | dog
9   | Bob   | dog
8   | Alice | cat
7   | Bob   | dog
''')
t1.typehints()

unpack_snapshots()

Transforms a table representation from a change stream into a snapshot stream. A snapshot is the full state of the table after all additions, deletions, and updates from a given minibatch have been applied.

For example, suppose that at time T the table contains three rows: A, B, and C. At the next Pathway minibatch, time T+1, row C is replaced by row D. The table produced by this operator will then contain six rows as follows: at time T the rows A, B, and C, and at time T+1 the rows A, B, and D.

Use caution when applying this method to large tables that change frequently. Any Pathway Live Data Framework minibatch in which at least one row is modified will emit a snapshot containing all rows in the table, which can result in a very large output.
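
The A, B, C, D scenario above can be reproduced with a small plain-Python model, which also makes the output growth visible. The function below is a hypothetical illustration that shares only its name with the method; it is not Pathway code, and rows are represented by their labels.

```python
def unpack_snapshots(batches):
    """batches: list of (time, additions, deletions) over row labels.

    Returns a list of (time, row) pairs: the full table state after each batch.
    """
    state = set()
    output = []
    for time, additions, deletions in batches:
        state -= set(deletions)
        state |= set(additions)
        # Every batch emits the whole current state, not just the changes.
        output.extend((time, row) for row in sorted(state))
    return output

batches = [
    (0, ["A", "B", "C"], []),   # time T
    (1, ["D"], ["C"]),          # time T+1: C is replaced by D
]
snapshots = unpack_snapshots(batches)
```

A single replaced row at time T+1 yields three output rows, which is why large, frequently changing tables can produce very large outputs.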

Example:

You can create a table streamed in three minibatches with three rows as follows:

import pathway as pw
class DataColumnSchema(pw.Schema):
    data: str
table = pw.demo.generate_custom_stream(
    value_generators={"data": lambda x: str(x + 1)},
    schema=DataColumnSchema,
    nb_rows=3,
)

Then, the snapshot representation can be obtained:

snapshot_representation = table.unpack_snapshots()

Use an output connector to write the snapshots grouped by time:

pw.io.csv.write(snapshot_representation, "snapshots.txt")
pw.run()
with open("snapshots.txt", "r") as f:  
    print(f.read())

The output shows three time-based snapshots: first the initial state with row "1", then an updated state with rows "1" and "2", and finally the state with rows "1", "2", and "3".

update_cells(other)

Updates cells of self, breaking ties in favor of the values in other.

Semantics:

* result.columns == self.columns

* result.id == self.id

* conflicts are resolved preferring other’s values

Requires:

* other.columns ⊆ self.columns

* other.id ⊆ self.id

Parameters
other (Table) – the other table.
Returns
Table – self updated with cells from other.
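
The semantics and requirements above can be checked against a plain-Python model in which a table is a dict from id to a record dict. The function is a hypothetical stand-in, not Pathway code; it mirrors the two subset requirements with assertions.

```python
def update_cells(self_rows, other_rows):
    """Tables are dicts id -> {column: value} (hypothetical model)."""
    assert other_rows.keys() <= self_rows.keys(), "other.id must be a subset of self.id"
    result = {}
    for row_id, record in self_rows.items():
        patch = other_rows.get(row_id, {})
        assert patch.keys() <= record.keys(), "other.columns must be a subset of self.columns"
        result[row_id] = {**record, **patch}   # other's values win on conflict
    return result

t1 = {
    1: {"age": 10, "owner": "Alice", "pet": 1},
    2: {"age": 9, "owner": "Bob", "pet": 1},
}
t2 = {1: {"pet": 30}}
t3 = update_cells(t1, t2)
```

Only the pet cell of row 1 changes; the set of ids and the set of columns stay those of self.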

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 30
''')
pw.universes.promise_is_subset_of(t2, t1)
t3 = t1.update_cells(t2)
pw.debug.compute_and_print(t3, include_id=False)

update_rows(other)

Updates rows of self, breaking ties in favor of the rows in other.

Semantics:

result.columns == self.columns == other.columns
result.id == self.id ∪ other.id

Requires:

other.columns == self.columns

Parameters
other (Table) – the other table.
Returns
Table – self updated with rows from other.
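
In contrast to update_cells, the resulting ids are the union of both tables and whole rows are replaced. A minimal plain-Python model, again with tables as dicts from id to record, is shown below. The function is hypothetical and not Pathway code.

```python
def update_rows(self_rows, other_rows):
    """Tables are dicts id -> {column: value} (hypothetical model)."""
    columns = {frozenset(r) for r in list(self_rows.values()) + list(other_rows.values())}
    assert len(columns) <= 1, "other.columns must equal self.columns"
    # Union of ids; on a shared id the whole row from other wins.
    return {**self_rows, **other_rows}

t1 = {
    1: {"age": 10, "owner": "Alice", "pet": 1},
    2: {"age": 9, "owner": "Bob", "pet": 1},
    3: {"age": 8, "owner": "Alice", "pet": 2},
}
t2 = {
    1: {"age": 10, "owner": "Alice", "pet": 30},
    12: {"age": 12, "owner": "Tom", "pet": 40},
}
t3 = update_rows(t1, t2)
```

Row 1 is taken from t2, rows 2 and 3 are kept from t1, and row 12 is added.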

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.table_from_markdown('''
   | age | owner | pet
1  | 10  | Alice | 30
12 | 12  | Tom   | 40
''')
t3 = t1.update_rows(t2)
pw.debug.compute_and_print(t3, include_id=False)

update_types(**kwargs)

Updates types in schema. Has no effect on the runtime.

with_columns(*args, **kwargs)

Updates columns of self, according to args and kwargs. See table.select specification for evaluation of args and kwargs.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.table_from_markdown('''
  | owner | pet | size
1 | Tom   | 1   | 10
2 | Bob   | 1   | 9
3 | Tom   | 2   | 8
''')
t3 = t1.with_columns(*t2)
pw.debug.compute_and_print(t3, include_id=False)

with_id(new_index)

Set new ids based on another column containing id-typed values.

To generate ids based on arbitrary valued columns, use with_id_from.

Values assigned must be row-wise unique. The uniqueness is not checked by Pathway. Failing to provide unique ids can cause unexpected errors downstream.

Parameters
new_index – column to be used as the new index.
Returns
Table – self with updated ids.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.table_from_markdown('''
  | new_id
1 | 2
2 | 3
3 | 4
''')
t3 = t1.promise_universe_is_subset_of(t2).with_id(t2.new_id)
pw.debug.compute_and_print(t3)

with_id_from(*args, instance=None)

Compute new ids based on values in columns. Ids computed from columns must be row-wise unique. The uniqueness is not checked by Pathway. Failing to provide unique ids can cause unexpected errors downstream.

Parameters
args – columns to be used as primary keys.
Returns
Table – self updated with recomputed ids.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
   | age | owner  | pet
 1 | 10  | Alice  | 1
 2 | 9   | Bob    | 1
 3 | 8   | Alice  | 2
''')
t2 = t1 + t1.select(old_id=t1.id)
t3 = t2.with_id_from(t2.age)
pw.debug.compute_and_print(t3)

t4 = t3.select(t3.age, t3.owner, t3.pet, same_as_old=(t3.id == t3.old_id),
    same_as_new=(t3.id == t3.pointer_from(t3.age)))
pw.debug.compute_and_print(t4)

with_prefix(prefix)

Rename columns by adding prefix to each name of column.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | 1
9   | Bob   | 1
8   | Alice | 2
''')
t2 = t1.with_prefix("u_")
pw.debug.compute_and_print(t2, include_id=False)

with_suffix(suffix)

Rename columns by adding suffix to each name of column.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age | owner | pet
10  | Alice | 1
9   | Bob   | 1
8   | Alice | 2
''')
t2 = t1.with_suffix("_current")
pw.debug.compute_and_print(t2, include_id=False)

with_universe_of(other)

Returns a copy of self with exactly the same universe as other.

Semantics: Requires the precondition self.universe == other.universe. Used in situations where the Pathway Live Data Framework cannot deduce equality of universes, but they are equal, as verified at runtime.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
  | pet
1 | Dog
7 | Cat
''')
t2 = pw.debug.table_from_markdown('''
  | age
1 | 10
7 | 3
8 | 100
''')
t3 = t2.filter(pw.this.age < 30).with_universe_of(t1)
t4 = t1 + t3
pw.debug.compute_and_print(t4, include_id=False)

without(*columns)

Selects all columns without named column references.

Parameters
columns (str | ColumnReference) – columns to be dropped provided by table.column_name notation.
Returns
Table – self without specified columns.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = t1.without(t1.age, pw.this.pet)
pw.debug.compute_and_print(t2, include_id=False)