pathway.stdlib.utils package

Submodules

pathway.stdlib.utils.async_transformer module


class pathway.stdlib.utils.async_transformer.AsyncTransformer(input_table)

Performs asynchronous transformations on a table.

invoke() is called asynchronously for each row of input_table.

The output table can be accessed via result.

Example:

import asyncio
from typing import Any, Dict

import pathway as pw

class OutputSchema(pw.Schema):
    ret: int

class AsyncIncrementTransformer(pw.AsyncTransformer, output_schema=OutputSchema):
    async def invoke(self, value) -> Dict[str, Any]:
        # Simulate slow asynchronous work, then return one output row.
        await asyncio.sleep(0.1)
        return {"ret": value + 1}

input = pw.debug.parse_to_table('''
  | value
1 | 42
2 | 44
''')
result = AsyncIncrementTransformer(input_table=input).result
pw.debug.compute_and_print(result, include_id=False)
ret
43
45

close()

Called once at the end. This is the proper place for cleanup.

  • Return type
    None

abstract async invoke(*args, **kwargs)

Called for every row of input_table. The arguments will correspond to the columns in the input table.

Should return a dict of values matching output_schema.

  • Return type
    Dict[str, Any]

open()

Called before the actual work starts. Suitable for one-time setup.

  • Return type
    None

property result: Table

Resulting table.


with_options(capacity=None, retry_strategy=None)

Sets async options.

  • Parameters
    • capacity (Optional[int]) – maximum number of concurrent operations.
    • retry_strategy (Optional[AsyncRetryStrategy]) – defines how failures will be handled.
  • Return type
    AsyncTransformer
  • Returns
    self
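
To illustrate what a limit on concurrent operations means in practice, here is a minimal sketch in plain asyncio. It does not use Pathway and is not how AsyncTransformer is implemented; the helper run_with_capacity and the counters running and peak are hypothetical names introduced only for this illustration.

```python
import asyncio


async def run_with_capacity(values, capacity):
    """Apply an async function to every value, at most `capacity` at a time."""
    semaphore = asyncio.Semaphore(capacity)
    running = 0  # operations currently in flight
    peak = 0     # highest concurrency observed so far

    async def invoke(value):
        nonlocal running, peak
        async with semaphore:  # waits while `capacity` operations are active
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real asynchronous work
            running -= 1
            return {"ret": value + 1}

    # gather() returns results in the order of the inputs.
    results = await asyncio.gather(*(invoke(v) for v in values))
    return results, peak


results, peak = asyncio.run(run_with_capacity([42, 44, 46, 48], capacity=2))
```

With capacity=2, four rows are processed in two waves, so a lower capacity trades throughput for less pressure on whatever service invoke() calls.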

pathway.stdlib.utils.bucketing module

pathway.stdlib.utils.col module


pathway.stdlib.utils.col.apply_all_rows(*cols, fun, result_col_name)

Applies a function to all the data in selected columns at once, returning a single column. This transformer is meant to be run infrequently on relatively small tables.

Input:

  • cols: list of columns to which function will be applied
  • fun: function taking lists of columns and returning a corresponding list of outputs.
  • result_col_name: name of the output column

Output:

  • Table indexed with the original indices, with a single column named by the “result_col_name” argument that contains the results of applying fun
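
As an illustration of the shape of fun described above, the following plain-Python sketch defines a function that needs every row at once: it takes one list per input column and returns one output value per row. The function name fraction_of_total and the table t in the trailing comment are hypothetical.

```python
def fraction_of_total(amounts, weights):
    """Return each row's weighted amount as a fraction of the grand total."""
    weighted = [a * w for a, w in zip(amounts, weights)]
    total = sum(weighted)  # requires all rows, hence "all rows at once"
    return [x / total for x in weighted]


shares = fraction_of_total([10, 10, 20], [1, 1, 1])

# Hypothetical use on a table `t` with columns `amount` and `weight`:
# apply_all_rows(t.amount, t.weight, fun=fraction_of_total, result_col_name="share")
```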

pathway.stdlib.utils.col.flatten_column(column, origin_id=.origin_id)

Flattens a column of a table.

Input:

  • column: Column expression of column to be flattened
  • origin_id: name of the output column in which to store the ids of the input rows

Output:

  • Table with columns: colname_to_flatten and origin_id (if not None)
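
The effect of flattening can be modelled in plain Python as follows. This is only a sketch of the semantics described above, not Pathway code; flatten_rows, the row ids "r1" and "r2", and the output key "value" are hypothetical.

```python
def flatten_rows(rows, origin_id="origin_id"):
    """rows maps a row id to a sequence; emit one output row per element."""
    out = []
    for row_id, sequence in rows.items():
        for element in sequence:
            flat = {"value": element}
            if origin_id is not None:
                # Remember which input row this element came from.
                flat[origin_id] = row_id
            out.append(flat)
    return out


flat = flatten_rows({"r1": [1, 2], "r2": [3]})
```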

pathway.stdlib.utils.col.groupby_reduce_majority(column_group, column_val)

Finds a majority in column_val for every group in column_group.

Workaround for missing majority reducer.
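
A plain-Python sketch of the idea, assuming "majority" means the most frequent value within each group (how ties are broken is not specified here, so the example avoids them). The helper majority_per_group is hypothetical and does not use Pathway.

```python
from collections import Counter, defaultdict


def majority_per_group(groups, values):
    """Return the most frequent value for every group."""
    counts = defaultdict(Counter)
    for group, value in zip(groups, values):
        counts[group][value] += 1
    # most_common(1) yields [(value, count)] for the top value.
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


majority = majority_per_group(
    ["a", "a", "a", "b", "b", "b"],
    [1, 1, 2, 5, 7, 7],
)
```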


pathway.stdlib.utils.col.multiapply_all_rows(*cols, fun, result_col_names)

Applies a function to all the data in selected columns at once, returning multiple columns. This transformer is meant to be run infrequently on relatively small tables.

Input:

  • cols: list of columns to which function will be applied
  • fun: function taking lists of columns and returning a corresponding list of outputs.
  • result_col_names: names of the output columns

Output:

  • Table indexed with the original indices, with columns named by the “result_col_names” argument that contain the results of applying fun
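
A sketch of a fun suitable for the multi-column case, under the assumption that it returns one list per output column, each as long as the input. The function name add_total and the table t in the trailing comment are hypothetical.

```python
def add_total(col_a, col_b):
    """Add the grand total of both columns to every value of each column."""
    total = sum(col_a) + sum(col_b)
    # One output list per result column (assumed return shape).
    return [a + total for a in col_a], [b + total for b in col_b]


res_a, res_b = add_total([1, 2], [10, 20])

# Hypothetical use on a table `t` with columns `a` and `b`:
# multiapply_all_rows(t.a, t.b, fun=add_total, result_col_names=["res_a", "res_b"])
```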

pathway.stdlib.utils.col.unpack_col(column, *args)

Unpacks multiple columns from a single column.

Input:

  • column: Column expression of column containing some sequences
  • names: names of the output columns, passed as the remaining positional arguments (*args)

Output:

  • Table with columns named by “names” argument
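
The unpacking described above can be modelled in plain Python like this. It is a sketch of the semantics only; unpack_rows and the column names "number" and "letter" are hypothetical.

```python
def unpack_rows(sequences, *names):
    """Turn each sequence into a row whose columns are given by `names`."""
    return [dict(zip(names, sequence)) for sequence in sequences]


unpacked = unpack_rows([(1, "x"), (2, "y")], "number", "letter")
```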

pathway.stdlib.utils.filtering module

pathway.stdlib.utils.pandas_transformer module


pathway.stdlib.utils.pandas_transformer.pandas_transformer(output_schema, output_universe=None)

Decorator that turns a Python function operating on pandas.DataFrame objects into a Pathway transformer.

Input universes are converted into input DataFrame indexes. The resulting index is treated as the output universe, so it must maintain uniqueness and be of integer type.

  • Parameters
    • output_schema (Type[Schema]) – Schema of a resulting table.
    • output_universe (Union[str, int, None]) – Index or name of an argument whose universe will be used in resulting table. Defaults to None.
  • Returns
    Transformer that can be applied on Pathway tables.
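
The index requirement above can be checked on an undecorated function using pandas alone. The sketch below leaves the decorator out so that it runs without Pathway; add_columns, the column names, and the index values are hypothetical.

```python
import pandas as pd


def add_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a one-column frame that reuses the input's index."""
    return pd.DataFrame({"total": df["a"] + df["b"]}, index=df.index)


frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=[100, 200])
out = add_columns(frame)

# The resulting index must be unique and of integer type.
index_ok = out.index.is_unique and pd.api.types.is_integer_dtype(out.index)
```

Reusing the input index, as add_columns does, is a simple way to keep the output index unique and integer-typed.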