pathway.stdlib.utils package
Submodules
pathway.stdlib.utils.async_transformer module
class pathway.stdlib.utils.async_transformer.AsyncTransformer(input_table)
Allows to perform async transformations on a table.
invoke()
will be called asynchronously for each row of an input_table.
Output table can be acccesed via result
.
Example:
import pathway as pw import asyncio class OutputSchema(pw.Schema): ret: int class AsyncIncrementTransformer(pw.AsyncTransformer, output_schema=OutputSchema): async def invoke(self, value) -> Dict[str, Any]: await asyncio.sleep(0.1) return {"ret": value + 1 } input = pw.debug.parse_to_table(''' | value 1 | 42 2 | 44 ''') result = AsyncIncrementTransformer(input_table=input).result pw.debug.compute_and_print(result, include_id=False)
ret4243
close()
Called once at the end. Proper place for cleanup.
- Return type
None
abstract async invoke(*args, **kwargs)
Called for every row of input_table. The arguments will correspond to the columns in the input table.
Should return dict of values matching output_schema
.
- Return type
Dict
str
,Any
open()
Called before actual work. Suitable for one time setup.
- Return type
None
property result(_: Table
Resulting table.
with_options(capacity=None, retry_strategy=None)
Sets async options.
- Parameters
- capacity (
Optional
int
) – maximum number of concurrent operations. - retry_strategy (
Optional
AsyncRetryStrategy
) – defines how failures will be handled.
- capacity (
- Return type
AsyncTransformer
- Returns
self
pathway.stdlib.utils.bucketing module
pathway.stdlib.utils.col module
pathway.stdlib.utils.col.apply_all_rows(*cols, fun, result_col_name)
Applies a function to all the data in selected columns at once, returning a single column. This transformer is meant to be run infrequently on a relativelly small tables.
Input:
- cols: list of columns to which function will be applied
- fun: function taking lists of columns and returning a corresponding list of outputs.
- result_col_name: name of the output column
Output:
- Table indexed with original indices with a single column named by “result_col_name” argument containing results of the apply
- Return type
Table
pathway.stdlib.utils.col.flatten_column(column, origin_id=.origin_id)
Flattens a column of a table.
Input:
- column: Column expression of column to be flattened
- origin_id: name of output column where to store id’s of input rows
Output:
- Table with columns: colname_to_flatten and origin_id (if not None)
- Return type
Table
pathway.stdlib.utils.col.groupby_reduce_majority(column_group, column_val)
Finds a majority in column_val for every group in column_group.
Workaround for missing majority reducer.
pathway.stdlib.utils.col.multiapply_all_rows(*cols, fun, result_col_names)
Applies a function to all the data in selected columns at once, returning multiple columns. This transformer is meant to be run infrequently on a relativelly small tables.
Input:
- cols: list of columns to which function will be applied
- fun: function taking lists of columns and returning a corresponding list of outputs.
- result_col_names: names of the output columns
Output:
- Table indexed with original indices with columns named by “result_col_names” argument containing results of the apply
- Return type
Table
pathway.stdlib.utils.col.unpack_col(column, *args)
Unpacks multiple columns from a single column.
Input:
- column: Column expression of column containing some sequences
- names: list of names of output columns
Output:
- Table with columns named by “names” argument
- Return type
Table
pathway.stdlib.utils.filtering module
pathway.stdlib.utils.pandas_transformer module
pathway.stdlib.utils.pandas_transformer.pandas_transformer(output_schema, output_universe=None)
Decorator that turns python function operating on pandas.DataFrame into pathway transformer.
Input universes are converted into input DataFrame indexes. The resulting index is treated as the output universe, so it must maintain uniqueness and be of integer type.
- Parameters
- output_schema (
Type
[Schema
]) – Schema of a resulting table. - output_universe (
Union
str
,int
,None
) – Index or name of an argument whose universe will be used - None. (in resulting table. Defaults to) –
- output_schema (
- Returns
Transformer that can be applied on Pathway tables.