Column Operations
This section covers the functions in the pw.utils.col namespace. These utilities support column-level manipulations on tables: applying a function across all rows at once, flattening a column, finding the majority value within each group, and unpacking several columns from a single one.
Functions
pw.utils.col.apply_all_rows(*cols, fun, result_col_name)
Applies a function to all the data in the selected columns at once, returning a single column. This transformer is meant to be run infrequently, on relatively small tables.
Input:
- cols: list of columns to which function will be applied
- fun: function that takes one list of values per column and returns a corresponding list of outputs
- result_col_name: name of the output column
Output:
- Table indexed with the original indices, with a single column, named by the “result_col_name” argument, containing the results of applying fun
Example:
import pathway as pw
table = pw.debug.table_from_markdown(
'''
| colA | colB
1 | 1 | 10
2 | 2 | 20
3 | 3 | 30
''')
def add_total_sum(col1, col2):
    # Both arguments are full columns, so a global total can be computed.
    sum_all = sum(col1) + sum(col2)
    return [x + sum_all for x in col1]
result = pw.utils.col.apply_all_rows(
    table.colA, table.colB, fun=add_total_sum, result_col_name="res"
)
pw.debug.compute_and_print(result, include_id=False)
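To make the contract of fun concrete, the following plain-Python sketch traces what add_total_sum receives and returns for the table above. It does not call Pathway: col_a and col_b are hypothetical stand-ins for the contents of colA and colB, and the assumption that fun receives one list per column follows the input description above.

```python
# Plain-Python trace of the example above; Pathway is not involved.
# col_a and col_b are hypothetical stand-ins for the contents of colA and colB.
col_a = [1, 2, 3]
col_b = [10, 20, 30]


def add_total_sum(col1, col2):
    # The function sees every row at once, so it can compute a global total.
    sum_all = sum(col1) + sum(col2)
    # One output value per input row, in the same order as the input.
    return [x + sum_all for x in col1]


res = add_total_sum(col_a, col_b)
print(res)  # [67, 68, 69]
```

The total over both columns is 66, so each value of colA is shifted by 66. A row-by-row function could not produce this, because it depends on every row of the table.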
pw.utils.col.flatten_column(column, origin_id=<table1>.origin_id)
Deprecated: use pw.Table.flatten instead.
Flattens a column of a table.
Input:
- column: Column expression of column to be flattened
- origin_id: name of the output column that stores the ids of the input rows
Output:
- Table with columns: colname_to_flatten and origin_id (if not None)
Example:
import pathway as pw
t1 = pw.debug.parse_to_table('''
| pet | age
1 | Dog | 2
7 | Cat | 5
''')
t2 = pw.utils.col.flatten_column(t1.pet)
pw.debug.compute_and_print(t2.without(pw.this.origin_id), include_id=False)
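The flattening step can be pictured with a plain-Python sketch. It does not call Pathway: pets is a hypothetical stand-in for the pet column keyed by row id, flatten_with_origin is an illustrative helper, and the sketch assumes that a string value is treated as a sequence of characters.

```python
# Plain-Python illustration of flattening; Pathway is not involved.
# pets is a hypothetical stand-in for the `pet` column, keyed by row id.
pets = {1: "Dog", 7: "Cat"}


def flatten_with_origin(column):
    # Emit one (element, origin_id) pair for each element of each row's value.
    return [
        (element, row_id)
        for row_id, value in column.items()
        for element in value
    ]


flat = flatten_with_origin(pets)
print(flat)
# [('D', 1), ('o', 1), ('g', 1), ('C', 7), ('a', 7), ('t', 7)]
```

Each output element keeps the id of the row it came from, which is the role of the origin_id column. The order shown is only the iteration order of the sketch and says nothing about the row order of a Pathway table.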
pw.utils.col.groupby_reduce_majority(column_group, column_val)
Finds the majority value in column_val for every group in column_group.
Workaround for the missing majority reducer.
Example:
import pathway as pw
table = pw.debug.table_from_markdown(
'''
| group | vote
0 | 1 | pizza
1 | 1 | pizza
2 | 1 | hotdog
3 | 2 | hotdog
4 | 2 | pasta
5 | 2 | pasta
6 | 2 | pasta
''')
result = pw.utils.col.groupby_reduce_majority(table.group, table.vote)
pw.debug.compute_and_print(result, include_id=False)
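As a rough illustration of the per-group majority, here is a plain-Python sketch using collections.Counter on the same votes. It does not call Pathway: rows and majority_per_group are hypothetical names, "majority" is read as the most frequent value, and ties are resolved by Counter's own ordering, which is an assumption of the sketch and not a statement about Pathway's behaviour.

```python
from collections import Counter, defaultdict

# Plain-Python illustration; Pathway is not involved.
# rows is a hypothetical stand-in for the (group, vote) pairs of the table.
rows = [
    (1, "pizza"), (1, "pizza"), (1, "hotdog"),
    (2, "hotdog"), (2, "pasta"), (2, "pasta"), (2, "pasta"),
]


def majority_per_group(pairs):
    # Count how often each value occurs within each group.
    counts = defaultdict(Counter)
    for group, value in pairs:
        counts[group][value] += 1
    # Keep the most frequent value of every group.
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


majority = majority_per_group(rows)
print(majority)  # {1: 'pizza', 2: 'pasta'}
```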
pw.utils.col.multiapply_all_rows(*cols, fun, result_col_names)
Applies a function to all the data in the selected columns at once, returning multiple columns. This transformer is meant to be run infrequently, on relatively small tables.
Input:
- cols: list of columns to which function will be applied
- fun: function that takes one list of values per column and returns one list of outputs per result column
- result_col_names: names of the output columns
Output:
- Table indexed with the original indices, with columns, named by the “result_col_names” argument, containing the results of applying fun
Example:
import pathway as pw
table = pw.debug.table_from_markdown(
'''
| colA | colB
1 | 1 | 10
2 | 2 | 20
3 | 3 | 30
''')
def add_total_sum(col1, col2):
    # Both arguments are full columns, so a global total can be computed.
    sum_all = sum(col1) + sum(col2)
    # One output list per name in result_col_names.
    return [x + sum_all for x in col1], [x + sum_all for x in col2]
result = pw.utils.col.multiapply_all_rows(
    table.colA, table.colB, fun=add_total_sum, result_col_names=["res1", "res2"]
)
pw.debug.compute_and_print(result, include_id=False)
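The multi-column variant can be traced the same way in plain Python. The sketch below does not call Pathway: col_a and col_b are hypothetical stand-ins for colA and colB, and pairing the two output lists row by row with zip is an assumption about how the result columns line up, based on the output being indexed with the original indices.

```python
# Plain-Python trace of the example above; Pathway is not involved.
# col_a and col_b are hypothetical stand-ins for the contents of colA and colB.
col_a = [1, 2, 3]
col_b = [10, 20, 30]


def add_total_sum(col1, col2):
    sum_all = sum(col1) + sum(col2)
    # Two output lists, one per requested result column.
    return [x + sum_all for x in col1], [x + sum_all for x in col2]


res1, res2 = add_total_sum(col_a, col_b)
# Pair the outputs row by row, as they would sit side by side in a table.
paired = list(zip(res1, res2))
print(paired)  # [(67, 76), (68, 86), (69, 96)]
```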
pw.utils.col.unpack_col(column, *unpacked_columns)
Unpacks multiple columns from a single column.
Input:
- column: Column expression of column containing some sequences
- unpacked_columns: list of names of output columns
Output:
- Table with columns named by “unpacked_columns” argument
Example:
import pathway as pw
t1 = pw.debug.table_from_markdown(
'''
| colA | colB | colC
1 | Alice | 25 | dog
2 | Bob | 32 | cat
3 | Carole | 28 | dog
''')
t2 = t1.select(user = pw.make_tuple(pw.this.colA, pw.this.colB, pw.this.colC))
pw.debug.compute_and_print(t2, include_id=False)
unpack_table = pw.utils.col.unpack_col(t2.user, "name", "age", "pet")
pw.debug.compute_and_print(unpack_table, include_id=False)
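The unpacking itself amounts to assigning each position of the tuple to a named column. A minimal plain-Python sketch of that idea follows; it does not call Pathway, and users, names and unpack_rows are hypothetical names introduced only for illustration.

```python
# Plain-Python illustration of unpacking; Pathway is not involved.
# users is a hypothetical stand-in for the `user` column built with make_tuple.
users = [("Alice", 25, "dog"), ("Bob", 32, "cat"), ("Carole", 28, "dog")]
names = ("name", "age", "pet")


def unpack_rows(rows, column_names):
    # Position i of every tuple becomes the value of column_names[i].
    return [dict(zip(column_names, row)) for row in rows]


unpacked = unpack_rows(users, names)
print(unpacked[0])  # {'name': 'Alice', 'age': 25, 'pet': 'dog'}
```

The number of names passed should match the length of the sequences stored in the column, since each name takes exactly one position.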