Column Operations

This section covers the functions in the pw.utils.col namespace. These utilities operate on table columns: applying a function to all rows at once, flattening a column, finding the majority value within each group, and unpacking a single column into several.

Functions

pw.utils.col.apply_all_rows(*cols, fun, result_col_name)

Applies a function to all the data in the selected columns at once, returning a single column. This transformer is meant to be run infrequently, on relatively small tables.

Input:

  • cols: columns to which the function will be applied
  • fun: function that takes each selected column as a list of values and returns a corresponding list of outputs
  • result_col_name: name of the output column

Output:

  • Table indexed with the original indices, with a single column, named by the “result_col_name” argument, containing the results of the apply

Example:

import pathway as pw

table = pw.debug.table_from_markdown(
'''
  | colA | colB
1 | 1    | 10
2 | 2    | 20
3 | 3    | 30
''')

# fun receives each selected column as a list and returns one value per row.
def add_total_sum(col1, col2):
    sum_all = sum(col1) + sum(col2)
    return [x + sum_all for x in col1]

result = pw.utils.col.apply_all_rows(
    table.colA, table.colB, fun=add_total_sum, result_col_name="res"
)
pw.debug.compute_and_print(result, include_id=False)

res
67
68
69
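
To make the "all rows at once" semantics concrete, here is a minimal plain-Python model of the behavior described above. It is a sketch, not Pathway code: the helper name apply_all_rows_model and the dict-of-rows representation are assumptions made for illustration only.

```python
# Plain-Python model of the whole-column semantics (not Pathway code).
# `apply_all_rows_model` is a hypothetical helper invented for this sketch.
def apply_all_rows_model(rows, col_names, fun):
    ids = list(rows)  # row ids, in a fixed order
    # Build one list per selected column, aligned with `ids`.
    columns = [[rows[i][name] for i in ids] for name in col_names]
    outputs = fun(*columns)  # fun is called once, with whole columns
    if len(outputs) != len(ids):
        # Assumption of this sketch: a length mismatch is treated as an error.
        raise ValueError("fun must return one output per input row")
    # Re-attach each output to the id of the row it came from.
    return dict(zip(ids, outputs))


def add_total_sum(col1, col2):
    sum_all = sum(col1) + sum(col2)
    return [x + sum_all for x in col1]


rows = {
    1: {"colA": 1, "colB": 10},
    2: {"colA": 2, "colB": 20},
    3: {"colA": 3, "colB": 30},
}
result = apply_all_rows_model(rows, ["colA", "colB"], add_total_sum)
print(result)  # {1: 67, 2: 68, 3: 69}
```

Because fun sees every value of every selected column in a single call, its cost grows with the size of the table, which is consistent with the advice to use this transformer on small tables.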

pw.utils.col.flatten_column(column, origin_id=<table1>.origin_id)

Deprecated: use pw.Table.flatten instead.

Flattens a column of a table.

Input:

  • column: column expression of the column to be flattened
  • origin_id: name of the output column in which to store the ids of the input rows

Output:

  • Table with columns: colname_to_flatten and origin_id (if not None)

Example:

import pathway as pw

t1 = pw.debug.parse_to_table('''
  | pet  |  age
1 | Dog  |   2
7 | Cat  |   5
''')

# Flattening a string column yields one row per character.
t2 = pw.utils.col.flatten_column(t1.pet)
pw.debug.compute_and_print(t2.without(pw.this.origin_id), include_id=False)

pet
C
D
a
g
o
t
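
The relationship between the flattened rows and the origin_id column can be sketched in plain Python. This is only a model of the behavior described above, not Pathway's implementation; flatten_model and the dict-of-rows input are hypothetical names introduced for this illustration.

```python
# Plain-Python model of flattening with an origin id (not Pathway code).
# `flatten_model` is a hypothetical helper invented for this sketch.
def flatten_model(rows, col_name, origin_id="origin_id"):
    flat = []
    for row_id, row in rows.items():
        # Iterating a str yields its characters; a list yields its items.
        for item in row[col_name]:
            out = {col_name: item}
            if origin_id is not None:
                out[origin_id] = row_id  # remember the source row
            flat.append(out)
    return flat


rows = {1: {"pet": "Dog", "age": 2}, 7: {"pet": "Cat", "age": 5}}
flat = flatten_model(rows, "pet")
print(sorted(r["pet"] for r in flat))  # ['C', 'D', 'a', 'g', 'o', 't']
print(sorted({r["origin_id"] for r in flat}))  # [1, 7]
```

Every output row carries the id of the input row it was produced from, so the flattened values can later be matched back to their source rows.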

pw.utils.col.groupby_reduce_majority(column_group, column_val)

Finds the majority value in column_val for every group in column_group.

Serves as a workaround for the missing majority reducer.

Example:

import pathway as pw

table = pw.debug.table_from_markdown(
'''
  | group | vote
0 | 1     | pizza
1 | 1     | pizza
2 | 1     | hotdog
3 | 2     | hotdog
4 | 2     | pasta
5 | 2     | pasta
6 | 2     | pasta
''')

# Each group keeps the vote that occurs most often within it.
result = pw.utils.col.groupby_reduce_majority(table.group, table.vote)
pw.debug.compute_and_print(result, include_id=False)

group | majority
1     | pizza
2     | pasta
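
The per-group computation can be modeled in plain Python with collections.Counter. This is a sketch under stated assumptions rather than Pathway's implementation: groupby_majority_model is a hypothetical name, "majority" is read here as the most frequent value, and ties are broken by first occurrence, which the text above does not specify.

```python
from collections import Counter, defaultdict


# Plain-Python model of a group-wise majority (not Pathway code).
# `groupby_majority_model` is a hypothetical helper invented for this sketch.
def groupby_majority_model(pairs):
    counts = defaultdict(Counter)
    for group, value in pairs:
        counts[group][value] += 1  # count each value within its group
    # most_common(1) returns the most frequent value; on a tie it keeps the
    # value seen first (an assumption of this sketch, not documented above).
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


votes = [
    (1, "pizza"), (1, "pizza"), (1, "hotdog"),
    (2, "hotdog"), (2, "pasta"), (2, "pasta"), (2, "pasta"),
]
majority = groupby_majority_model(votes)
print(majority)  # {1: 'pizza', 2: 'pasta'}
```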

pw.utils.col.multiapply_all_rows(*cols, fun, result_col_names)

Applies a function to all the data in the selected columns at once, returning multiple columns. This transformer is meant to be run infrequently, on relatively small tables.

Input:

  • cols: columns to which the function will be applied
  • fun: function that takes each selected column as a list of values and returns one list of outputs per result column
  • result_col_names: names of the output columns

Output:

  • Table indexed with the original indices, with columns named by the “result_col_names” argument containing the results of the apply

Example:

import pathway as pw

table = pw.debug.table_from_markdown(
'''
  | colA | colB
1 | 1    | 10
2 | 2    | 20
3 | 3    | 30
''')

# fun returns one list per output column, in the order of result_col_names.
def add_total_sum(col1, col2):
    sum_all = sum(col1) + sum(col2)
    return [x + sum_all for x in col1], [x + sum_all for x in col2]

result = pw.utils.col.multiapply_all_rows(
    table.colA, table.colB, fun=add_total_sum, result_col_names=["res1", "res2"]
)
pw.debug.compute_and_print(result, include_id=False)

res1 | res2
67   | 76
68   | 86
69   | 96
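
How the lists returned by fun map onto named output columns can be sketched in plain Python. As before, this is an illustrative model and not Pathway code; multiapply_all_rows_model and the dict-of-rows input are hypothetical.

```python
# Plain-Python model of the multi-column variant (not Pathway code).
# `multiapply_all_rows_model` is a hypothetical helper for this sketch.
def multiapply_all_rows_model(rows, col_names, fun, result_col_names):
    ids = list(rows)
    columns = [[rows[i][name] for i in ids] for name in col_names]
    outputs = fun(*columns)  # one list per result column
    if len(outputs) != len(result_col_names):
        # Assumption of this sketch: a count mismatch is treated as an error.
        raise ValueError("fun must return one list per result column")
    # zip(*outputs) transposes the column lists into per-row tuples.
    return {
        row_id: dict(zip(result_col_names, values))
        for row_id, values in zip(ids, zip(*outputs))
    }


def add_total_sum(col1, col2):
    sum_all = sum(col1) + sum(col2)
    return [x + sum_all for x in col1], [x + sum_all for x in col2]


rows = {
    1: {"colA": 1, "colB": 10},
    2: {"colA": 2, "colB": 20},
    3: {"colA": 3, "colB": 30},
}
multi = multiapply_all_rows_model(
    rows, ["colA", "colB"], add_total_sum, ["res1", "res2"]
)
print(multi[1])  # {'res1': 67, 'res2': 76}
```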

pw.utils.col.unpack_col(column, *unpacked_columns)

Unpacks multiple columns from a single column.

Input:

  • column: column expression of the column containing sequences
  • unpacked_columns: names of the output columns

Output:

  • Table with columns named by “unpacked_columns” argument

Example:

import pathway as pw

t1 = pw.debug.table_from_markdown(
'''
  | colA   | colB | colC
1 | Alice  | 25   | dog
2 | Bob    | 32   | cat
3 | Carole | 28   | dog
''')

# Pack the three columns into a single tuple column.
t2 = t1.select(user = pw.make_tuple(pw.this.colA, pw.this.colB, pw.this.colC))
pw.debug.compute_and_print(t2, include_id=False)

user
('Alice', 25, 'dog')
('Bob', 32, 'cat')
('Carole', 28, 'dog')

# Unpack the tuple column into three named columns.
unpack_table = pw.utils.col.unpack_col(t2.user, "name", "age", "pet")
pw.debug.compute_and_print(unpack_table, include_id=False)

name   | age | pet
Alice  | 25  | dog
Bob    | 32  | cat
Carole | 28  | dog
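
The positional mapping from sequence elements to column names can be modeled in plain Python. This is a sketch rather than Pathway's implementation: unpack_col_model is a hypothetical helper, and raising an error when a sequence length does not match the number of names is an assumption of this sketch.

```python
# Plain-Python model of unpacking a sequence column (not Pathway code).
# `unpack_col_model` is a hypothetical helper invented for this sketch.
def unpack_col_model(column, *unpacked_columns):
    table = {}
    for row_id, seq in column.items():
        if len(seq) != len(unpacked_columns):
            # Assumption of this sketch: mismatched lengths are an error.
            raise ValueError("sequence length must match the number of names")
        # The k-th element of the sequence goes to the k-th output column.
        table[row_id] = dict(zip(unpacked_columns, seq))
    return table


users = {
    1: ("Alice", 25, "dog"),
    2: ("Bob", 32, "cat"),
    3: ("Carole", 28, "dog"),
}
unpacked = unpack_col_model(users, "name", "age", "pet")
print(unpacked[2])  # {'name': 'Bob', 'age': 32, 'pet': 'cat'}
```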