Table Operations: An Overview

This section describes the basic transformations you can do with Pathway.

Assignment and renaming

You can create a column in a table using the select and assignment (=) operators:

  • t.select(new_col=t.colA + t.colB)
  • t.select(new_col="default value")

To rename a column, you can do the same (use select) or use rename:

  • t.select(new_col=t.old_col)
  • t.rename(new_col=t.old_col)

Selection and indexing

Description Operators Example
Select a column select and dot/bracket notations t.select(t.colA, t['colB'], pw.this.colC)
Select all columns select and star notation t.select(*pw.this)
Removing columns without and dot/bracket notations t.without(t.colA, t.colB)
Referring to the current table pw.this t1.select(new_col=pw.this.colA + pw.this.colB)
Referring to a table in a join/window pw.left and pw.right notations t1.join(t2, pw.left.colA == pw.right.colB).reduce(*pw.left, pw.right.colC)
Reference indexing ix_ref t_selected_ids.select(selected=t.ix_ref(column).name)
Reindexing with_id_from t.with_id_from(t_new_ids.new_id_source)

Arithmetic operators

Description Operators Example
Addition + t.select(new_col=t.colA + t.colB)
Subtraction - t.select(new_col=t.colA - t.colB)
Multiplication * t.select(new_col=t.colA * t.colB)
Division / t.select(new_col=t.colA / t.colB)
Floor division // t.select(new_col=t.colA // t.colB)
Modulus % t.select(new_col=t.colA % t.colB)
Exponentiation ** t.select(new_col=t.colA ** t.colB)

Comparisons operators

Description Operators Example
Equal == t.select(new_col=t.colA == t.colB)
Not equal != t.select(new_col=t.colA != t.colB)
Greater than > t.select(new_col=t.colA > t.colB)
Less than < t.select(new_col=t.colA < t.colB)
Greater than or equal to >= t.select(new_col=t.colA >= t.colB)
Less than or equal to <= t.select(new_col=t.colA <= t.colB)

Boolean operators

Description Operators Example
And & t.select(new_col=t.colA & t.colB)
Or | t.select(new_col=t.colA | t.colB)
Not ~ t.select(new_col=~t.colA)
Exclusive or (XOR) ^ t.select(new_col=t.colA ^ t.colB)

Filtering

You can filter rows using the filter operator:

  • t.filter(~pw.this.column)
  • t.filter(pw.this.column > value)

Missing data

The pw.coalesce operator returns the first not-None value from the given columns:

  • t.select(new_col=pw.coalesce(t.colA, t.colB))
  • t.select(new_col=pw.coalesce(t.colA, 10)

Aggregation

You can aggregate data across the rows of the table using the groupby and reduce operators:

t.groupby(pw.this.column).reduce(sum=pw.reducers.sum(pw.this.value))

You can read our dedicated tutorial to learn more about it.

Reducers

Pathway provides several reducers to use on the aggregated values:

Reducer Example
any t.groupby(t.colA).reduce(col_any=pw.reducers.any(t.colB))
argmax t.groupby(t.colA).reduce(col_argmax=pw.reducers.argmax(t.colB))
argmin t.groupby(t.colA).reduce(col_argmin=pw.reducers.argmin(t.colB))
avg t.groupby(t.colA).reduce(col_avg=pw.reducers.avg(t.colB))
earliest t.groupby(t.colA).reduce(col_min=pw.reducers.earliest(t.colB))
latest t.groupby(t.colA).reduce(col_max=pw.reducers.latest(t.colB))
max t.groupby(t.colA).reduce(col_max=pw.reducers.max(t.colB))
min t.groupby(t.colA).reduce(col_min=pw.reducers.min(t.colB))
ndarray t.groupby(t.colA).reduce(col_array=pw.reducers.ndarray(t.colB))
sorted_tuple t.groupby(t.colA).reduce(col_tuple=pw.reducers.sorted_tuple(t.colB))
sum t.groupby(t.colA).reduce(col_sum=pw.reducers.sum(t.colB))
tuple t.groupby(t.colA).reduce(col_tuple=pw.reducers.tuple(t.colB))
unique t.groupby(t.colA).reduce(col_unique=pw.reducers.unique(t.colB))

You can also create your own stateful reducers.

Joins

You can use a join to combine columns from two different tables by associating rows from both tables which are matching on some given values:

t1.join(t2, pw.left.column == pw.right.column).select(...)

Read our tutorial about joins to learn more about how to do joins in Pathway.

Union and Concatenation

Description Operators Example
Union + or += t1 + t2
t1 += t2 modifiest1
Concatenation concat_reindex pw.Table.concat_reindex(t1, t2)

Updating cell values

You can update the cells of a table using the content of another table using the update_cells operator (<<):

  • t.update_cells(t_new)
  • t << t_new

Flattening a column

You can transform a column containing iterables or JSON arrays into multiple rows using the flatten operator:

t.flatten(t.col_to_flatten)

Column operations

Description Operators Example
Applying a function to each cell of a column. pw.apply in a select t.select(new_col=pw.apply(func, pw.this.col))
Folding columns into a single one. pw.make_tuple t.select(new_col=pw.make_tuple(t.a, t.b, t.c))

User-defined functions (UDF)

Pathway allows you to define your own User-defined functions. See our tutorial to learn more about it.