Introduction to transformer classes

A quick introduction to Pathway's transformer classes.

Pathway's transformer syntax allows you to express pipelines of transformations on entire (and ever-changing) data tables. In Pathway, transformers behave like functions, whose arguments are Pathway Tables. If you have used Spark SQL or Kafka Streams in the past, the syntax should feel familiar.

In addition, Pathway natively supports transformers defined on data rows. This is achieved through an object-oriented (ORM) view of rows in the data. These are known as Transformer Classes.

Transformer Classes are used for easy implementation of data-structure querying operations, defining APIs in Data Products, and on-demand computations.

Transformer classes provide a way to achieve row-centric operations in Pathway where the use of apply maps is not sufficient or not convenient. Using transformer classes is the easiest way to do advanced computations involving pointers between fields of tables.

Transformers 101: how to make a map

Creating a transformer class means creating a class annotated with @pw.transformer. In that class, we can declare other classes: each inner class defines one input table and one output table.

First, we can access and manipulate the values of the input table by declaring the fields that exist in the table: val = pw.input_attribute(). Note that the variable val must have the same name as the targeted column of the input table.

We can then define an output field by using the annotation @pw.output_attribute before a function: the name of the function will be the column name in the output table and the return value will be the value stored in that column.

As an example, let's consider the following transformer, which performs a map: it takes a table with a column named col_name as input, applies a given function f to each row, and stores the output values in a new column named col_name_output:

import pathway as pw

@pw.transformer
class my_transformer:
    class my_table(pw.ClassArg):
        col_name = pw.input_attribute()

        @pw.output_attribute
        def col_name_output(self):
            return f(self.col_name)

In this transformer, the class my_table takes one input table, whose columns are matched to the attributes defined using pw.input_attribute(), and outputs a table whose columns are defined by the functions annotated with @pw.output_attribute.

To test our transformer, we consider this toy table t:

    col_name
0   x
1   y
2   z

We apply the transformer to the table t, and we extract the resulting table stored in my_table:

t_map = my_transformer(my_table=t).my_table

We obtain the following table:

    col_name_output
0   f(x)
1   f(y)
2   f(z)
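
To make the result concrete, here is a minimal plain-Python sketch of what the map computes. It does not call Pathway: the dictionary t, the class MyTableRow, and the choice of upper-casing for f are assumptions made only for illustration, standing in for the table, the inner class, and the given function.

```python
# Plain-Python model of the map above; no Pathway calls are made.
# f is a hypothetical choice of function, used only for illustration.
def f(value: str) -> str:
    return value.upper()


class MyTableRow:
    """Hypothetical stand-in for one row seen through my_table."""

    def __init__(self, col_name: str) -> None:
        # Plays the role of the field declared with pw.input_attribute().
        self.col_name = col_name

    @property
    def col_name_output(self) -> str:
        # Plays the role of the method annotated with @pw.output_attribute.
        return f(self.col_name)


# The toy table t, keyed by row id.
t = {0: "x", 1: "y", 2: "z"}

# One output value per input row, with row ids preserved.
t_map = {row_id: MyTableRow(value).col_name_output for row_id, value in t.items()}
print(t_map)  # {0: 'X', 1: 'Y', 2: 'Z'}
```

With this choice of f, each row id keeps its position and only the value changes, which is the behavior the output table describes.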

Why use transformers?

Now that we have seen the basics of transformer classes, they may look like a rather complicated way of doing a map, which can be done in one line:

t_map = t.select(col_name_output=pw.apply(f, t.col_name))

So a natural question to ask is 'why use transformer classes?'.

It is true that for single-row operations, apply is the way to go. Transformer classes are made for more advanced operations, in particular operations involving different tables. While apply is limited to row-centric operations, transformer classes can perform look-ups and recursive operations on the rows. Furthermore, inside the transformer class, we can easily access any table referenced by a class by doing self.transformer_name.table_name.
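
As a rough illustration of what a recursive operation on rows means, the sketch below models it in plain Python. It makes no Pathway calls and says nothing about how Pathway evaluates such a computation; the nodes table, its parent pointers, and the depth function are hypothetical names chosen for the example.

```python
# Plain-Python model of a recursive look-up; no Pathway calls are made.
# The table layout and all names below are hypothetical.
from typing import Dict, Optional

# Each row points to its parent row by id; the root has no parent.
nodes: Dict[str, Optional[str]] = {
    "root": None,
    "a": "root",
    "b": "a",
    "c": "a",
}


def depth(row_id: str) -> int:
    """Depth of a row, obtained by following parent pointers recursively."""
    parent = nodes[row_id]
    if parent is None:
        return 0
    # The value for this row depends on the value computed for another row.
    return 1 + depth(parent)


depths = {row_id: depth(row_id) for row_id in nodes}
print(depths)  # {'root': 0, 'a': 1, 'b': 2, 'c': 2}
```

The value of one row depends on the value of the row it points to, which is the kind of computation a plain per-row apply cannot express on its own.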

For instance, if you need to add the values of two different tables, things get more complicated with only standard operations. It is possible to make a join and then use apply, but it would result in copying the values in a new table before doing the sum. This does not scale well on large datasets. On the other hand, using a transformer class allows you to do it without having to create a new table. You can see how easy it is to use transformer classes to combine several tables at once.
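
The idea of reading the other table directly, instead of first materializing a joined table, can be modeled in plain Python as follows. This only models the look-up by row id; it makes no Pathway calls, and the table contents and the names table_a, table_b, and row_sum are hypothetical.

```python
# Plain-Python model of a cross-table look-up; no Pathway calls are made.
# Both hypothetical tables map a row id to a single value.
table_a = {0: 1, 1: 2, 2: 3}
table_b = {0: 10, 1: 20, 2: 30}


def row_sum(row_id: int) -> int:
    """Sum for one row of table_b, reading the matching row of table_a by id."""
    return table_b[row_id] + table_a[row_id]


# No intermediate joined table is built: each output value is computed
# from the two look-ups for that row id.
sums = {row_id: row_sum(row_id) for row_id in table_b}
print(sums)  # {0: 11, 1: 22, 2: 33}
```

The sketch assumes both tables share the same row ids; a row id missing from table_a would raise a KeyError here.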

Going further

Transformer classes are a key component of the Pathway programming framework.

If you want to learn more about transformer classes, you can see our basic examples of transformer classes or our advanced example on how to make a tree using transformer classes.

You can also take a look at our connectors to see how to connect different data sources to Pathway!

Olivier Ruas

Algorithm and Data Processing Magician