
Getting started with Pathway

In the following, you can find instructions on how to start using Pathway.

How to install Pathway

You can install the current Pathway release, which is now available in Open Beta under a free-to-use license:

pip install

on a Python 3.10 installation, and we are ready to roll!

To use Pathway, we only need to import it:

import pathway as pw

How to connect to your first table

The first thing you need to do is access the data you want to manipulate. Pathway provides many connectors for reading data from external sources.

As an example, let's build a small table from a Markdown-style string using the debug function table_from_markdown:

table_dogs = pw.debug.table_from_markdown(
    """
    | name  | age
  1 | Ace   | 8
  2 | Bella | 5
  3 | Coco  | 13
 """
)

Now the table is loaded. But if we try to print it, we obtain a very generic output: Table['age', 'name']

That's perfectly normal: as explained in our introduction to programming in Pathway, Pathway is used to schedule the operations that will later be performed in real time by the runtime engine. To process the actual data in our example, we need to use the debug function compute_and_print:

pw.debug.compute_and_print(table_dogs)
            | name  | age
^2TMTFGY... | Ace   | 8
^YHZBTNY... | Bella | 5
^SERVYWW... | Coco  | 13
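To make the idea of "scheduling now, computing later" more concrete, here is a minimal plain-Python sketch. It does not use Pathway, and it is not how Pathway is implemented: the class LazyTable and its methods apply_step and compute are hypothetical names invented for this illustration. The sketch only shows why printing a table description gives column names, while an explicit call is needed to see the rows.

```python
# Hypothetical toy model of deferred execution; NOT Pathway's actual implementation.
class LazyTable:
    def __init__(self, rows, steps=()):
        self._rows = rows    # source data: a list of dicts
        self._steps = steps  # recorded row transformations, not yet executed

    def apply_step(self, step):
        # Record the step and return a new description; no row is touched here.
        return LazyTable(self._rows, self._steps + (step,))

    def __repr__(self):
        # Only the column names are shown, similar in spirit to Table['age', 'name'].
        columns = sorted(self._rows[0]) if self._rows else []
        return f"LazyTable{columns}"

    def compute(self):
        # Only now are the recorded steps actually run over the data.
        rows = self._rows
        for step in self._steps:
            rows = [step(row) for row in rows]
        return rows


dogs = LazyTable([
    {"name": "Ace", "age": 8},
    {"name": "Bella", "age": 5},
    {"name": "Coco", "age": 13},
])

# Describe a computation: add one year to every age. Nothing runs yet.
older = dogs.apply_step(lambda row: {**row, "age": row["age"] + 1})

print(older)            # a generic description of the table
print(older.compute())  # the actual rows, computed on demand
```

In this toy model, compute plays the role that compute_and_print plays in the tutorial: it is the point at which the described operations are actually executed.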

Some basic operations using Pathway

Now that we have a table, we are going to do some basic operations on it. You can find the full list of the supported operations in our API documentation.

The first thing we may want is to filter on the age and keep only the dogs that are 10 years old or younger. We can use the operator filter on the column 'age'. To access a column, we can either use the notation table_name.column_name or the more generic table['column_name'].

table_dogs_young = table_dogs.filter(
    table_dogs.age <= 10
)  # table_dogs['age'] also works
pw.debug.compute_and_print(table_dogs_young)
            | name  | age
^2TMTFGY... | Ace   | 8
^YHZBTNY... | Bella | 5

We can also apply a function to a given column. Let's say that, due to a rounding error, all the age values are wrong and should be decreased by one. We can build a corrected table using select together with the apply operation:

table_dogs_corrected = table_dogs.select(
    table_dogs.name, age=pw.apply((lambda x: x - 1), table_dogs["age"])
)
pw.debug.compute_and_print(table_dogs_corrected)
            | name  | age
^2TMTFGY... | Ace   | 7
^YHZBTNY... | Bella | 4
^SERVYWW... | Coco  | 12

Here we select from table_dogs the column 'name' and a new column 'age' computed with the operator apply: pw.apply(f, col) applies f to each entry of col. Several columns can be passed, in which case f receives one argument per column.
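As a rough mental model of what a row-wise apply computes, including the case with several columns, here is a plain-Python sketch. It does not use Pathway: the helper apply_rowwise is a hypothetical name introduced only for this illustration, and columns are modeled as simple lists.

```python
# Plain-Python model of a row-wise apply; apply_rowwise is a hypothetical helper,
# not a Pathway function.
def apply_rowwise(f, *columns):
    # zip walks the columns in lockstep, so f sees one value per column per row.
    return [f(*values) for values in zip(*columns)]


names = ["Ace", "Bella", "Coco"]
ages = [8, 5, 13]

# One column: decrease every age by one, as in the tutorial.
corrected_ages = apply_rowwise(lambda x: x - 1, ages)

# Several columns: f takes one argument per column.
labels = apply_rowwise(lambda name, age: f"{name} ({age})", names, corrected_ages)

print(corrected_ages)
print(labels)
```

The single-column call mirrors the age correction above; the two-column call shows the "several columns" case, where the function combines values taken from the same row.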

To do more complicated operations, we may need a second table:

table_dogs_owners = pw.debug.table_from_markdown(
    """
    | name  | owner
  1 | Ace   | Alice
  2 | Bella | Bob
  3 | Coco  | Alice
 """
)
pw.debug.compute_and_print(table_dogs_owners)
            | name  | owner
^2TMTFGY... | Ace   | Alice
^YHZBTNY... | Bella | Bob
^SERVYWW... | Coco  | Alice

We can build a table containing both pieces of information using the operator join:

table_dogs_full = table_dogs_corrected.join(
    table_dogs_owners, table_dogs_corrected.name == table_dogs_owners.name
).select(
    table_dogs_corrected.name, table_dogs_corrected.age, table_dogs_owners.owner
)
pw.debug.compute_and_print(table_dogs_full)
            | name  | age | owner
^VJ3K9DF... | Ace   | 7   | Alice
^V1RPZW8... | Bella | 4   | Bob
^R0GE4WM... | Coco  | 12  | Alice
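To see what the join on name computes, here is a plain-Python sketch of the same result, assuming inner-join semantics (rows without a match on the other side are dropped). It does not use Pathway: the function inner_join is a hypothetical helper, and the dog "Daisy" is an invented row added only to show what happens to an unmatched entry.

```python
# Plain-Python model of an inner join on a key column; NOT Pathway's implementation.
def inner_join(left, right, key):
    # Index the right-hand rows by key so each left row can find its matches.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Keep one merged row per (left, right) pair that agrees on the key.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


dogs = [
    {"name": "Ace", "age": 7},
    {"name": "Bella", "age": 4},
    {"name": "Coco", "age": 12},
    {"name": "Daisy", "age": 3},  # hypothetical row with no owner entry
]
owners = [
    {"name": "Ace", "owner": "Alice"},
    {"name": "Bella", "owner": "Bob"},
    {"name": "Coco", "owner": "Alice"},
]

full = inner_join(dogs, owners, "name")
for row in full:
    print(row)
```

Each output row merges the columns of the two matching input rows, which is what the join followed by select produces above. The unmatched "Daisy" row does not appear in the result.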

To go further

As we continue, you will see more advanced programming constructs that give Pathway a lot of flexibility:

  • Applying Machine Learning to data tables.
  • The ability to do iteration and recursion.

We will also use Pathway connectors to external data sources (for data inputs) and sinks (for data outputs).

This, and a lot more, is covered in recipes in the Pathway cookbook - try these for a start: