What is a Table in Pathway

The Table is the central data abstraction used in Pathway. It is used to represent the data and model the transformations done on it.

In Pathway, data is stored in Tables (pw.Table). The Table is a two-dimensional data structure, similar to a SQL table or a pandas' DataFrame. However, Tables in Pathway have a very different nature as they store potentially infinite and ever-changing streaming data. You can think of it as a "changing data frame": the number of rows and the content changes with time as new data comes into the system.

Summary of properties of Pathway Table

Main elements of a Table

Pathway Tables are composed of three key elements: rows, columns, and indexes (ids).

Rows represent individual records or observations, with each row containing a unique set of values across various attributes. Columns, on the other hand, define the distinct attributes or fields associated with the data, specifying the characteristics of each record. The intersection of a row and a column, known as a cell, holds a single data value. The structure of a Table, defined by its columns and their data types, is called a schema (pw.Schema).

The third vital component is the index (or id), which serves as a unique pointer to the associated row in the Table. Each row has its own unique index stored in the column id (id is a reserved column name). This index is crucial for referencing and accessing specific records within the Table. The indexes are either automatically attributed or based on primary keys. You can specify a set of columns as primary keys: the indexes will be computed based on those columns.

Table updating at reception of new data

Specificities of Tables

Tables in Pathway possess a distinctive dual nature: immutability and dynamic adaptability.

Tables are immutable: when operations such as select or other transformers are applied to the Table, rather than modifying the existing data, a new Table is generated. Immutability ensures that the original data remains unchanged.

Table updating at reception of new data

Simultaneously, Tables are designed to seamlessly handle the ever-changing nature of data streams by incorporating new rows dynamically. Whenever a new entry arrives, the Tables are updated by adding, removing, or updating the rows. The updates are propagated to all the Tables of the pipeline. The data has changed, not the Table: it still represents the same transformation over the same data stream.

Table updating at reception of new data

How to create a Table

You can create a Table either by using an input connector or by modifying an existing Table. If you want to connect to an external data source, you can use Pathway input connectors to store the input data in a Table. Otherwise, a Table can be derived from an already existing Table by using transformers. You can also create Tables with artificial data streams using the demo module.

Creating a Table in Pathway

How to access the data

Tables store dynamic data streams so accessing data in Pathway differs from static data frameworks. With Pathway, you first set up your pipeline, with input connectors and transformers to define the transformations you want to do on your data. At that point, the pipeline is built, but no data is ingested: your Tables are empty. You then use pw.run() to ingest the data: the input is dynamically ingested in the Tables.

The static nature of the usual "print" is not compatible with the dynamic nature of the Tables in Pathway. Of course, we could print a snapshot of the Table, but this would defeat the whole purpose of data stream processing: any update occurring after the snapshot would be lost.

You can output the Table, which represents a data stream, to external services using output connectors. Every change in the Table will be forwarded to the chosen service (Kafka, Postgres, etc.). Alternatively, you can visualize the Table "live" using the pw.Table.show operator (see our Jupyter example). Finally, you might want to test your pipeline with static data. In that case, you can print the data using pw.debug.compute_and_print which will compute the Table with the available data and print the results. You can learn more on how to use pw.debug.compute_and_print in our "Your first realtime app" article.

Operations on the data

Pathway supports most of the common basic operations on Tables such as mathematical and boolean operations, filters, or apply. You can learn more about those basic operations in the associated guide. These include standard join, groupby and windowby. Pathway also provides temporal join operations such as ASOF join or interval joins.