Playing with joins.

A brief explanation on how to perform joins with Pathway.

Join is one of the basic table operations provided in Pathway. A join operation combines columns from two different tables by associating rows from both tables wich are matching on some given values.

This guide presents several samples of code using the joins:

The examples demonstrate usual use-cases, explain the behavior of outer joins, and point out some peculiarities you may encounter while using Pathway. In particular, they show how to:

Prerequisites

Be sure to import Pathway.

import pathway as pw

Also, you need some tables to play with. For the sake of presentation, let us consider the following made up scenario: a group of four friends (table friends) goes to grab some breakfast.

friends = pw.debug.table_from_markdown(
    """
    | name  |budget
  1 | Alice | 13
  2 | Bob   | 10
  3 | Carol | 11
  4 | Dan   | 12
 """
)
menu = pw.debug.table_from_markdown(
    """
     | dish                              | price
  11 | pancakes_with_jam                 | 11
  12 | pb_jam_sandwich                   | 9
  13 | jam_pb_pancakes                   | 12
  14 | scrambled_egg                     | 11
 """
)

However, some of them have allergies (table allergies), and cannot eat everything. Luckily, the restaurant has a list of allergens contained in each dish (table allergens_in_menu).

allergies = pw.debug.table_from_markdown(
    """
     | person| allergy
  21 | Alice | milk
  22 | Bob   | eggs
  23 | Carol | peanuts
  24 | Carol | milk
 """
)

allergens_in_menu = pw.debug.table_from_markdown(
    """
     | dish                              | allergen
  31 | pancakes_with_jam                 | milk
  32 | pancakes_with_jam                 | eggs
  33 | pb_jam_sandwich                   | peanuts
  34 | jam_pb_pancakes                   | eggs
  35 | jam_pb_pancakes                   | peanuts
  36 | jam_pb_pancakes                   | milk
  37 | scrambled_egg                     | eggs
 """
)

Simple inner join

Syntax

Putting it simply, the syntax of join is:

table.join(other, *on)

where

  • table and other are tables to be joined,
  • *on is a list of conditions of form:
table.some_column == other.some_other_column

Remark: the order of tables in the condition matters. That is, a condition of the following form won't be accepted:

other.some_other_column == table.some_column

Examples

As a warm-up, let us see how to make simple join-queries. Let's begin with a simple query that finds all pairs person-dish that can put someone in a hospital.

bad_match = allergies.join(
    allergens_in_menu, allergies.allergy == allergens_in_menu.allergen
).select(allergens_in_menu.dish, allergies.person)
pw.debug.compute_and_print(bad_match)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | dish              | person
^YCQD1V0... | jam_pb_pancakes   | Alice
^CJ3R9RZ... | jam_pb_pancakes   | Bob
^WB77SD3... | jam_pb_pancakes   | Carol
^7R8QXH2... | jam_pb_pancakes   | Carol
^RZMDRR3... | pancakes_with_jam | Alice
^YGEN02A... | pancakes_with_jam | Bob
^B8NPADA... | pancakes_with_jam | Carol
^1T2NMPV... | pb_jam_sandwich   | Carol
^86KQFD2... | scrambled_egg     | Bob

As you can see, the result is a table of life-threatening pairs.

The select function works here similarly as select on a table. The difference is that here, you can use columns of both tables as arguments, e.g.

def make_food_constraint_note(name, dish):
    return f"{name} can't eat {dish}."


bad_match_note = allergies.join(
    allergens_in_menu, allergies.allergy == allergens_in_menu.allergen
).select(
    note=pw.apply(make_food_constraint_note, allergies.person, allergens_in_menu.dish)
)
pw.debug.compute_and_print(bad_match_note)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | note
^YCQD1V0... | Alice can't eat jam_pb_pancakes.
^RZMDRR3... | Alice can't eat pancakes_with_jam.
^CJ3R9RZ... | Bob can't eat jam_pb_pancakes.
^YGEN02A... | Bob can't eat pancakes_with_jam.
^86KQFD2... | Bob can't eat scrambled_egg.
^WB77SD3... | Carol can't eat jam_pb_pancakes.
^7R8QXH2... | Carol can't eat jam_pb_pancakes.
^B8NPADA... | Carol can't eat pancakes_with_jam.
^1T2NMPV... | Carol can't eat pb_jam_sandwich.

Remark: note that id is now some auto generated number, which is the usual behavior of join.

On self joins

In order to perform a self join (a join of table with itself), you need to create a copy with .copy.

same_allergies = (
    allergies.join(
        allergies_copy := allergies.copy(), allergies.allergy == allergies_copy.allergy
    )
    .select(
        l_name=allergies.person,
        r_name=allergies_copy.person,
    )
    .filter(pw.this.l_name != pw.this.r_name)
)

pw.debug.compute_and_print(same_allergies)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | l_name | r_name
^EFRJX9T... | Alice  | Carol
^3W8REGH... | Carol  | Alice

Outer joins

The difference between outer joins and joins is that the outer join adds to the result table also entries that didn't match:

left.join_left(right, conditions)

includes entries from the left table and

left.join_right(right, conditions)

includes entries from the right table.

Simple example

To demonstrate outer-joins, you can consider a question about forbidden breakfast configurations (i.e. pairs of person-dish, such that a person is allergic to some ingredients).

people_allergies = friends.join_left(
    allergies, friends.name == allergies.person
).select(friends.name, allergies.allergy)
pw.debug.compute_and_print(people_allergies)

forbidden_breakfast = people_allergies.join_left(
    allergens_in_menu, people_allergies.allergy == allergens_in_menu.allergen
).select(people_allergies.name, allergens_in_menu.dish)
pw.debug.compute_and_print(forbidden_breakfast)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | allergy
^T1M8A5Y... | Alice | milk
^123RSFF... | Bob   | eggs
^Y11MSAF... | Carol | milk
^F8D89M9... | Carol | peanuts
^4H1M6P9... | Dan   |
            | name  | dish
^9Q7A6XZ... | Alice | jam_pb_pancakes
^EWC34CB... | Alice | pancakes_with_jam
^CZKDKGR... | Bob   | jam_pb_pancakes
^FY9JB7J... | Bob   | pancakes_with_jam
^9MP9DYZ... | Bob   | scrambled_egg
^MNZ6EKG... | Carol | jam_pb_pancakes
^B3FFJ7Z... | Carol | jam_pb_pancakes
^V9Z503V... | Carol | pancakes_with_jam
^FMSXWPA... | Carol | pb_jam_sandwich
^GCYSJHZ... | Dan   |

Now, as you can see, the table has duplicate rows. This is the intended behavior, as a particular person can be allergic to more than one ingredient. In other words, when you add a reason column to your result table, you can see that each row was included in the table above for a different reason.

forbidden_breakfast = people_allergies.join_left(
    allergens_in_menu, people_allergies.allergy == allergens_in_menu.allergen
).select(
    people_allergies.name, allergens_in_menu.dish, reason=allergens_in_menu.allergen
)
pw.debug.compute_and_print(forbidden_breakfast)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | dish              | reason
^9Q7A6XZ... | Alice | jam_pb_pancakes   | milk
^EWC34CB... | Alice | pancakes_with_jam | milk
^CZKDKGR... | Bob   | jam_pb_pancakes   | eggs
^FY9JB7J... | Bob   | pancakes_with_jam | eggs
^9MP9DYZ... | Bob   | scrambled_egg     | eggs
^B3FFJ7Z... | Carol | jam_pb_pancakes   | milk
^MNZ6EKG... | Carol | jam_pb_pancakes   | peanuts
^V9Z503V... | Carol | pancakes_with_jam | milk
^FMSXWPA... | Carol | pb_jam_sandwich   | peanuts
^GCYSJHZ... | Dan   |                   |

Removing duplicates

If you really want to have a table without the 'reason' column and without duplicates, you can achieve that with extra groupby and reduce :

forbidden_breakfast = people_allergies.join_left(
    allergens_in_menu, people_allergies.allergy == allergens_in_menu.allergen
).select(people_allergies.name, allergens_in_menu.dish)

forbidden_breakfast_no_duplicates = forbidden_breakfast.groupby(
    forbidden_breakfast.name, forbidden_breakfast.dish
).reduce(forbidden_breakfast.name, forbidden_breakfast.dish)

pw.debug.compute_and_print(forbidden_breakfast_no_duplicates)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | dish
^MNKENAP... | Alice | jam_pb_pancakes
^BVPHA0R... | Alice | pancakes_with_jam
^FZ5X7T0... | Bob   | jam_pb_pancakes
^HNX75TQ... | Bob   | pancakes_with_jam
^85X7DVH... | Bob   | scrambled_egg
^822Y9RB... | Carol | jam_pb_pancakes
^40QDNWZ... | Carol | pancakes_with_jam
^NY6Q5H9... | Carol | pb_jam_sandwich
^TER5BA5... | Dan   |

That simple application of groupby-reduce combination essentially selects a unique set of pairs (name, dish) from our table. More on the reduce-groupby operations can be found in the reduce-groupby manual.

Expressions for unmatched rows

Because join_left operation returns rows from the left table, even if there is no match in the right column, some input columns for functions might have no defined value.

The left join sets the undefined input cells to None and the function needs to explicitly define how to handle such cases.

As an example, you can consider a modified variant of the bad_match_note table. The modification is that you want to include all the people, and additionally you want to display their budget.

To that end, it is enough that you join the friends table with the bad_match table, as together they include all necessary information.

basic_customer_info = friends.join_left(
    bad_match, friends.name == bad_match.person
).select(
    friends.name,
    friends.budget,
    note=pw.apply(make_food_constraint_note, friends.name, bad_match.dish),
)
pw.debug.compute_and_print(basic_customer_info)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | budget | note
^P79PJ93... | Alice | 13     | Alice can't eat jam_pb_pancakes.
^Y04502Z... | Alice | 13     | Alice can't eat pancakes_with_jam.
^3VZ7DJ8... | Bob   | 10     | Bob can't eat jam_pb_pancakes.
^ZC2SM27... | Bob   | 10     | Bob can't eat pancakes_with_jam.
^ASR2KRJ... | Bob   | 10     | Bob can't eat scrambled_egg.
^W3CZADQ... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^YPTWCF2... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^6N7QW9P... | Carol | 11     | Carol can't eat pancakes_with_jam.
^0RTEM3K... | Carol | 11     | Carol can't eat pb_jam_sandwich.
^4H1M6P9... | Dan   | 12     | Dan can't eat None.

As you can see, the behavior of the original make_food_constraint_note generates a little bit of an odd entry for Dan. To fix that problem, you can redefine the make_food_constraint_note.

def make_food_constraint_note_none_tolerant(name, dish):
    if dish is None:
        return f"{name} has no food restrictions."
    else:
        return make_food_constraint_note(name, dish)


basic_customer_info = friends.join_left(
    bad_match, friends.name == bad_match.person
).select(
    friends.name,
    friends.budget,
    note=pw.apply(
        make_food_constraint_note_none_tolerant,
        friends.name,
        bad_match.dish,
    ),
)
pw.debug.compute_and_print(basic_customer_info)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | budget | note
^P79PJ93... | Alice | 13     | Alice can't eat jam_pb_pancakes.
^Y04502Z... | Alice | 13     | Alice can't eat pancakes_with_jam.
^3VZ7DJ8... | Bob   | 10     | Bob can't eat jam_pb_pancakes.
^ZC2SM27... | Bob   | 10     | Bob can't eat pancakes_with_jam.
^ASR2KRJ... | Bob   | 10     | Bob can't eat scrambled_egg.
^W3CZADQ... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^YPTWCF2... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^6N7QW9P... | Carol | 11     | Carol can't eat pancakes_with_jam.
^0RTEM3K... | Carol | 11     | Carol can't eat pb_jam_sandwich.
^4H1M6P9... | Dan   | 12     | Dan has no food restrictions.

To once again demonstrate the fact that the arguments for expressions are replaced with None, let us change friends.name to bad_match.person in the select part of our code.

basic_customer_info = friends.join_left(
    bad_match, friends.name == bad_match.person
).select(
    bad_match.person,
    friends.budget,
    note=pw.apply(
        make_food_constraint_note_none_tolerant, friends.name, bad_match.dish
    ),
)
pw.debug.compute_and_print(basic_customer_info)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | person | budget | note
^4H1M6P9... |        | 12     | Dan has no food restrictions.
^P79PJ93... | Alice  | 13     | Alice can't eat jam_pb_pancakes.
^Y04502Z... | Alice  | 13     | Alice can't eat pancakes_with_jam.
^3VZ7DJ8... | Bob    | 10     | Bob can't eat jam_pb_pancakes.
^ZC2SM27... | Bob    | 10     | Bob can't eat pancakes_with_jam.
^ASR2KRJ... | Bob    | 10     | Bob can't eat scrambled_egg.
^W3CZADQ... | Carol  | 11     | Carol can't eat jam_pb_pancakes.
^YPTWCF2... | Carol  | 11     | Carol can't eat jam_pb_pancakes.
^6N7QW9P... | Carol  | 11     | Carol can't eat pancakes_with_jam.
^0RTEM3K... | Carol  | 11     | Carol can't eat pb_jam_sandwich.

Note that, the cell that earlier included Dan is empty, even though Dan is the only value that could be included in this column (if there would be a match). The reason is that the expression for this column simply shows the value from bad match.person; since this row was not matched, this value is undefined and replaced by None.

On right joins

In the examples above, you only relied on join_left. The join_right operation is quite similar in its behavior. Namely,

people_allergies = friends.join_left(
    allergies, friends.name == allergies.person
).select(friends.name, allergies.allergy)
pw.debug.compute_and_print(people_allergies)

# is almost (except for auto-generated IDs) equivalent to

people_allergies = allergies.join_right(
    friends, allergies.person == friends.name
).select(friends.name, allergies.allergy)
pw.debug.compute_and_print(people_allergies)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | allergy
^T1M8A5Y... | Alice | milk
^123RSFF... | Bob   | eggs
^Y11MSAF... | Carol | milk
^F8D89M9... | Carol | peanuts
^4H1M6P9... | Dan   |


[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | allergy
^8RDKJRG... | Alice | milk
^T59TAZH... | Bob   | eggs
^H8SWXJS... | Carol | milk
^EPTV118... | Carol | peanuts
^NXYMNMM... | Dan   |

When you join two tables, the only difference is in syntax - since allergies is the table on which you call join_right, it must be first argument in the join condition, i.e. it is

allergies.person == friends.name

as opposed to

friends.name == allergies.person

used in the join_left.

On full outer joins

The join_outer operation is a full outer join, which means that

left.join_outer(right, *on)

not only show the pairs of rows from left and right that meet the condition in *on, but also rows that didn't get matched with any other row, from both the left and right tables. To demonstrate this operation, let us introduce another group of friends and find out, for each person in a group, whether the other group has any people with the same name.

other_group = pw.debug.table_from_markdown(
    """
    | name  |budget
  5 | Bob   | 12
  6 | Carol | 14
  7 | Eve   | 12
 """
)
pw.debug.compute_and_print(
    friends.join_outer(other_group, friends.name == other_group.name).select(
        l_name=friends.name,
        l_id=friends.id,
        r_name=other_group.name,
        r_id=other_group.id,
    )
)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | l_name | l_id        | r_name | r_id
^AT0PYY5... |        |             | Eve    | ^6A0QZMJ...
^Z8ADEY3... | Alice  | ^YYY4HAB... |        |
^J1AVR2S... | Bob    | ^Z3QWT29... | Bob    | ^3S2X6B2...
^8V184A9... | Carol  | ^3CZ78B4... | Carol  | ^A984WV0...
^4H1M6P9... | Dan    | ^3HN31E1... |        |

ID inheritance in join

Full (yet still informal) syntax of join is:

table.join(other, *on, id = None)

where

  • table and other are tables to be joined,
  • *on is a list of conditions of form:
table.some_column == other.some_other_column
  • optional id can be set to either table.id or other.id

Whenever the id argument is not none, join will try to use the column passed in the id argument as the new id in the result of join. This operation will succeed only when there is a guarantee that the resulting joined table has no multiple rows with the same id-to-inherit.

Below you can find three examples - one successful and two failed id inheritance. First, let us see what are the id-s of the original tables.

pw.debug.compute_and_print(friends)
pw.debug.compute_and_print(allergies)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | budget
^YYY4HAB... | Alice | 13
^Z3QWT29... | Bob   | 10
^3CZ78B4... | Carol | 11
^3HN31E1... | Dan   | 12
            | person | allergy
^MPFCCEZ... | Alice  | milk
^HTR084Q... | Bob    | eggs
^V9MWYA4... | Carol  | milk
^SJFASVQ... | Carol  | peanuts

Successful id inheritance

Let us try a join that inherits id-s from table allergies.

pw.debug.compute_and_print(
    allergies.join(friends, allergies.person == friends.name, id=allergies.id).select(
        friends.name, allergies.allergy, friends.budget
    )
)
[2024-03-27T22:39:32]:INFO:Preparing Pathway computation


            | name  | allergy | budget
^MPFCCEZ... | Alice | milk    | 13
^HTR084Q... | Bob   | eggs    | 10
^V9MWYA4... | Carol | milk    | 11
^SJFASVQ... | Carol | peanuts | 11

As you can see, the id column is the same as in the friends table, which is not the case when the id parameter is not set.

pw.debug.compute_and_print(
    allergies.join(friends, allergies.person == friends.name, id=allergies.id).select(
        friends.name, allergies.allergy, friends.budget
    )
)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | allergy | budget
^MPFCCEZ... | Alice | milk    | 13
^HTR084Q... | Bob   | eggs    | 10
^V9MWYA4... | Carol | milk    | 11
^SJFASVQ... | Carol | peanuts | 11

Failed id inheritance: duplicate id-s

The first possible problem with inheriting id is that one row of the source table could be matched with several entries of the other table. The code below will cause such a problem, and will raise a KeyError error.

pw.debug.compute_and_print(
    allergies.join(friends, allergies.person == friends.name, id=friends.id).select(
        friends.name, allergies.allergy
    )
)
[stacktrace...]
KeyError: 'duplicate key: ^SERVYWW6KDGEQ2WVZ3ZZB86VSR'

Failed id inheritance: empty id-s

Finally, if you consider outer joins between those two tables, you may encounter a situation in which you need to assign an id that is empty in the join result:

pw.debug.compute_and_print(
    allergies.join_right(friends, allergies.person == friends.name).select(
        friends.name, allergies.allergy, allergies_id=allergies.id
    )
)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | allergy | allergies_id
^8RDKJRG... | Alice | milk    | ^MPFCCEZ...
^T59TAZH... | Bob   | eggs    | ^HTR084Q...
^H8SWXJS... | Carol | milk    | ^V9MWYA4...
^EPTV118... | Carol | peanuts | ^SJFASVQ...
^NXYMNMM... | Dan   |         |

As you can see, the allergies_id field is not set for Dan's entry. If you try to use allergies.id as the id parameter, you will encounter a TypeError error.

pw.debug.compute_and_print(
    allergies.join_right(
        friends, allergies.person == friends.name, id=allergies.id
    ).select(friends.name, allergies.allergy, allergies_id=allergies.id)
)
[stacktrace...]
TypeError: type mismatch: expected a pointer, got None

Joins on a foreign key

In Pathway, the id column is auto-generated, and as such joining over a foreign key kept in some other table requires extra care. Let's assume that you have another table likes that indicates that a friend (row in friends) likes some particular dish (row in menu).

likes = pw.debug.table_from_markdown(
    """
    | f_id  | m_id
100 | 1     | 11
101 | 1     | 13
102 | 2     | 12
103 | 2     | 13
104 | 3     | 11
105 | 3     | 14
106 | 3     | 13
107 | 4     | 12
108 | 4     | 14
"""
)

Without further specification, Pathway treats columns f_id and m_id as numbers:

pw.debug.compute_and_print(likes)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | f_id | m_id
^1BNYXQH... | 1    | 11
^6J3K9CB... | 1    | 13
^N3QENAM... | 2    | 12
^H4W0S32... | 2    | 13
^EQXAJMB... | 3    | 11
^XEGKSQC... | 3    | 13
^S57QPSB... | 3    | 14
^MVKXDSS... | 4    | 12
^VVA8C71... | 4    | 14

while the id of tables friends and menu was converted to Pointer.

pw.debug.compute_and_print(friends)
pw.debug.compute_and_print(menu)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | budget
^YYY4HAB... | Alice | 13
^Z3QWT29... | Bob   | 10
^3CZ78B4... | Carol | 11
^3HN31E1... | Dan   | 12
            | dish              | price
^3J2R55X... | jam_pb_pancakes   | 12
^03PYXDQ... | pancakes_with_jam | 11
^GFDBR0G... | pb_jam_sandwich   | 9
^XTPZRQ2... | scrambled_egg     | 11

To handle joins using those columns, you can use pointer_from function

likes += likes.select(
    f_id_ptr=friends.pointer_from(likes.f_id),
    m_id_ptr=menu.pointer_from(likes.m_id),
)
pw.debug.compute_and_print(likes)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | f_id | m_id | f_id_ptr    | m_id_ptr
^1BNYXQH... | 1    | 11   | ^YYY4HAB... | ^03PYXDQ...
^6J3K9CB... | 1    | 13   | ^YYY4HAB... | ^3J2R55X...
^N3QENAM... | 2    | 12   | ^Z3QWT29... | ^GFDBR0G...
^H4W0S32... | 2    | 13   | ^Z3QWT29... | ^3J2R55X...
^EQXAJMB... | 3    | 11   | ^3CZ78B4... | ^03PYXDQ...
^XEGKSQC... | 3    | 13   | ^3CZ78B4... | ^3J2R55X...
^S57QPSB... | 3    | 14   | ^3CZ78B4... | ^XTPZRQ2...
^MVKXDSS... | 4    | 12   | ^3HN31E1... | ^GFDBR0G...
^VVA8C71... | 4    | 14   | ^3HN31E1... | ^XTPZRQ2...

An example of code joining friends with menu using likes is presented the next section on chaining joins.

Chaining joins:

Pathway provides two ways of chaining joins. The first relies on usage of pw.this, the second allows for slightly more compact code. Below, let's focus on chaining joins using pw.left.

Simple join chaining

Below, you will do chain joins using pw.left and pw.right. To show how it can be used, let's revisit the first example of join_left, in which you computed a join_left on a table that was obtained by another join_left. Instead of storing the result of the first join_left in people_allergies, you can use the following:

pw.debug.compute_and_print(
    friends.join_left(allergies, friends.name == allergies.person)
    .select(friends.name, allergies.allergy, friends.budget)
    .join_left(allergens_in_menu, pw.left.allergy == pw.right.allergen)
    .select(pw.this.name, allergens_in_menu.dish)
)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | dish
^9Q7A6XZ... | Alice | jam_pb_pancakes
^EWC34CB... | Alice | pancakes_with_jam
^CZKDKGR... | Bob   | jam_pb_pancakes
^FY9JB7J... | Bob   | pancakes_with_jam
^9MP9DYZ... | Bob   | scrambled_egg
^MNZ6EKG... | Carol | jam_pb_pancakes
^B3FFJ7Z... | Carol | jam_pb_pancakes
^V9Z503V... | Carol | pancakes_with_jam
^FMSXWPA... | Carol | pb_jam_sandwich
^GCYSJHZ... | Dan   |

Essentially, for a join left.join(right, *on), pw.left allows us to address the left table and pw.right allows us to address the right table. In this particular example, pw.left allows us to address the table computed by

  friends.join_left(allergies, friends.name == allergies.person
      ).select(friends.name, allergies.allergy)

without breaking the chain and storing intermediate results in a temporary variable.

More generally, given a chain:

table.join(...).select(...).join(...).select(...)...

pw.left can be used to address the result of the latest select.

While in the example above pw.right is essentially a replacement for allergens_in_menu, it can be also used to address a table that is passed as an argument of a join, but is not assigned to any variable.

To show pw.right in action, you can go back to our example showing basic consumer information, and compute it directly from tables friends, allergies, and allergens_in_menu.

basic_customer_info = friends.join_left(
    allergies.join(
        allergens_in_menu, allergies.allergy == allergens_in_menu.allergen
    ).select(
        allergens_in_menu.dish,
        allergies.person,
    ),
    friends.name == pw.right.person,
).select(
    friends.name,
    friends.budget,
    note=pw.apply(make_food_constraint_note_none_tolerant, friends.name, pw.right.dish),
)
pw.debug.compute_and_print(basic_customer_info)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | budget | note
^P79PJ93... | Alice | 13     | Alice can't eat jam_pb_pancakes.
^Y04502Z... | Alice | 13     | Alice can't eat pancakes_with_jam.
^3VZ7DJ8... | Bob   | 10     | Bob can't eat jam_pb_pancakes.
^ZC2SM27... | Bob   | 10     | Bob can't eat pancakes_with_jam.
^ASR2KRJ... | Bob   | 10     | Bob can't eat scrambled_egg.
^W3CZADQ... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^YPTWCF2... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^6N7QW9P... | Carol | 11     | Carol can't eat pancakes_with_jam.
^0RTEM3K... | Carol | 11     | Carol can't eat pb_jam_sandwich.
^4H1M6P9... | Dan   | 12     | Dan has no food restrictions.

Long chain example

To demonstrate longer chains in action, let's go back to the table likes you used to show how to handle (generate) foreign keys.

feasible_choice = (
    friends.join(likes, friends.id == likes.f_id_ptr)
    .select(friends.name, friends.budget, likes.m_id_ptr)
    .join(menu, pw.left.m_id_ptr == menu.id)
    .select(pw.left.name, pw.left.budget, menu.dish, menu.price)
    .join_left(allergies, pw.left.name == pw.right.person)
    .select(*pw.left[["name", "budget", "dish", "price"]], allergies.allergy)
    .join(allergens_in_menu, pw.left.dish == pw.right.dish)
    .select(
        *pw.left[["name", "budget", "dish", "price", "allergy"]],
        allergens_in_menu.allergen,
    )
    .filter((pw.this.price <= pw.this.budget) & (pw.this.allergy != pw.this.allergen))
)
pw.debug.compute_and_print(feasible_choice)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | budget | dish              | price | allergy | allergen
^1R0ZDNJ... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | eggs
^MY9MQCC... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | peanuts
^K15J5RM... | Alice | 13     | pancakes_with_jam | 11    | milk    | eggs
^YYYA8Q9... | Bob   | 10     | pb_jam_sandwich   | 9     | eggs    | peanuts
^NDGQ8H2... | Carol | 11     | pancakes_with_jam | 11    | milk    | eggs
^AE6BYJK... | Carol | 11     | pancakes_with_jam | 11    | peanuts | eggs
^28AHQ7Q... | Carol | 11     | pancakes_with_jam | 11    | peanuts | milk
^ZF95SEX... | Carol | 11     | scrambled_egg     | 11    | milk    | eggs
^N47771W... | Carol | 11     | scrambled_egg     | 11    | peanuts | eggs
^WF6BY33... | Dan   | 12     | pb_jam_sandwich   | 9     |         | peanuts
^H882QD0... | Dan   | 12     | scrambled_egg     | 11    |         | eggs

As you can see, this table contains all choices of person and dish, such that a person likes a particular dish, is not allergic to it, and can afford it. You can further simplify the result by adding another groupby-reduce at the end of the chain.

pw.debug.compute_and_print(
    feasible_choice.groupby(
        *pw.this[["name", "dish", "budget", "price"]],
    ).reduce(*pw.this[["name", "dish", "budget", "price"]])
)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | dish              | budget | price
^57TKD0K... | Alice | jam_pb_pancakes   | 13     | 12
^D5737FK... | Alice | pancakes_with_jam | 13     | 11
^ZR14WYB... | Bob   | pb_jam_sandwich   | 10     | 9
^B7MB26C... | Carol | pancakes_with_jam | 11     | 11
^8A89VGQ... | Carol | scrambled_egg     | 11     | 11
^B20HZPG... | Dan   | pb_jam_sandwich   | 12     | 9
^P1E97MY... | Dan   | scrambled_egg     | 12     | 11

Furthermore, one can make this piece of code more compact, using the * notation.

pw.debug.compute_and_print(
    friends.join(likes, friends.id == likes.f_id_ptr)
    .select(*friends, likes.m_id_ptr)
    .join(menu, pw.left.m_id_ptr == menu.id)
    .select(*pw.left.without("m_id_ptr"), *menu)
    .join_left(allergies, pw.left.name == pw.right.person)
    .select(*pw.left, allergies.allergy)
    .join(allergens_in_menu, pw.left.dish == pw.right.dish)
    .select(*pw.left, allergens_in_menu.allergen)
    .filter((pw.this.price <= pw.this.budget) & (pw.this.allergy != pw.this.allergen))
    .groupby(*pw.this)
    .reduce(*pw.this)
)
[2024-03-27T22:39:33]:INFO:Preparing Pathway computation


            | name  | budget | dish              | price | allergy | allergen
^GR3JTK7... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | eggs
^560S7R7... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | peanuts
^6AB73E4... | Alice | 13     | pancakes_with_jam | 11    | milk    | eggs
^SVK8374... | Bob   | 10     | pb_jam_sandwich   | 9     | eggs    | peanuts
^HZ6CV02... | Carol | 11     | pancakes_with_jam | 11    | milk    | eggs
^F7XY2H0... | Carol | 11     | pancakes_with_jam | 11    | peanuts | eggs
^XCB28V3... | Carol | 11     | pancakes_with_jam | 11    | peanuts | milk
^NVJZ7EM... | Carol | 11     | scrambled_egg     | 11    | milk    | eggs
^Q517DE3... | Carol | 11     | scrambled_egg     | 11    | peanuts | eggs
^SKZJ2PT... | Dan   | 12     | pb_jam_sandwich   | 9     |         | peanuts
^E5P13D5... | Dan   | 12     | scrambled_egg     | 11    |         | eggs