Run In Colab  View in Github

Playing with joins.

A brief explanation on how to perform joins with Pathway.

Join is one of the basic table operation provided in Pathway. A join operation combines columns from two different tables by associating rows with matching some given values both tables.

This guide presents several samples of code using the joins:

The examples demonstrate usual use-cases, explain the behavior of outer joins, and point out some peculiarities you may encounter while using Pathway. In particular, they show how to:

Prerequisites

Be sure to import Pathway.

import pathway as pw

Also, we need some tables to play with. For the sake of presentation, let us consider a following made up scenario: a group of four friends (table friends) goes to grab some breakfast.

friends = pw.debug.table_from_markdown(    """    | name  |budget  1 | Alice | 13  2 | Bob   | 10  3 | Carol | 11  4 | Dan   | 12 """)menu = pw.debug.table_from_markdown(    """     | dish                              | price  11 | pancakes_with_jam                 | 11  12 | pb_jam_sandwich                   | 9  13 | jam_pb_pancakes                   | 12  14 | scrambled_egg                     | 11 """)

However, some of them have allergies (table allergies), and cannot eat everything. Luckily, the restaurant has a list of allergens contained in each dish (table allergens_in_menu).

allergies = pw.debug.table_from_markdown(    """     | person| allergy  21 | Alice | milk  22 | Bob   | eggs  23 | Carol | peanuts  24 | Carol | milk """)allergens_in_menu = pw.debug.table_from_markdown(    """     | dish                              | allergen  31 | pancakes_with_jam                 | milk  32 | pancakes_with_jam                 | eggs  33 | pb_jam_sandwich                   | peanuts  34 | jam_pb_pancakes                   | eggs  35 | jam_pb_pancakes                   | peanuts  36 | jam_pb_pancakes                   | milk  37 | scrambled_egg                     | eggs """)

Simple inner join

Syntax

Putting it simply, the syntax of join is:

table.join(other, *on)

where

  • table and other are tables to be joined,
  • *on is a list of conditions of form:
table.some_column == other.some_other_column

Remark: the order of tables in the condition matters. That is, a condition of the following form won't be accepted:

other.some_other_column == table.some_column

Examples

As a warm-up, let us see how to make a simple join-queries. Let's begin with simple query that find all pairs person-dish that can put someone in a hospital.

bad_match = allergies.join(    allergens_in_menu, allergies.allergy == allergens_in_menu.allergen).select(allergens_in_menu.dish, allergies.person)pw.debug.compute_and_print(bad_match)
            | dish              | person
^NXS1ZCC... | jam_pb_pancakes   | Alice
^TKSYTBX... | jam_pb_pancakes   | Bob
^C62RFES... | jam_pb_pancakes   | Carol
^BX7DMG3... | jam_pb_pancakes   | Carol
^B7RFVMK... | pancakes_with_jam | Alice
^51FP1WY... | pancakes_with_jam | Bob
^FHCFCY1... | pancakes_with_jam | Carol
^ZQ7DWQ8... | pb_jam_sandwich   | Carol
^0MRRXPF... | scrambled_egg     | Bob

As we can see, the result is a table of life-threatening pairs.

The select function works here similarly as select on a table. The difference is that here, we can use columns of both tables as arguments, e.g.

def make_food_constraint_note(name, dish):    return f"{name} can't eat {dish}."bad_match_note = allergies.join(    allergens_in_menu, allergies.allergy == allergens_in_menu.allergen).select(    note=pw.apply(make_food_constraint_note, allergies.person, allergens_in_menu.dish))pw.debug.compute_and_print(bad_match_note)
            | note
^NXS1ZCC... | Alice can't eat jam_pb_pancakes.
^B7RFVMK... | Alice can't eat pancakes_with_jam.
^TKSYTBX... | Bob can't eat jam_pb_pancakes.
^51FP1WY... | Bob can't eat pancakes_with_jam.
^0MRRXPF... | Bob can't eat scrambled_egg.
^C62RFES... | Carol can't eat jam_pb_pancakes.
^BX7DMG3... | Carol can't eat jam_pb_pancakes.
^FHCFCY1... | Carol can't eat pancakes_with_jam.
^ZQ7DWQ8... | Carol can't eat pb_jam_sandwich.

Remark: note that id is now some auto generated number, which is the usual behavior of join.

On self joins

In order to perform a self join (a join of table with itself), we need to create its copy.

same_allergies = (    allergies.join(        allergies_copy := allergies.copy(), allergies.allergy == allergies_copy.allergy    )    .select(        l_name=allergies.person,        r_name=allergies_copy.person,    )    .filter(pw.this.l_name != pw.this.r_name))pw.debug.compute_and_print(same_allergies)
            | l_name | r_name
^V26FYPD... | Alice  | Carol
^YTH8WNY... | Carol  | Alice

Outer joins

The difference between outer joins and joins is that the outer join adds to the result table also entries that didn't match:

left.left_join(right, conditions)

includes entries from the left table and

left.right_join(right, conditions)

includes entries from the right table.

Simple example

To demonstrate outer-joins, we can consider a question about forbidden breakfast configurations (i.e. pairs person-dish, such that person is allergic to some ingredients).

people_allergies = friends.left_join(    allergies, friends.name == allergies.person).select(friends.name, allergies.allergy)pw.debug.compute_and_print(people_allergies)forbidden_breakfast = people_allergies.left_join(    allergens_in_menu, people_allergies.allergy == allergens_in_menu.allergen).select(people_allergies.name, allergens_in_menu.dish)pw.debug.compute_and_print(forbidden_breakfast)
            | name  | allergy
^CEESHNP... | Alice | milk
^GKYNQB3... | Bob   | eggs
^MCCDMWQ... | Carol | milk
^28382DM... | Carol | peanuts
^QWN3RXE... | Dan   |
            | name  | dish
^Y4H9YJC... | Alice | jam_pb_pancakes
^MHQRNFF... | Alice | pancakes_with_jam
^4DQ4S80... | Bob   | jam_pb_pancakes
^XPGS5EQ... | Bob   | pancakes_with_jam
^63EGDX6... | Bob   | scrambled_egg
^YZ60CRJ... | Carol | jam_pb_pancakes
^K3K3E16... | Carol | jam_pb_pancakes
^M6FZ6K4... | Carol | pancakes_with_jam
^RN5X9SH... | Carol | pb_jam_sandwich
^Y317MGR... | Dan   |

Now, as we can see, the table has duplicate rows. This is intended behavior, as a particular person can be allergic to more than one ingredient. In other words, when we add a reason column to our result table, we can see that each row was included in the table above for a different reason.

forbidden_breakfast = people_allergies.left_join(    allergens_in_menu, people_allergies.allergy == allergens_in_menu.allergen).select(    people_allergies.name, allergens_in_menu.dish, reason=allergens_in_menu.allergen)pw.debug.compute_and_print(forbidden_breakfast)
            | name  | dish              | reason
^Y4H9YJC... | Alice | jam_pb_pancakes   | milk
^MHQRNFF... | Alice | pancakes_with_jam | milk
^4DQ4S80... | Bob   | jam_pb_pancakes   | eggs
^XPGS5EQ... | Bob   | pancakes_with_jam | eggs
^63EGDX6... | Bob   | scrambled_egg     | eggs
^YZ60CRJ... | Carol | jam_pb_pancakes   | milk
^K3K3E16... | Carol | jam_pb_pancakes   | peanuts
^M6FZ6K4... | Carol | pancakes_with_jam | milk
^RN5X9SH... | Carol | pb_jam_sandwich   | peanuts
^Y317MGR... | Dan   |                   |

Removing duplicates

If we really want to have a table without the 'reason' column and without duplicates, we can achieve that with extra groupby and reduce :

forbidden_breakfast = people_allergies.left_join(    allergens_in_menu, people_allergies.allergy == allergens_in_menu.allergen).select(people_allergies.name, allergens_in_menu.dish)forbidden_breakfast_no_duplicates = forbidden_breakfast.groupby(    forbidden_breakfast.name, forbidden_breakfast.dish).reduce(forbidden_breakfast.name, forbidden_breakfast.dish)pw.debug.compute_and_print(forbidden_breakfast_no_duplicates)
            | name  | dish
^M8E4DFK... | Alice | jam_pb_pancakes
^JMHBFKG... | Alice | pancakes_with_jam
^AN7YRTF... | Bob   | jam_pb_pancakes
^GZ0TEZ6... | Bob   | pancakes_with_jam
^W8AZSTG... | Bob   | scrambled_egg
^DBZV8YR... | Carol | jam_pb_pancakes
^N585E7A... | Carol | pancakes_with_jam
^9DJG3GD... | Carol | pb_jam_sandwich
^30JAJCE... | Dan   |

That simple application of groupby-reduce combination essentially selects unique set of pairs (name, dish) from our table. More on the reduce-groupby operations can be found in the reduce-groupby manual.

Expressions for unmatched rows

Because left_join operation returns rows from the left table, even if there is no match in the right column, some input columns for functions might have no defined value.

The left join sets the undefined input cells to None and the function needs to explicitly define how to handle such case.

As an example, we can consider a modified variant of the bad_match_note table. The modification is that we want to include all the people, and additionally we want to display their budget.

To that end, it is enough that we join the friends table with the bad_match table, as together they include all necessary information.

basic_customer_info = friends.left_join(    bad_match, friends.name == bad_match.person).select(    friends.name,    friends.budget,    note=pw.apply(make_food_constraint_note, friends.name, bad_match.dish),)pw.debug.compute_and_print(basic_customer_info)
            | name  | budget | note
^B74N75K... | Alice | 13     | Alice can't eat jam_pb_pancakes.
^YZBNRSA... | Alice | 13     | Alice can't eat pancakes_with_jam.
^Z8QGSEZ... | Bob   | 10     | Bob can't eat jam_pb_pancakes.
^CHHK1Q9... | Bob   | 10     | Bob can't eat pancakes_with_jam.
^DHWEFAB... | Bob   | 10     | Bob can't eat scrambled_egg.
^2SGPETY... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^ADRQJVA... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^JCXPCDF... | Carol | 11     | Carol can't eat pancakes_with_jam.
^H2S7YN7... | Carol | 11     | Carol can't eat pb_jam_sandwich.
^QWN3RXE... | Dan   | 12     | Dan can't eat None.

As we can see, the behavior of the original make_food_constraint_note generates a little bit odd entry for Dan. To fix that problem, we can redefine the make_food_constraint_note.

def make_food_constraint_note_none_tolerant(name, dish):    if dish is None:        return f"{name} has no food restrictions."    else:        return make_food_constraint_note(name, dish)basic_customer_info = friends.left_join(    bad_match, friends.name == bad_match.person).select(    friends.name,    friends.budget,    note=pw.apply(        make_food_constraint_note_none_tolerant,        friends.name,        bad_match.dish,    ),)pw.debug.compute_and_print(basic_customer_info)
            | name  | budget | note
^B74N75K... | Alice | 13     | Alice can't eat jam_pb_pancakes.
^YZBNRSA... | Alice | 13     | Alice can't eat pancakes_with_jam.
^Z8QGSEZ... | Bob   | 10     | Bob can't eat jam_pb_pancakes.
^CHHK1Q9... | Bob   | 10     | Bob can't eat pancakes_with_jam.
^DHWEFAB... | Bob   | 10     | Bob can't eat scrambled_egg.
^2SGPETY... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^ADRQJVA... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^JCXPCDF... | Carol | 11     | Carol can't eat pancakes_with_jam.
^H2S7YN7... | Carol | 11     | Carol can't eat pb_jam_sandwich.
^QWN3RXE... | Dan   | 12     | Dan has no food restrictions.

To once again demonstrate the fact that the arguments for expressions are replaced with None, let us change friends.name to bad_match.person in the select part of our code.

basic_customer_info = friends.left_join(    bad_match, friends.name == bad_match.person).select(    bad_match.person,    friends.budget,    note=pw.apply(        make_food_constraint_note_none_tolerant, friends.name, bad_match.dish    ),)pw.debug.compute_and_print(basic_customer_info)
            | person | budget | note
^QWN3RXE... |        | 12     | Dan has no food restrictions.
^B74N75K... | Alice  | 13     | Alice can't eat jam_pb_pancakes.
^YZBNRSA... | Alice  | 13     | Alice can't eat pancakes_with_jam.
^Z8QGSEZ... | Bob    | 10     | Bob can't eat jam_pb_pancakes.
^CHHK1Q9... | Bob    | 10     | Bob can't eat pancakes_with_jam.
^DHWEFAB... | Bob    | 10     | Bob can't eat scrambled_egg.
^2SGPETY... | Carol  | 11     | Carol can't eat jam_pb_pancakes.
^ADRQJVA... | Carol  | 11     | Carol can't eat jam_pb_pancakes.
^JCXPCDF... | Carol  | 11     | Carol can't eat pancakes_with_jam.
^H2S7YN7... | Carol  | 11     | Carol can't eat pb_jam_sandwich.

Note that, the cell that earlier included Dan is empty, even though Dan is the only value that could be included in this column (if there would be a match). The reason it that the expression for this column simply shows the value from bad match.person; since this row was not matched, this value is undefined and replaced by None.

On right joins

In the examples above, we only relied on left_join. The right_join operation is quite similar in its behavior. Namely,

people_allergies = friends.left_join(    allergies, friends.name == allergies.person).select(friends.name, allergies.allergy)pw.debug.compute_and_print(people_allergies)# is almost (except for auto-generated IDs) equivalent topeople_allergies = allergies.right_join(    friends, allergies.person == friends.name).select(friends.name, allergies.allergy)pw.debug.compute_and_print(people_allergies)
            | name  | allergy
^CEESHNP... | Alice | milk
^GKYNQB3... | Bob   | eggs
^MCCDMWQ... | Carol | milk
^28382DM... | Carol | peanuts
^QWN3RXE... | Dan   |
            | name  | allergy
^AHH9C80... | Alice | milk
^2BR8WRG... | Bob   | eggs
^Y5CAVAS... | Carol | milk
^WQENVG9... | Carol | peanuts
^F6RXSB7... | Dan   |

When we join two tables, the only difference is in syntax - since allergies is the table on which we call join_right, it must be first argument in the join condition, i.e. it is

allergies.person == friends.name

as opposed to

friends.name == allergies.person

used in the left_join.

On full outer joins

The outer_join operation is a full outer join, which means that

left.outer_join(right, *on)

not only show the pairs of rows form left and right that meet the condition in *on, but also rows that didn't get matched with any other row, from both left and right table. To demonstrate this operation, let us introduce another group of friends and find out, for each person in a group, whether the other group has any people with the same name.

other_group = pw.debug.table_from_markdown(    """    | name  |budget  5 | Bob   | 12  6 | Carol | 14  7 | Eve   | 12 """)
pw.debug.compute_and_print(    friends.outer_join(other_group, friends.name == other_group.name).select(        l_name=friends.name,        l_id=friends.id,        r_name=other_group.name,        r_id=other_group.id,    ))
            | l_name | l_id        | r_name | r_id
^2ZNCNBN... |        |             | Eve    | ^19D0FQ9...
^777VMKH... | Alice  | ^2TMTFGY... |        |
^1Y14E07... | Bob    | ^YHZBTNY... | Bob    | ^76QPWK3...
^7RXQ0Z1... | Carol  | ^SERVYWW... | Carol  | ^C4S6S48...
^QWN3RXE... | Dan    | ^8GR6BSX... |        |

ID inheritance in join

Full (yet still informal) syntax of join is:

table.join(other, *on, id = None)

where

  • table and other are tables to be joined,
  • *on is a list of conditions of form:
table.some_column == other.some_other_column
  • optional id can be set to either table.id or other.id

Whenever id argument is not none, join will try to use the column passed in id argument as new id in the result of join. This operation will succeed only when there is a guarantee that resulting joined table has no multiple rows with the same id-to-inherit.

Below you can find three examples - one successful and two failed id inheritance. First, let us see what are the id-s of the original tables.

pw.debug.compute_and_print(friends)pw.debug.compute_and_print(allergies)
            | name  | budget
^2TMTFGY... | Alice | 13
^YHZBTNY... | Bob   | 10
^SERVYWW... | Carol | 11
^8GR6BSX... | Dan   | 12
            | person | allergy
^SP3EQAR... | Alice  | milk
^H9SEXTP... | Bob    | eggs
^CWQ381T... | Carol  | milk
^9PXHKBD... | Carol  | peanuts

Successful id inheritance

Let us try a join that inherits id-s from table allergies.

pw.debug.compute_and_print(    allergies.join(friends, allergies.person == friends.name, id=allergies.id).select(        friends.name, allergies.allergy, friends.budget    ))
            | name  | allergy | budget
^SP3EQAR... | Alice | milk    | 13
^H9SEXTP... | Bob   | eggs    | 10
^CWQ381T... | Carol | milk    | 11
^9PXHKBD... | Carol | peanuts | 11

As we can see, the id column is the same as in the friends table, which was not the case when the id parameter is not set.

pw.debug.compute_and_print(    allergies.join(friends, allergies.person == friends.name, id=allergies.id).select(        friends.name, allergies.allergy, friends.budget    ))
            | name  | allergy | budget
^SP3EQAR... | Alice | milk    | 13
^H9SEXTP... | Bob   | eggs    | 10
^CWQ381T... | Carol | milk    | 11
^9PXHKBD... | Carol | peanuts | 11

Failed id inheritance: duplicate id-s

The first possible problem with inheriting id is that one row of the source table could be matched with several entries of the other table. The code below will cause such problem, and will raise a KeyError.

pw.debug.compute_and_print(    allergies.join(friends, allergies.person == friends.name, id=friends.id).select(        friends.name, allergies.allergy    ))[stacktrace...]KeyError: 'duplicate key: ^SERVYWW6KDGEQ2WVZ3ZZB86VSR'

Failed id inheritance: empty id-s

Finally, if we consider outer joins between those two tables, we may encounter a situation in which we need to assign id that is empty in the join result:

pw.debug.compute_and_print(    allergies.right_join(friends, allergies.person == friends.name).select(        friends.name, allergies.allergy, allergies_id=allergies.id    ))
            | name  | allergy | allergies_id
^AHH9C80... | Alice | milk    | ^SP3EQAR...
^2BR8WRG... | Bob   | eggs    | ^H9SEXTP...
^Y5CAVAS... | Carol | milk    | ^CWQ381T...
^WQENVG9... | Carol | peanuts | ^9PXHKBD...
^F6RXSB7... | Dan   |         |

As we can see, the allergies_id field is not set for Dan's entry. If we try to use allergies.id as the id parameter, we will encounter TypeError.

pw.debug.compute_and_print(    allergies.right_join(        friends, allergies.person == friends.name, id=allergies.id    ).select(friends.name, allergies.allergy, allergies_id=allergies.id))[stacktrace...]TypeError: type mismatch: expected a pointer, got None

Joins on a foreign key

In Pathway id column is auto-generated, and as such joining over foreign key kept in some other table requires extra care. Let's assume that we have another table likes that indicates that a friend (row in friends) likes some particular dish (row in menu).

likes = pw.debug.table_from_markdown(    """    | f_id  | m_id100 | 1     | 11101 | 1     | 13102 | 2     | 12103 | 2     | 13104 | 3     | 11105 | 3     | 14106 | 3     | 13107 | 4     | 12108 | 4     | 14""")

Without further specification, Pathway treats columns f_id and m_id as numbers:

pw.debug.compute_and_print(likes)
            | f_id | m_id
^M1T2QKJ... | 1    | 11
^4RH155M... | 1    | 13
^YQ30FHF... | 2    | 12
^AMFB2M6... | 2    | 13
^7J00RXZ... | 3    | 11
^NZSWGHA... | 3    | 13
^FCVQG87... | 3    | 14
^V771XE5... | 4    | 12
^0HAWE8Q... | 4    | 14

while the id of tables friends and menu was converted to Pointer.

pw.debug.compute_and_print(friends)pw.debug.compute_and_print(menu)
            | name  | budget
^2TMTFGY... | Alice | 13
^YHZBTNY... | Bob   | 10
^SERVYWW... | Carol | 11
^8GR6BSX... | Dan   | 12
            | dish              | price
^VYA37VV... | jam_pb_pancakes   | 12
^HMETZT8... | pancakes_with_jam | 11
^4B2REY1... | pb_jam_sandwich   | 9
^SE0WWPZ... | scrambled_egg     | 11

To handle joins using those columns, we can use pointer_from function

likes += likes.select(    f_id_ptr=friends.pointer_from(likes.f_id),    m_id_ptr=menu.pointer_from(likes.m_id),)pw.debug.compute_and_print(likes)
            | f_id | m_id | f_id_ptr    | m_id_ptr
^M1T2QKJ... | 1    | 11   | ^2TMTFGY... | ^HMETZT8...
^4RH155M... | 1    | 13   | ^2TMTFGY... | ^VYA37VV...
^YQ30FHF... | 2    | 12   | ^YHZBTNY... | ^4B2REY1...
^AMFB2M6... | 2    | 13   | ^YHZBTNY... | ^VYA37VV...
^7J00RXZ... | 3    | 11   | ^SERVYWW... | ^HMETZT8...
^NZSWGHA... | 3    | 13   | ^SERVYWW... | ^VYA37VV...
^FCVQG87... | 3    | 14   | ^SERVYWW... | ^SE0WWPZ...
^V771XE5... | 4    | 12   | ^8GR6BSX... | ^4B2REY1...
^0HAWE8Q... | 4    | 14   | ^8GR6BSX... | ^SE0WWPZ...

An example of code joining friends with menu using likes is presented the next section on chaining joins.

Chaining joins:

Pathway provides two ways of chaining joins. The first relies on usage of pw.this, the second allows for slightly more compact code. Below, we focus on chaining joins using pw.left.

Simple join chaining

Below, we show how to chain joins using pw.left and pw.right. To show how it can be used, we revisit the first example of left_join, in which we computed a left_join on a table that was obtained by another left_join. Instead of storing the result of the first left_join in people_allergies, we can use the following:

pw.debug.compute_and_print(    friends.left_join(allergies, friends.name == allergies.person)    .select(friends.name, allergies.allergy, friends.budget)    .left_join(allergens_in_menu, pw.left.allergy == pw.right.allergen)    .select(pw.this.name, allergens_in_menu.dish))
            | name  | dish
^Y4H9YJC... | Alice | jam_pb_pancakes
^MHQRNFF... | Alice | pancakes_with_jam
^4DQ4S80... | Bob   | jam_pb_pancakes
^XPGS5EQ... | Bob   | pancakes_with_jam
^63EGDX6... | Bob   | scrambled_egg
^YZ60CRJ... | Carol | jam_pb_pancakes
^K3K3E16... | Carol | jam_pb_pancakes
^M6FZ6K4... | Carol | pancakes_with_jam
^RN5X9SH... | Carol | pb_jam_sandwich
^Y317MGR... | Dan   |

Essentially, for a join left.join(right, *on), pw.left allows us to address the left table and pw.right allows us to address the right table. In this particular example, pw.left allows us to address the table computed by

  friends.left_join(allergies, friends.name == allergies.person
      ).select(friends.name, allergies.allergy)

without breaking the chain and storing intermediate result in a temporary variable.

More generally, given a chain:

table.join(...).select(...).join(...).select(...)...

pw.left can be used to address the result of the latest select.

While in the example above pw.right is essentially a replacement for allergens_in_menu, it can be also used to address a table that is passed as an argument of a join, but is not assigned to any variable.

To show pw.right in action, we can go back to our example showing basic consumer information, and compute it directly from tables friends, allergies, and allergens_in_menu.

basic_customer_info = friends.left_join(    allergies.join(        allergens_in_menu, allergies.allergy == allergens_in_menu.allergen    ).select(        allergens_in_menu.dish,        allergies.person,    ),    friends.name == pw.right.person,).select(    friends.name,    friends.budget,    note=pw.apply(make_food_constraint_note_none_tolerant, friends.name, pw.right.dish),)pw.debug.compute_and_print(basic_customer_info)
            | name  | budget | note
^B74N75K... | Alice | 13     | Alice can't eat jam_pb_pancakes.
^YZBNRSA... | Alice | 13     | Alice can't eat pancakes_with_jam.
^Z8QGSEZ... | Bob   | 10     | Bob can't eat jam_pb_pancakes.
^CHHK1Q9... | Bob   | 10     | Bob can't eat pancakes_with_jam.
^DHWEFAB... | Bob   | 10     | Bob can't eat scrambled_egg.
^2SGPETY... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^ADRQJVA... | Carol | 11     | Carol can't eat jam_pb_pancakes.
^JCXPCDF... | Carol | 11     | Carol can't eat pancakes_with_jam.
^H2S7YN7... | Carol | 11     | Carol can't eat pb_jam_sandwich.
^QWN3RXE... | Dan   | 12     | Dan has no food restrictions.

Long chain example

To demonstrate longer chains in action, we go back to the table likes we used to show how to handle (generate) foreign keys.

feasible_choice = (    friends.join(likes, friends.id == likes.f_id_ptr)    .select(friends.name, friends.budget, likes.m_id_ptr)    .join(menu, pw.left.m_id_ptr == menu.id)    .select(pw.left.name, pw.left.budget, menu.dish, menu.price)    .left_join(allergies, pw.left.name == pw.right.person)    .select(        pw.left.name, pw.left.budget, pw.left.dish, pw.left.price, allergies.allergy    )    .join(allergens_in_menu, pw.left.dish == pw.right.dish)    .select(        pw.left.name,        pw.left.budget,        pw.left.dish,        pw.left.price,        pw.left.allergy,        allergens_in_menu.allergen,    )    .filter((pw.this.price <= pw.this.budget) & (pw.this.allergy != pw.this.allergen)))pw.debug.compute_and_print(feasible_choice)
            | name  | budget | dish              | price | allergy | allergen
^C7E41EW... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | eggs
^3C8BYCT... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | peanuts
^V49K9AY... | Alice | 13     | pancakes_with_jam | 11    | milk    | eggs
^5HD9M6D... | Bob   | 10     | pb_jam_sandwich   | 9     | eggs    | peanuts
^3F4G2QJ... | Carol | 11     | pancakes_with_jam | 11    | milk    | eggs
^Q416RZ3... | Carol | 11     | pancakes_with_jam | 11    | peanuts | eggs
^05R585H... | Carol | 11     | pancakes_with_jam | 11    | peanuts | milk
^ZP22G15... | Carol | 11     | scrambled_egg     | 11    | milk    | eggs
^Z3CSET2... | Carol | 11     | scrambled_egg     | 11    | peanuts | eggs
^9RVY4QH... | Dan   | 12     | pb_jam_sandwich   | 9     |         | peanuts
^V50MGNK... | Dan   | 12     | scrambled_egg     | 11    |         | eggs

As we can see, this table contains all choices of person and dish, such that person likes particular dish, is not allergic to it, and can afford it. We can further simplify the result by adding another groupby-reduce at the end of the chain.

pw.debug.compute_and_print(    feasible_choice.groupby(        pw.this.name, pw.this.dish, pw.this.budget, pw.this.price    ).reduce(pw.this.name, pw.this.dish, pw.this.budget, pw.this.price))
            | name  | dish              | budget | price
^XJV0C2H... | Alice | jam_pb_pancakes   | 13     | 12
^B9PDF7J... | Alice | pancakes_with_jam | 13     | 11
^86P08KH... | Bob   | pb_jam_sandwich   | 10     | 9
^FS0VA1H... | Carol | pancakes_with_jam | 11     | 11
^SFK6S54... | Carol | scrambled_egg     | 11     | 11
^AN8SK6X... | Dan   | pb_jam_sandwich   | 12     | 9
^G13RSRH... | Dan   | scrambled_egg     | 12     | 11

Furthermore, one can make this piece of code more compact, using the * notation).

pw.debug.compute_and_print(    friends.join(likes, friends.id == likes.f_id_ptr)    .select(*friends, likes.m_id_ptr)    .join(menu, pw.left.m_id_ptr == menu.id)    .select(*pw.left, *menu)    .without(pw.this.m_id_ptr)    .left_join(allergies, pw.left.name == pw.right.person)    .select(*pw.left, allergies.allergy)    .join(allergens_in_menu, pw.left.dish == pw.right.dish)    .select(*pw.left, allergens_in_menu.allergen)    .filter((pw.this.price <= pw.this.budget) & (pw.this.allergy != pw.this.allergen))    .select(*pw.this)    .groupby(*pw.this)    .reduce(*pw.this))
            | name  | budget | dish              | price | allergy | allergen
^9CRN3ZD... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | eggs
^7QMJ3B9... | Alice | 13     | jam_pb_pancakes   | 12    | milk    | peanuts
^GQW4NHH... | Alice | 13     | pancakes_with_jam | 11    | milk    | eggs
^W4B3N70... | Bob   | 10     | pb_jam_sandwich   | 9     | eggs    | peanuts
^Q5RKM31... | Carol | 11     | pancakes_with_jam | 11    | milk    | eggs
^20S1PGP... | Carol | 11     | pancakes_with_jam | 11    | peanuts | eggs
^KX97P8Q... | Carol | 11     | pancakes_with_jam | 11    | peanuts | milk
^0DHKVMF... | Carol | 11     | scrambled_egg     | 11    | milk    | eggs
^T0MTJM0... | Carol | 11     | scrambled_egg     | 11    | peanuts | eggs
^YZQNPFW... | Dan   | 12     | pb_jam_sandwich   | 9     |         | peanuts
^CND5GJ8... | Dan   | 12     | scrambled_egg     | 11    |         | eggs

Krzysztof Nowicki

Datastore Research Engineer