Join API

Contains reference for helper classes related to joins.

class pw.JoinMode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

class pw.FilteredJoinResult(join_result, filtering)

filter(filter_expression)

Filters rows, keeping the ones satisfying the predicate.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
   age  owner  pet
1   10  Alice    1
2    9    Bob    1
3    8  Alice    2
''')
t2 = pw.debug.parse_to_table('''
    age  owner  pet size
11   10  Alice    3    M
12    9    Bob    1    L
13    8    Tom    1   XL
''')
result = t1.join(t2).filter(t1.owner == t2.owner).select(t1.age, t2.size)   # noqa: E501
pw.debug.compute_and_print(result, include_id=False)
age | size
8   | M
9   | L
10  | M

groupby(*args, id=None)

Groups join result by columns from args.

NOTE: Usually followed by .reduce() that aggregates the result and returns a table.

  • Parameters
    • args (ColumnReference) – columns to group by.
    • id (Optional[ColumnReference]) – if provided, is the column used to set id’s of the rows of the result
  • Returns
    GroupedJoinResult – Groupby object.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
   cost  owner  pet
1   100  Alice    1
2    90    Bob    1
3    80  Alice    2
''')
t2 = pw.debug.parse_to_table('''
    cost  owner  pet size
11   100  Alice    3    M
12    90    Bob    1    L
13    80    Tom    1   XL
''')
result = (t1.join(t2, t1.owner==t2.owner).groupby(pw.this.owner)
    .reduce(pw.this.owner, pairs = pw.reducers.count()))
pw.debug.compute_and_print(result, include_id=False)
owner | pairs
Alice | 2
Bob   | 1

join(other, *on, id=None, how=JoinMode.INNER)

Join self with other using the given join expression.
  • Parameters
    • other (Joinable) – the right side of the join.
    • on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
    • id (Optional[ColumnReference]) – optional argument for id of result, can be only self.id or other.id
    • how (JoinMode) – by default, inner join is performed. Possible values are JoinMode.{INNER,LEFT,RIGHT,OUTER} correspond to inner, left, right and outer join respectively.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(
    t2, t1.pet == t2.pet, t1.owner == t2.owner, how=pw.JoinMode.INNER
).select(age=t1.age, owner_name=t2.owner, size=t2.size)
pw.debug.compute_and_print(t3, include_id = False)
age | owner_name | size
9   | Bob        | L

join_inner(other, *on, id=None)

Inner-joins two tables or join results.
  • Parameters
    • other (Joinable) – the right side of the join.
    • on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
    • id (Optional[ColumnReference]) – optional argument for id of result, can be only self.id or other.id
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(t2, t1.pet == t2.pet, t1.owner == t2.owner, how=pw.JoinMode.INNER).select(age=t1.age, owner_name=t2.owner, size=t2.size)  # noqa: E501
pw.debug.compute_and_print(t3, include_id = False)
age | owner_name | size
9   | Bob        | L

join_left(other, *on, id=None)

Left-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
  • rows from the right side that were not matched with the left side are skipped
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_left(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(t1.B + t2.D,t2.id)),
include_id=False)
A  | t2_C | S
11 | 11   | 322
12 | 12   | 324
13 |      |
13 |      |

join_outer(other, *on, id=None)

Outer-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
  • for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_outer(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(t1.B + t2.D,t1.id,t2.id)),
include_id=False)
A  | t2_C | S
   | 14   |
   | 14   |
11 | 11   | 322
12 | 12   | 324
13 |      |
13 |      |

join_right(other, *on, id=None)

Outer-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • rows from the left side that were not matched with the right side are skipped
  • for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_right(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(pw.coalesce(t1.B,0) + t2.D,t1.id)),
include_id=False)
A  | t2_C | S
   | 14   |
   | 14   |
11 | 11   | 322
12 | 12   | 324
  • Returns
    OuterJoinResult object

promise_universe_is_equal_to(other)

Asserts to Pathway that an universe of self is a subset of universe of each of the others.

Semantics: Used in situations where Pathway cannot deduce one universe being a subset of another.

  • Returns
    None

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | pet
1 | Dog
7 | Cat
''')
t2 = pw.debug.parse_to_table('''
  | age
1 | 10
7 | 3
''')
t1 = t1.promise_universe_is_equal_to(t2)
t3 = t1 + t2
pw.debug.compute_and_print(t3, include_id=False)
pet | age
Cat | 3
Dog | 10

promise_universe_is_subset_of(other)

Asserts to Pathway that an universe of self is a subset of universe of each of the other.

Semantics: Used in situations where Pathway cannot deduce one universe being a subset of another.

  • Returns
    self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 30
''').promise_universe_is_subset_of(t1)
t3 = t1 << t2
pw.debug.compute_and_print(t3, include_id=False)
age | owner | pet
8   | Alice | 2
9   | Bob   | 1
10  | Alice | 30

promise_universes_are_disjoint(other)

Asserts to Pathway that an universe of self is disjoint from universe of other.

Semantics: Used in situations where Pathway cannot deduce universes are disjoint.

  • Returns
    self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.parse_to_table('''
   | age | owner | pet
11 | 11  | Alice | 30
12 | 12  | Tom   | 40
''').promise_universes_are_disjoint(t1)
t3 = t1.concat(t2)
pw.debug.compute_and_print(t3, include_id=False)
age | owner | pet
8   | Alice | 2
9   | Bob   | 1
10  | Alice | 1
11  | Alice | 30
12  | Tom   | 40

reduce(*args, **kwargs)

Reduce a join result to a single row.

Equivalent to self.groupby().reduce(*args, **kwargs).

  • Parameters
    • args (ColumnReference) – reducer to reduce the table with
    • kwargs (ColumnExpression) – reducer to reduce the table with. Its key is the new name of a column.
  • Returns
    Table – Reduced table.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
   cost  owner  pet
1   100  Alice    1
2    90    Bob    1
3    80  Alice    2
''')
t2 = pw.debug.parse_to_table('''
    cost  owner  pet size
11   100  Alice    3    M
12    90    Bob    1    L
13    80    Tom    1   XL
''')
result = t1.join(t2, t1.owner==t2.owner).reduce(total_pairs = pw.reducers.count())
pw.debug.compute_and_print(result, include_id=False)
total_pairs
3

select(*args, **kwargs)

Computes result of a join.
  • Parameters
    • args (ColumnReference) – Column references.
    • kwargs (Any) – Column expressions with their new assigned names.
  • Returns
    Table – Created table.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(t2, t1.pet == t2.pet, t1.owner == t2.owner).select(age=t1.age, owner_name=t2.owner, size=t2.size)   # noqa: E501
pw.debug.compute_and_print(t3, include_id=False)
age | owner_name | size
9   | Bob        | L

class pw.JoinResult(_universe, _context, _left_table, _right_table, _original_left, _original_right, _substitution, _chained_join_desugaring, _joined_on_names, _join_mode)

Result of a join between tables.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
   age  owner  pet
1   10  Alice    1
2    9    Bob    1
3    8  Alice    2
''')
t2 = pw.debug.parse_to_table('''
    age  owner  pet size
11   10  Alice    3    M
12    9    Bob    1    L
13    8    Tom    1   XL
''')
joinresult= t1.join(t2, t1.pet == t2.pet, t1.owner == t2.owner)   # noqa: E501
isinstance(joinresult, pw.JoinResult)
True
pw.debug.compute_and_print(joinresult.select(t1.age, t2.size), include_id=False)
age | size
9   | L

filter(filter_expression)

Filters rows, keeping the ones satisfying the predicate.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
   age  owner  pet
1   10  Alice    1
2    9    Bob    1
3    8  Alice    2
''')
t2 = pw.debug.parse_to_table('''
    age  owner  pet size
11   10  Alice    3    M
12    9    Bob    1    L
13    8    Tom    1   XL
''')
result = t1.join(t2).filter(t1.owner == t2.owner).select(t1.age, t2.size)   # noqa: E501
pw.debug.compute_and_print(result, include_id=False)
age | size
8   | M
9   | L
10  | M

groupby(*args, id=None)

Groups join result by columns from args.

NOTE: Usually followed by .reduce() that aggregates the result and returns a table.

  • Parameters
    • args (ColumnReference) – columns to group by.
    • id (Optional[ColumnReference]) – if provided, is the column used to set id’s of the rows of the result
  • Returns
    GroupedJoinResult – Groupby object.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
   cost  owner  pet
1   100  Alice    1
2    90    Bob    1
3    80  Alice    2
''')
t2 = pw.debug.parse_to_table('''
    cost  owner  pet size
11   100  Alice    3    M
12    90    Bob    1    L
13    80    Tom    1   XL
''')
result = (t1.join(t2, t1.owner==t2.owner).groupby(pw.this.owner)
    .reduce(pw.this.owner, pairs = pw.reducers.count()))
pw.debug.compute_and_print(result, include_id=False)
owner | pairs
Alice | 2
Bob   | 1

join(other, *on, id=None, how=JoinMode.INNER)

Join self with other using the given join expression.
  • Parameters
    • other (Joinable) – the right side of the join.
    • on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
    • id (Optional[ColumnReference]) – optional argument for id of result, can be only self.id or other.id
    • how (JoinMode) – by default, inner join is performed. Possible values are JoinMode.{INNER,LEFT,RIGHT,OUTER} correspond to inner, left, right and outer join respectively.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(
    t2, t1.pet == t2.pet, t1.owner == t2.owner, how=pw.JoinMode.INNER
).select(age=t1.age, owner_name=t2.owner, size=t2.size)
pw.debug.compute_and_print(t3, include_id = False)
age | owner_name | size
9   | Bob        | L

join_inner(other, *on, id=None)

Inner-joins two tables or join results.
  • Parameters
    • other (Joinable) – the right side of the join.
    • on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
    • id (Optional[ColumnReference]) – optional argument for id of result, can be only self.id or other.id
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(t2, t1.pet == t2.pet, t1.owner == t2.owner, how=pw.JoinMode.INNER).select(age=t1.age, owner_name=t2.owner, size=t2.size)  # noqa: E501
pw.debug.compute_and_print(t3, include_id = False)
age | owner_name | size
9   | Bob        | L

join_left(other, *on, id=None)

Left-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
  • rows from the right side that were not matched with the left side are skipped
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_left(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(t1.B + t2.D,t2.id)),
include_id=False)
A  | t2_C | S
11 | 11   | 322
12 | 12   | 324
13 |      |
13 |      |

join_outer(other, *on, id=None)

Outer-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
  • for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_outer(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(t1.B + t2.D,t1.id,t2.id)),
include_id=False)
A  | t2_C | S
   | 14   |
   | 14   |
11 | 11   | 322
12 | 12   | 324
13 |      |
13 |      |

join_right(other, *on, id=None)

Outer-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • rows from the left side that were not matched with the right side are skipped
  • for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_right(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(pw.coalesce(t1.B,0) + t2.D,t1.id)),
include_id=False)
A  | t2_C | S
   | 14   |
   | 14   |
11 | 11   | 322
12 | 12   | 324
  • Returns
    OuterJoinResult object

promise_universe_is_equal_to(other)

Asserts to Pathway that an universe of self is a subset of universe of each of the others.

Semantics: Used in situations where Pathway cannot deduce one universe being a subset of another.

  • Returns
    None

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | pet
1 | Dog
7 | Cat
''')
t2 = pw.debug.parse_to_table('''
  | age
1 | 10
7 | 3
''')
t1 = t1.promise_universe_is_equal_to(t2)
t3 = t1 + t2
pw.debug.compute_and_print(t3, include_id=False)
pet | age
Cat | 3
Dog | 10

promise_universe_is_subset_of(other)

Asserts to Pathway that an universe of self is a subset of universe of each of the other.

Semantics: Used in situations where Pathway cannot deduce one universe being a subset of another.

  • Returns
    self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 30
''').promise_universe_is_subset_of(t1)
t3 = t1 << t2
pw.debug.compute_and_print(t3, include_id=False)
age | owner | pet
8   | Alice | 2
9   | Bob   | 1
10  | Alice | 30

promise_universes_are_disjoint(other)

Asserts to Pathway that an universe of self is disjoint from universe of other.

Semantics: Used in situations where Pathway cannot deduce universes are disjoint.

  • Returns
    self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.parse_to_table('''
   | age | owner | pet
11 | 11  | Alice | 30
12 | 12  | Tom   | 40
''').promise_universes_are_disjoint(t1)
t3 = t1.concat(t2)
pw.debug.compute_and_print(t3, include_id=False)
age | owner | pet
8   | Alice | 2
9   | Bob   | 1
10  | Alice | 1
11  | Alice | 30
12  | Tom   | 40

reduce(*args, **kwargs)

Reduce a join result to a single row.

Equivalent to self.groupby().reduce(*args, **kwargs).

  • Parameters
    • args (ColumnReference) – reducer to reduce the table with
    • kwargs (ColumnExpression) – reducer to reduce the table with. Its key is the new name of a column.
  • Returns
    Table – Reduced table.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
   cost  owner  pet
1   100  Alice    1
2    90    Bob    1
3    80  Alice    2
''')
t2 = pw.debug.parse_to_table('''
    cost  owner  pet size
11   100  Alice    3    M
12    90    Bob    1    L
13    80    Tom    1   XL
''')
result = t1.join(t2, t1.owner==t2.owner).reduce(total_pairs = pw.reducers.count())
pw.debug.compute_and_print(result, include_id=False)
total_pairs
3

select(*args, **kwargs)

Computes result of a join.
  • Parameters
    • args (ColumnReference) – Column references.
    • kwargs (Any) – Column expressions with their new assigned names.
  • Returns
    Table – Created table.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(t2, t1.pet == t2.pet, t1.owner == t2.owner).select(age=t1.age, owner_name=t2.owner, size=t2.size)   # noqa: E501
pw.debug.compute_and_print(t3, include_id=False)
age | owner_name | size
9   | Bob        | L

class pw.Joinable(universe)

join(other, *on, id=None, how=JoinMode.INNER)

Join self with other using the given join expression.
  • Parameters
    • other (Joinable) – the right side of the join.
    • on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
    • id (Optional[ColumnReference]) – optional argument for id of result, can be only self.id or other.id
    • how (JoinMode) – by default, inner join is performed. Possible values are JoinMode.{INNER,LEFT,RIGHT,OUTER} correspond to inner, left, right and outer join respectively.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(
    t2, t1.pet == t2.pet, t1.owner == t2.owner, how=pw.JoinMode.INNER
).select(age=t1.age, owner_name=t2.owner, size=t2.size)
pw.debug.compute_and_print(t3, include_id = False)
age | owner_name | size
9   | Bob        | L

join_inner(other, *on, id=None)

Inner-joins two tables or join results.
  • Parameters
    • other (Joinable) – the right side of the join.
    • on (ColumnExpression) – a list of column expressions. Each must have == as the top level operation and be of the form LHS: ColumnReference == RHS: ColumnReference.
    • id (Optional[ColumnReference]) – optional argument for id of result, can be only self.id or other.id
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
age  | owner  | pet
 10  | Alice  | 1
  9  | Bob    | 1
  8  | Alice  | 2
''')
t2 = pw.debug.parse_to_table('''
age  | owner  | pet | size
 10  | Alice  | 3   | M
 9   | Bob    | 1   | L
 8   | Tom    | 1   | XL
''')
t3 = t1.join(t2, t1.pet == t2.pet, t1.owner == t2.owner, how=pw.JoinMode.INNER).select(age=t1.age, owner_name=t2.owner, size=t2.size)  # noqa: E501
pw.debug.compute_and_print(t3, include_id = False)
age | owner_name | size
9   | Bob        | L

join_left(other, *on, id=None)

Left-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
  • rows from the right side that were not matched with the left side are skipped
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_left(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(t1.B + t2.D,t2.id)),
include_id=False)
A  | t2_C | S
11 | 11   | 322
12 | 12   | 324
13 |      |
13 |      |

join_outer(other, *on, id=None)

Outer-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • for rows from the left side that were not matched with the right side, missing values on the right are replaced with None
  • for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_outer(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(t1.B + t2.D,t1.id,t2.id)),
include_id=False)
A  | t2_C | S
   | 14   |
   | 14   |
11 | 11   | 322
12 | 12   | 324
13 |      |
13 |      |

join_right(other, *on, id=None)

Outer-joins two tables or join results.

Remarks: args cannot contain id column from either of tables, as the result table has id column with auto-generated ids; it can be selected by assigning it to a column with defined name (passed in kwargs)

Behavior:

  • rows from the left side that were not matched with the right side are skipped
  • for rows from the right side that were not matched with the left side, missing values on the left are replaced with None
  • for rows that were matched the behavior is the same as that of an inner join.
  • Returns
    JoinResult – an object on which .select() may be called to extract relevant columns from the result of the join.

Example:

import pathway as pw
t1 = pw.debug.table_from_markdown(
    '''
        | A  | B
      1 | 11 | 111
      2 | 12 | 112
      3 | 13 | 113
      4 | 13 | 114
    '''
)
t2 = pw.debug.table_from_markdown(
    '''
        | C  | D
      1 | 11 | 211
      2 | 12 | 212
      3 | 14 | 213
      4 | 14 | 214
    '''
)
pw.debug.compute_and_print(t1.join_right(t2, t1.A == t2.C
).select(t1.A, t2_C=t2.C, S=pw.require(pw.coalesce(t1.B,0) + t2.D,t1.id)),
include_id=False)
A  | t2_C | S
   | 14   |
   | 14   |
11 | 11   | 322
12 | 12   | 324
  • Returns
    OuterJoinResult object

promise_universe_is_equal_to(other)

Asserts to Pathway that an universe of self is a subset of universe of each of the others.

Semantics: Used in situations where Pathway cannot deduce one universe being a subset of another.

  • Returns
    None

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | pet
1 | Dog
7 | Cat
''')
t2 = pw.debug.parse_to_table('''
  | age
1 | 10
7 | 3
''')
t1 = t1.promise_universe_is_equal_to(t2)
t3 = t1 + t2
pw.debug.compute_and_print(t3, include_id=False)
pet | age
Cat | 3
Dog | 10

promise_universe_is_subset_of(other)

Asserts to Pathway that an universe of self is a subset of universe of each of the other.

Semantics: Used in situations where Pathway cannot deduce one universe being a subset of another.

  • Returns
    self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 30
''').promise_universe_is_subset_of(t1)
t3 = t1 << t2
pw.debug.compute_and_print(t3, include_id=False)
age | owner | pet
8   | Alice | 2
9   | Bob   | 1
10  | Alice | 30

promise_universes_are_disjoint(other)

Asserts to Pathway that an universe of self is disjoint from universe of other.

Semantics: Used in situations where Pathway cannot deduce universes are disjoint.

  • Returns
    self

NOTE: The assertion works in place.

Example:

import pathway as pw
t1 = pw.debug.parse_to_table('''
  | age | owner | pet
1 | 10  | Alice | 1
2 | 9   | Bob   | 1
3 | 8   | Alice | 2
''')
t2 = pw.debug.parse_to_table('''
   | age | owner | pet
11 | 11  | Alice | 30
12 | 12  | Tom   | 40
''').promise_universes_are_disjoint(t1)
t3 = t1.concat(t2)
pw.debug.compute_and_print(t3, include_id=False)
age | owner | pet
8   | Alice | 2
9   | Bob   | 1
10  | Alice | 1
11  | Alice | 30
12  | Tom   | 40