Reducers

Usage

Reducers are used in reduce to compute the aggregated results obtained by a groupby:

import pathway as pw
my_table.groupby(my_table.columnA).reduce(aggregated_result=pw.reducers.my_reducer(my_table.columnB))

We use the following table t in the examples:

t = pw.debug.table_from_markdown(
    """
    | colA | colB | colC | colD
 1  | valA | -1   |   5  |  4
 2  | valA | 1    |   5  |  7
 3  | valA | 2    |   5  | -3
 4  | valB | 4    |  10  |  2
 5  | valB | 4    |  10  |  6
 6  | valB | 7    |  10  |  1
 """
)
pw.debug.compute_and_print(t)
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | colA | colB | colC | colD
^YYY4HAB... | valA | -1   | 5    | 4
^Z3QWT29... | valA | 1    | 5    | 7
^3CZ78B4... | valA | 2    | 5    | -3
^3HN31E1... | valB | 4    | 10   | 2
^3S2X6B2... | valB | 4    | 10   | 6
^A984WV0... | valB | 7    | 10   | 1
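
To make the semantics of the reducers listed below concrete, here is a minimal plain-Python sketch of what a groupby followed by a reduce computes on colA and colB of the table t. It does not use Pathway and is not how Pathway is implemented; the `rows` list and the `group_reduce` helper are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict

# (colA, colB) pairs copied from the example table t.
rows = [("valA", -1), ("valA", 1), ("valA", 2),
        ("valB", 4), ("valB", 4), ("valB", 7)]


def group_reduce(rows, reducer):
    """Group values by key, then apply `reducer` to each group's values."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


print(group_reduce(rows, min))  # {'valA': -1, 'valB': 4}
print(group_reduce(rows, max))  # {'valA': 2, 'valB': 7}
print(group_reduce(rows, sum))  # {'valA': 2, 'valB': 15}
# Average written out explicitly as sum / count.
print(group_reduce(rows, lambda values: sum(values) / len(values)))
```

Each call yields one result per distinct value of colA, which mirrors the one-row-per-group tables printed in the examples that follow.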

List of Available Reducers

min

Returns the minimum of the aggregated values.

t.groupby(t.colA).reduce(min=pw.reducers.min(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | min
^ENHSR8M... | -1
^XN617D8... | 4

max

Returns the maximum of the aggregated values.

t.groupby(t.colA).reduce(max=pw.reducers.max(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | max
^ENHSR8M... | 2
^XN617D8... | 7

sum

Returns the sum of the aggregated values.

t.groupby(t.colA).reduce(sum=pw.reducers.sum(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | sum
^ENHSR8M... | 2
^XN617D8... | 15

avg

Returns the average of the aggregated values.

t.groupby(t.colA).reduce(avg=pw.reducers.avg(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | avg
^ENHSR8M... | 0.6666666666666666
^XN617D8... | 5.0

tuple

Returns a tuple containing all the aggregated values. When the reducer is applied to several columns, the values appear in the same row order in every resulting tuple.

t.groupby(t.colA).reduce(tuple_colB=pw.reducers.tuple(t.colB), tuple_colD=pw.reducers.tuple(t.colD))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | tuple_colB | tuple_colD
^ENHSR8M... | (-1, 1, 2) | (4, 7, -3)
^XN617D8... | (4, 4, 7)  | (2, 6, 1)
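
The consistent ordering means that position i of each tuple refers to the same source row. The plain-Python sketch below illustrates this for the group valA; it is an illustration of the property, not Pathway code, and the variable names are hypothetical.

```python
# Rows of group valA from the example table t, as (colB, colD) pairs.
group_valA = [(-1, 4), (1, 7), (2, -3)]

# Building both tuples from the same row order keeps positions aligned.
tuple_colB = tuple(b for b, _ in group_valA)
tuple_colD = tuple(d for _, d in group_valA)

# Zipping the two tuples therefore recovers the original rows.
recovered = list(zip(tuple_colB, tuple_colD))
print(recovered)  # [(-1, 4), (1, 7), (2, -3)]
```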

sorted_tuple

Returns a sorted tuple containing all the aggregated values.

t.groupby(t.colA).reduce(tuples=pw.reducers.sorted_tuple(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | tuples
^ENHSR8M... | (-1, 1, 2)
^XN617D8... | (4, 4, 7)
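
Because colB happens to be sorted already within each group, the output above looks the same as an unsorted tuple would. The plain-Python sketch below uses colD of group valA to show the effect of sorting. It assumes the reducer orders values the way Python's `sorted` does, which the text above does not spell out.

```python
# colB and colD values of group valA, in row order.
col_b = [-1, 1, 2]
col_d = [4, 7, -3]

sorted_b = tuple(sorted(col_b))
sorted_d = tuple(sorted(col_d))
print(sorted_b)  # (-1, 1, 2)
print(sorted_d)  # (-3, 4, 7)

# Each column is sorted on its own, so positions no longer identify a row:
# none of the pairs below occurs as a (colB, colD) row in the table.
pairs = list(zip(sorted_b, sorted_d))
print(pairs)  # [(-1, -3), (1, 4), (2, 7)]
```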

ndarray

Returns an array containing all the aggregated values. When the reducer is applied to several columns, the values appear in the same row order in every resulting array.

t.groupby(t.colA).reduce(ndarray_colB=pw.reducers.ndarray(t.colB), ndarray_colD=pw.reducers.ndarray(t.colD))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | ndarray_colB | ndarray_colD
^XN617D8... | [4 4 7]      | [2 6 1]
^ENHSR8M... | [-1  1  2]   | [ 4  7 -3]

count

Returns the number of aggregated elements.

t.groupby(t.colA).reduce(count=pw.reducers.count())
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | count
^XN617D8... | 3
^ENHSR8M... | 3

argmin

Returns the index (the row ID) of the row holding the minimum aggregated value.

t.groupby(t.colA).reduce(argmin=pw.reducers.argmin(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | argmin
^ENHSR8M... | ^YYY4HAB...
^XN617D8... | ^3HN31E1...

argmax

Returns the index (the row ID) of the row holding the maximum aggregated value.

t.groupby(t.colA).reduce(argmax=pw.reducers.argmax(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | argmax
^ENHSR8M... | ^3CZ78B4...
^XN617D8... | ^A984WV0...
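
The argmin and argmax results are row IDs rather than values, so they can be used to look up the row that holds the extreme value. The plain-Python sketch below illustrates the idea for group valA, using the truncated IDs printed above purely as string labels; it is not Pathway code.

```python
# Group valA of table t: displayed (truncated) row ID -> colB value.
group_valA = {"^YYY4HAB...": -1, "^Z3QWT29...": 1, "^3CZ78B4...": 2}

# Pick the key (row ID) whose associated value is smallest or largest.
argmin_id = min(group_valA, key=group_valA.get)
argmax_id = max(group_valA, key=group_valA.get)

print(argmin_id, group_valA[argmin_id])  # ^YYY4HAB... -1
print(argmax_id, group_valA[argmax_id])  # ^3CZ78B4... 2
```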

any

Returns any one of the aggregated values. When the reducer is applied to several columns, the returned values come from the same row.

t.groupby(t.colA).reduce(any_colB=pw.reducers.any(t.colB), any_colD=pw.reducers.any(t.colD))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | any_colB | any_colD
^ENHSR8M... | 2        | -3
^XN617D8... | 7        | 1

unique

Returns the aggregated value if all values are identical. If the values are not identical, an exception is raised.

t.groupby(t.colA).reduce(unique=pw.reducers.unique(t.colC))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | unique
^ENHSR8M... | 5
^XN617D8... | 10
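
The example above succeeds because colC is constant within each group. The plain-Python sketch below illustrates both outcomes. The `unique` function here is a hypothetical stand-in, and `ValueError` is an assumption: the text above does not say which exception type Pathway raises.

```python
def unique(values):
    """Return the single shared value, or raise if the values differ."""
    distinct = set(values)
    if len(distinct) != 1:
        raise ValueError(f"values are not identical: {sorted(distinct)}")
    return distinct.pop()


print(unique([5, 5, 5]))  # 5 (colC for group valA)

try:
    unique([-1, 1, 2])  # colB for group valA: three different values
except ValueError as err:
    print("error:", err)
```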

sum

Returns the sum of the values of aggregated NumPy arrays.

import numpy as np
import pandas as pd
np_table = pw.debug.table_from_pandas(
    pd.DataFrame(
        {
            "data": [
                np.array([1, 2, 3]),
                np.array([4, 5, 6]),
                np.array([7, 8, 9]),
            ]
        }
    )
)
np_table.reduce(data_sum=pw.reducers.sum(np_table.data))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation


            | data_sum
^PWSRT42... | [12 15 18]
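
The result above matches an element-wise sum of the three arrays. A minimal NumPy-only sketch of the same arithmetic, without Pathway, for comparison:

```python
import numpy as np

# The three arrays stored in the "data" column of np_table.
arrays = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]

# Stack into a 3x3 matrix and add along the first axis (across rows).
total = np.sum(np.stack(arrays), axis=0)
print(total)  # [12 15 18]
```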