Reducers
Usage
Reducers are used in reduce to compute the aggregated results obtained by a groupby:
import pathway as pw
my_table.groupby(my_table.columnA).reduce(aggregated_result=pw.reducers.my_reducer(my_table.columnB))
We use the following table t in the examples:
t = pw.debug.table_from_markdown(
"""
| colA | colB | colC | colD
1 | valA | -1 | 5 | 4
2 | valA | 1 | 5 | 7
3 | valA | 2 | 5 | -3
4 | valB | 4 | 10 | 2
5 | valB | 4 | 10 | 6
6 | valB | 7 | 10 | 1
"""
)
pw.debug.compute_and_print(t)
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| colA | colB | colC | colD
^YYY4HAB... | valA | -1 | 5 | 4
^Z3QWT29... | valA | 1 | 5 | 7
^3CZ78B4... | valA | 2 | 5 | -3
^3HN31E1... | valB | 4 | 10 | 2
^3S2X6B2... | valB | 4 | 10 | 6
^A984WV0... | valB | 7 | 10 | 1
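As a mental model of what a groupby followed by a reduce computes, the plain-Python sketch below groups the rows of the example table by colA and aggregates colB within each group. It does not use Pathway: the `rows` list and the `group_reduce` helper are hypothetical names introduced only for illustration, and the sketch makes no claim about how Pathway implements reducers internally.

```python
from collections import defaultdict

# Rows of the example table t as (colA, colB, colC, colD) tuples.
rows = [
    ("valA", -1, 5, 4),
    ("valA", 1, 5, 7),
    ("valA", 2, 5, -3),
    ("valB", 4, 10, 2),
    ("valB", 4, 10, 6),
    ("valB", 7, 10, 1),
]


def group_reduce(rows, key_index, value_index, reducer):
    """Group rows by one column, then reduce another column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: reducer(values) for key, values in groups.items()}


# Aggregate colB (index 1) per distinct value of colA (index 0).
minimum = group_reduce(rows, 0, 1, min)
total = group_reduce(rows, 0, 1, sum)
average = group_reduce(rows, 0, 1, lambda values: sum(values) / len(values))

print(minimum)  # {'valA': -1, 'valB': 4}
print(total)    # {'valA': 2, 'valB': 15}
print(average)
```

Each group yields exactly one output row, which is why every result table in the examples has two rows, one per distinct value of colA.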
List of Available Reducers
min
Returns the minimum of the aggregated values.
t.groupby(t.colA).reduce(min=pw.reducers.min(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| min
^ENHSR8M... | -1
^XN617D8... | 4
max
Returns the maximum of the aggregated values.
t.groupby(t.colA).reduce(max=pw.reducers.max(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| max
^ENHSR8M... | 2
^XN617D8... | 7
sum
Returns the sum of the aggregated values.
t.groupby(t.colA).reduce(sum=pw.reducers.sum(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| sum
^ENHSR8M... | 2
^XN617D8... | 15
avg
Returns the average of the aggregated values.
t.groupby(t.colA).reduce(avg=pw.reducers.avg(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| avg
^ENHSR8M... | 0.6666666666666666
^XN617D8... | 5.0
tuple
Returns a tuple containing all the aggregated values. When the reducer is applied to several columns, the order of values inside the tuples is the same for every column.
t.groupby(t.colA).reduce(tuple_colB=pw.reducers.tuple(t.colB), tuple_colD=pw.reducers.tuple(t.colD))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| tuple_colB | tuple_colD
^ENHSR8M... | (-1, 1, 2) | (4, 7, -3)
^XN617D8... | (4, 4, 7) | (2, 6, 1)
sorted_tuple
Returns a sorted tuple containing all the aggregated values.
t.groupby(t.colA).reduce(tuples=pw.reducers.sorted_tuple(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| tuples
^ENHSR8M... | (-1, 1, 2)
^XN617D8... | (4, 4, 7)
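The difference between tuple and sorted_tuple is easiest to see on colD, whose values are not already in ascending order. The sketch below is plain Python rather than Pathway, and it assumes the unsorted tuples follow the row order of the example table; the documentation only states that the order is the same across columns.

```python
# colB and colD values of the three rows where colA == "valA".
col_b = [-1, 1, 2]
col_d = [4, 7, -3]

# Tuple-style aggregation: both columns keep the same order,
# so position i in each tuple refers to the same row.
tuple_b = tuple(col_b)
tuple_d = tuple(col_d)
pairs = list(zip(tuple_b, tuple_d))

# Sorted-tuple-style aggregation: each column is sorted on its own,
# so positions no longer line up across columns.
sorted_b = tuple(sorted(col_b))
sorted_d = tuple(sorted(col_d))

print(pairs)     # [(-1, 4), (1, 7), (2, -3)]
print(sorted_d)  # (-3, 4, 7)
```

For colB the two forms coincide because the values happen to be sorted already, which is why the tuple and sorted_tuple outputs above look identical.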
ndarray
Returns an array containing all the aggregated values. When the reducer is applied to several columns, the order of values inside the arrays is the same for every column.
t.groupby(t.colA).reduce(tuple_colB=pw.reducers.ndarray(t.colB), tuple_colD=pw.reducers.ndarray(t.colD))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| tuple_colB | tuple_colD
^XN617D8... | [4 4 7] | [2 6 1]
^ENHSR8M... | [-1 1 2] | [ 4 7 -3]
count
Returns the number of aggregated elements.
t.groupby(t.colA).reduce(count=pw.reducers.count())
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| count
^XN617D8... | 3
^ENHSR8M... | 3
argmin
Returns the index (row ID) of the row that holds the minimum aggregated value.
t.groupby(t.colA).reduce(argmin=pw.reducers.argmin(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| argmin
^ENHSR8M... | ^YYY4HAB...
^XN617D8... | ^3HN31E1...
argmax
Returns the index (row ID) of the row that holds the maximum aggregated value.
t.groupby(t.colA).reduce(argmax=pw.reducers.argmax(t.colB))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| argmax
^ENHSR8M... | ^3CZ78B4...
^XN617D8... | ^A984WV0...
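Unlike min and max, argmin and argmax return a reference to a row rather than a value. A minimal plain-Python sketch of that idea follows; the row identifiers `row1` to `row3` and the helper functions are hypothetical stand-ins, not Pathway pointers or Pathway functions.

```python
# (row_id, colB) pairs for the group colA == "valA"; the ids are made up.
group = [("row1", -1), ("row2", 1), ("row3", 2)]


def argmin(pairs):
    """Return the id of the row holding the smallest value."""
    return min(pairs, key=lambda pair: pair[1])[0]


def argmax(pairs):
    """Return the id of the row holding the largest value."""
    return max(pairs, key=lambda pair: pair[1])[0]


print(argmin(group))  # row1
print(argmax(group))  # row3
```

When several rows share the extreme value, as with the two 4s in the valB group, this sketch returns the first such row because that is how Python's min and max behave. That is a property of the sketch only and not a statement about Pathway's tie-breaking.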
any
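Returns any one of the aggregated values. When the reducer is applied to several columns, the returned values are consistent with each other: in the example below, both values of each group come from the same row.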
t.groupby(t.colA).reduce(any_colB=pw.reducers.any(t.colB), any_colD=pw.reducers.any(t.colD))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| any_colB | any_colD
^ENHSR8M... | 2 | -3
^XN617D8... | 7 | 1
unique
Returns the aggregated value if all values are identical. If the values are not identical, an exception is raised.
t.groupby(t.colA).reduce(unique=pw.reducers.unique(t.colC))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| unique
^ENHSR8M... | 5
^XN617D8... | 10
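The contract of unique can be sketched in plain Python as follows. The helper below is hypothetical, and the choice of `ValueError` is an assumption made for the sketch; the documentation only says that an exception is raised.

```python
def unique(values):
    """Return the single shared value, or raise if the values differ."""
    distinct = set(values)
    if len(distinct) != 1:
        raise ValueError(f"expected identical values, got {sorted(distinct)}")
    return distinct.pop()


col_c = [5, 5, 5]   # colC for the group colA == "valA": all identical
col_b = [-1, 1, 2]  # colB for the same group: values differ

print(unique(col_c))  # 5

try:
    unique(col_b)
except ValueError as error:
    print(f"error: {error}")
```

This is why the example applies unique to colC, which is constant within each group, rather than to colB or colD.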
sum
When the aggregated values are NumPy arrays, returns their element-wise sum.
import numpy as np
import pandas as pd
np_table = pw.debug.table_from_pandas(
pd.DataFrame(
{
"data": [
np.array([1, 2, 3]),
np.array([4, 5, 6]),
np.array([7, 8, 9]),
]
}
)
)
np_table.reduce(data_sum=pw.reducers.sum(np_table.data))
[2023-09-19T09:27:51]:INFO:Preparing Pathway computation
| data_sum
^PWSRT42... | [12 15 18]
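The result above is an element-wise sum: position i of the output adds up position i of every input array. The same arithmetic can be checked with plain Python lists standing in for the NumPy arrays; the variable names are illustrative only.

```python
# The three arrays from the example, written as plain lists.
arrays = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# zip(*arrays) yields one tuple per position: (1, 4, 7), (2, 5, 8), (3, 6, 9).
data_sum = [sum(column) for column in zip(*arrays)]

print(data_sum)  # [12, 15, 18]
```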