Selecting columns | df[['colA', 'colB', ...]] | t.select(t.colA, t['colB'], pw.this.colC) |
Filtering on a given value | df[df['name'] == value] | t.filter(t.name == value) |
Filtering on a condition | df[condition] | t.filter(condition) |
Group-by and Aggregation | df.groupby('name').sum() | t.groupby(pw.this.name).reduce(sum=pw.reducers.sum(pw.this.value)) |
Applying a function | df['c'] = func(df['a'], df['b'], ..) | t.select(c=pw.apply(func, t.a, t.b)) |
Join Operations | df1.join(df2, on='name') | t1.join(t2, pw.left.name == pw.right.name).select(...) |
Reading CSV files | df = pd.read_csv('file_name') | t = pw.io.csv.read('./data/', schema=InputSchema) |
Writing to CSV files | pd.to_csv('file_name') | pw.io.csv.write(table, './output/') |
Accessing a row by label | df.loc[[label]] | Using joins or ix_ref |
Accessing a row by index | df.iloc[0] | No equivalence as location-based indexes don't make sense with changing table data. |