Changelog
All notable changes to this project will be documented in this file.
This project adheres to Semantic Versioning.
Unreleased
0.10.0 - 2024-04-24
Added
- Pathway now warns when unintentionally creating Table with empty universe.
pw.io.kafka.write
inraw
andplaintext
formats now supports output for tables with multiple columns. For such tables, it requires the specification of the column that must be used as a value of the produced Kafka messages and gives a possibility to provide column which must be used as a key.pw.io.kafka.write
can now output values from the table using Kafka message headers in 'raw' and 'plaintext' output format.
Changed
instance
arguments togroupby
,join
,with_id_from
now determine how entries are distributed between machines.flatten
results remain on the same machine as their source entries.join
sends each record between machines at most once.- BREAKING:
flatten
,join
,groupby
(if used withinstance
),with_id_from
(if used withinstance
) generate IDs of the produced rows differently than in the previous versions. pathway spawn
with multiple workers prints only output from the first worker.
0.9.0 - 2024-04-18
Added
pw.reducers.latest
andpw.reducers.earliest
that return the value with respectively maximal and minimal processing time assigned.pw.io.kafka.write
can now produce messages containing raw bytes in case the table consists of a single binary column andraw
mode is specified. Similarly, this method will provide plaintext messages ifplaintext
mode is chosen and the table consists of a single string-typed column.pw.io.pubsub.write
connector for publishing Pathway tables into Google PubSub.- Argument
strict_prompt
toanswer_with_geometric_rag_strategy
andanswer_with_geometric_rag_strategy_from_index
that allows optimizing prompts for smaller open-source LLM models. - Temporarily switch LiteLLMChat's generation method to sync version due to a bug while using
json
mode with Ollama.
Changed
- BREAKING:
pw.io.kafka.read
will not parse the messages from UTF-8 in caseraw
mode was specified. To preserve this behavior you can use theplaintext
mode. - BREAKING:
Table.flatten
now flattens one column and spreads every other column of the table, instead of taking other columns from the argument list.
0.8.6 - 2024-04-10
Added
pw.io.bigquery.write
connector for writing Pathway tables into Google BigQuery.- parameter
filepath_globpattern
toquery
method inVectorStoreClient
for specifying which files should be considered in the query. - Improved compatibility of
pw.Json
with standard methods such aslen()
,int()
,float()
,bool()
,iter()
,reversed()
when feasible.
Changed
pw.io.postgres.write
can now parallelize writes to several threads if several workers are configured.- Pathway now checks types of pointers rigorously. Indexing table with mismatched number/types of columns vs what was used to create index will now result in a TypeError.
pw.Json.as_float()
method now supports integer JSON values.
0.8.5 - 2024-03-27
Added
- New function
answer_with_geometric_rag_strategy_from_index
, which allows to useanswer_with_geometric_rag_strategy
without the need to first retrieve documents from index. - Added support for custom state serialization to
udf_reducer
. - Introduced
instance
parameter inAsyncTransformer
. All calls with a given(instance, processing_time)
pair are returned at the same processing time. Ordering is preserved within a single instance. - Added
successful
,failed
,finished
properties toAsyncTransformer
. They return tables with successful calls, failed calls and all finished calls, respectively.
Changed
- Property
result
ofAsyncTransformer
is deprecated. Propertysuccessful
should be used instead. pw.io.csv.read
,pw.io.jsonlines.read
,pw.io.fs.read
,pw.io.plaintext.read
now handlepath
as a glob pattern and read all matched files and directories recursively.
0.8.4 - 2024-03-18
Fixed
- Pathway will only require
LiteLLM
package, if you use one of the wrappers forLiteLLM
. - Retries are implemented in
pw.io.airbyte.read
. - State processing protocol is updated in
pw.io.airbyte.read
.
0.8.3 - 2024-03-13
Added
- New parameters of
pw.UDF
class andpw.udf
decorator:return_type
,deterministic
,propagate_none
,executor
,cache_strategy
. - The LLM Xpack now provides integrations with LlamaIndex and LangChain for running the Pathway VectorStore server.
Changed
- Subclassing
UDFSync
andUDFAsync
is deprecated.UDF
should be subclassed to create a new UDF. - Passing keyword arguments to
pw.apply
,pw.apply_with_type
,pw.apply_async
is deprecated. In the future, they'll be used for configuration, not passing data to the function.
Fixed
- Fixed a minor bug with
Table.groupby()
method which sometimes prevented of accessing certain columns in the followingreduce()
. - Fixed warnings from using OpenAI Async embedding model in the VectorStore in Colab.
0.8.2 - 2024-02-28
Added
%:z
timezone format code tostrptime
.- Support for Airbyte connectors
pw.io.airbyte
.
0.8.1 - 2024-02-15
Added
- Introduced the
send_alerts
function in thepw.io.slack
namespace, enabling users to send messages from a specified column directly to a Slack channel. - Enhanced the
pw.io.http.rest_connector
by introducing an additional argument calledrequest_validator
. This feature empowers users to validate payloads and raise anHTTP 400
error if necessary.
Fixed
- Addressed an issue in
pw.io.xpacks.llm.VectorStoreServer
where the computation of the last modification timestamp for an indexed document was incorrect.
Changed
- Improved the behavior of
pw.io.kafka.write
. It now includes retries when sending data to the output topic encounters failures.
0.8.0 - 2024-02-01
Added
pw.io.http.rest_connector
now supports multiple HTTP request types.pw.io.http.PathwayWebserver
now allows Cross-Origin Resource Sharing (CORS) to be enabled on newly added endpoints- Wrappers for LiteLLM and HuggingFace chat services and SentenceTransformers embedding service are now added to Pathway xpack for LLMs.
Changed
pw.run
now includes an additional parameterruntime_typechecking
that enables strict type checking at runtime.- Embedders in pathway.xpacks.llm.embedders now correctly process empty strings as queries.
- BREAKING:
pw.run
andpw.run_all
now only accept keyword arguments.
Fixed
pw.Duration
can now be returned from User-Defined Functions (UDFs) or used as a constant value without resulting in errors.pw.io.debezium.read
now correctly handles tables that do not have a primary key.
0.7.10 - 2024-01-26
Added
pw.io.http.rest_connector
can now generate Open API 3.0.3 schema that will be returned by the route/_schema
.- Wrappers for OpenAI Chat and Embedding services are now added to Pathway xpack for LLMs.
- A vector indexing pipeline that allows querying for the most similar documents. It is available as class
VectorStore
as part of Pathway xpack for LLMs.
Fixed
pw.debug.table_from_markdown
now uses schema parameter (when set) to properly assign simple types (int, bool, float, str, bytes
) and optional simple types to columns.
0.7.9 - 2024-01-18
Changed
pw.io.http.rest_connector
now also accepts port as a string for backwards compatibility.pw.stdlib.ml.index.KNNIndex
now sorts by distance by default.
0.7.8 - 2024-01-18
Added
- Support for comparisons of tuples has been added.
- Standalone versions of methods such as
pw.groupby
,pw.join
,pw.join_inner
,pw.join_left
,pw.join_right
, andpw.join_outer
are now available. - The
abs
function from Python can now be used on Pathway expressions. - The
asof_join
method now has configurable temporal behavior. Thebehavior
parameter can be used to pass the configuration. - The state of the
deduplicate
operator can now be persisted.
Changed
interval_join
can now work with intervals of zero length.- The
pw.io.http.rest_connector
can now open multiple endpoints on the same port using a newpw.io.http.PathwayWebserver
class. - The
pw.xpacks.connectors.sharepoint.read
andpw.io.gdrive.read
methods now support the size limit for a single object. If set, it will exclude too large files and won't read them.
0.7.7 - 2023-12-27
Added
- pathway.xpacks.llm.splitter.TokenCountSplitter.
0.7.6 - 2023-12-22
New Features
Conversion Methods in pw.Json
- Introducing new methods for strict conversion of
pw.Json
to desired types within a UDF body:as_int()
as_float()
as_str()
as_bool()
as_list()
as_dict()
DateTime Functionality
- Added
table.col.dt.utc_from_timestamp
method: CreatesDateTimeUtc
from timestamps represented asint
s orfloat
s. - Enhanced the
table.col.dt.timestamp
method with a newunit
argument to specify the unit of the returned timestamp.
Experimental Features
- Introduced an experimental xpack with a Microsoft SharePoint input connector.
Enhancements
Improved JSON Handling
- Index operator (
[]
) can now be directly applied topw.Json
within UDFs to access elements of JSON objects, arrays, and strings.
Expanded Timestamp Functionality
- Enhanced the
table.col.dt.from_timestamp
method to createDateTimeNaive
from timestamps represented asint
s orfloat
s. - Deprecated not specifying the
unit
argument of thetable.col.dt.timestamp
method.
KNNIndex Enhancements
KNNIndex
now supports returning computed distances.- Added support for cosine similarity in
KNNIndex
.
Deprecated Features
- The
offset
argument ofpw.stdlib.temporal.sliding
andpw.stdlib.temporal.tumbling
is deprecated. Useorigin
instead, as it represents a point in time, not a duration.
Bug Fixes
DateTime Fixes
- Sliding window now works correctly with UTC Datetimes.
asof_join
Improvements
- Temporal column in
asof_join
no longer has to be namedt
. asof_join
includes rows with equal times for all values of thedirection
parameter.
Fixed Issues
- Fixed an issue with
pw.io.gdrive.read
: Shared folders support is now working seamlessly.
0.7.5 - 2023-12-15
Added
- Added Table.split() method for splitting table based on an expression into two tables.
- Columns with datatype duration can now be multiplied and divided by floats.
- Columns with datatype duration now support both true and floor division (
/
and//
) by integers.
Changed
- Pathway is better at typing if_else expressions when optional types are involved.
table.flatten()
operator now supports Json array.- Buffers (used to delay outputs, configured via delay in
common_behavior
) now flush the data when the computation is finished. The effect of this change can be seen when run in bounded (batch / multi-revision) mode. pw.io.subscribe()
takes additional argumenton_time_end
- the callback function to be called on each closed time of computation.pw.io.subscribe()
is now a single-worker operator, guaranteeing thaton_end
is triggered at most once.KNNIndex
supports now metadata filtering. Each query can specify it's own filter in the JMESPath format.
Fixed
- Resolved an optimization bug causing
pw.iterate
to malfunction when handling columns effectively pointing to the same data.
0.7.4 - 2023-12-05
Changed
- Pathway now keeps track of
array
columntype better - it is able to keep track of Array dtype and number of dimensions, wherever applicable.
Fixed
- Fixed issues with standalone panel+Bokeh dashboards to ensure optimal functionality and performance.
0.7.3 - 2023-11-30
Added
- A method
weekday
has been added to thedt
namespace, that can be called on column expressions containing datetime data. This method returns an integer that represents the day of the week. - EXPERIMENTAL: Methods
show
andplot
on Tables, providing visualizations of data using HoloViz Panel. - Added support for
instance
parameter togroupby
,join
,windowby
and temporal join methods. pw.PersistenceMode.UDF_CACHING
persistence mode enabling automatic caching ofAsyncTransformer
invocations.
Changed
- Methods
round
andfloor
on columns with datetimes now accept duration argument to be a string. pw.debug.compute_and_print
andpw.debug.compute_and_print_update_stream
have a new argumentn_rows
that limits the number of rows printed.pw.debug.table_to_pandas
has a new argumentinclude_id
(by defaultTrue
). If set toFalse
, creates a new index for the Pandas DataFrame, rather than using the keys of the Pathway Table.windowby
functionshard
argument is now deprecated andinstance
should be used.- Special column name
_pw_shard
is now deprecated, and_pw_instance
should be used. pw.ReplayMode
now can be accessed aspw.PersistenceMode
, while theSPEEDRUN
andREALTIME
variants are now accessible asSPEEDRUN_REPLAY
andREALTIME_REPLAY
.- EXPERIMENTAL:
pw.io.gdrive.read
has a new argumentwith_metadata
(by defaultFalse
). If set toTrue
, adds a_metadata
column containing file metadata to the resulting table. - Methods
get_nearest_items
andget_nearest_items_asof_now
ofKNNIndex
allow to specifyk
(number of returned elements) separately in each query.
0.7.2 - 2023-11-24
Added
- Added ability of creating custom reducers using
pw.reducers.udf_reducer
decorator. Usepw.BaseCustomAccumulator
as a base class for creating accumulators. Decorating accumulator returns reducer following custom logic. - A function
pw.debug.compute_and_print_update_stream
that computes and prints the update stream of the table. - SQLite input connector (
pw.io.sqlite
).
Changed
pw.debug.parse_to_table
is now deprecated,pw.debug.table_from_markdown
should be used instead.pw.schema_from_csv
now hasquote
anddouble_quote_escapes
arguments.
Fixed
- Schema returned from
pw.schema_from_csv
will have quotes removed from column names, so it will now work properly withpw.io.csv.read
.
0.7.1 - 2023-11-17
Added
- Experimental Google Drive input connector.
- Stateful deduplication function (
pw.stateful.deduplicate
) allowing alerting on significant changes. - The ability to split data into batches in
pw.debug.table_from_markdown
andpw.debug.table_from_pandas
.
0.7.0 - 2023-11-16
Added
- class
Behavior
, a superclass of all behavior classes. - class
ExactlyOnceBehavior
indicating we want to create aCommonBehavior
that results in each window producing exactly one output (shifted in time by an optionalshift
parameter). - function
exactly_once_behavior
creating an instance ofExactlyOnceBehavior
.
Changed
- BREAKING:
WindowBehavior
is now calledCommonBehavior
, as it can be also used with interval joins. - BREAKING:
window_behavior
is now calledcommon_behavior
, as it can be also used with interval joins. - Deprecating parameter
keep_queries
inpw.io.http.rest_connector
. Nowdelete_completed_queries
with an opposite meaning should be used instead. The default is stilldelete_completed_queries=True
(equivalent tokeep_queries=False
) but it will soon be required to be set explicitly.
0.6.0 - 2023-11-10
Added
- A flag
with_metadata
for the filesystem-based connectors to attach the source file metadata to the table entries. - Methods
pw.debug.table_from_list_of_batches
andpw.debug.table_from_list_of_batches_by_workers
for creating tables with defined data being inserted over time.
Changed
- BREAKING:
pw.debug.table_from_pandas
andpw.debug.table_from_markdown
now will create tables in the streaming mode, instead of static, if given table definition contains_time
column. - BREAKING: Renamed the parameter
keep_queries
inpw.io.http.rest_connector
todelete_queries
with the opposite meaning. It changes the default behavior - it waskeep_queries=False
, now it isdelete_queries=False
.
0.5.3 - 2023-10-27
Added
- A method
get_nearest_items_asof_now
inKNNIndex
that allows to get nearest neighbors without updating old queries in the future. - A method
asof_now_join
inTable
to join rows from left side of the join with right side of the join at their processing time. Past rows from left side are not used when new data appears on the right side.
0.5.2 - 2023-10-19
Added
interval_join
now supports forgetting old entries. The configuration can be passed usingbehavior
parameter ofinterval_join
method.- Decorator
@table_transformer
for marking that functions take Tables as arguments. - Namespace for all columns
Table.C.*
. - Output connectors now provide logs about the number of entries written and time taken.
- Filesystem connectors now support reading whole files as rows.
- Command line option for
pathway spawn
to record data andpathway replay
command to replay data.
0.5.1 - 2023-10-04
Fixed
select
operates only on consistent states.
0.5.0 - 2023-10-04
Added
Schema
methodtypehints
that returns dict of mypy-compatible typehints.- Support for JSON parsing from CSV sources.
restrict
method inTable
to restrict table universe to the universe of the other table.- Better support for postgresql types in the output connector.
Changed
- BREAKING: renamed
Table
methoddtypes
totypehints
. It now returns adict
of mypy-compatible typehints. - BREAKING:
Schema.__getitem__
returns a data classColumnSchema
containing all related information on particular column. - BREAKING:
tuple
reducer used after intervals_over window now sorts values by time. - BREAKING: expressions used in
select
,filter
,flatten
,with_columns
,with_id
,with_id_from
have to have the same universe as the table. Earlier it was possible to use an expression from a superset of a table universe. To use expressions from wider universes, one can userestrict
on the expression source table. - BREAKING:
pw.universes.promise_are_equal(t1, t2)
no longer allows to use references fromt1
andt2
in a single expression. To change the universe of a table, usewith_universe_of
. - BREAKING:
ix
andix_ref
are temporarily broken inside joins (both temporal and ordinary). select
,filter
,concat
keep columns as a single stream. The work for other operators is ongoing.
Fixed
- Optional types other than string correctly output to PostgreSQL.
0.4.1 - 2023-09-25
Added
- Support for messages compressed with zstd in the Kafka connector.
0.4.0 - 2023-09-21
Added
- Support for JSON data format, including
pw.Json
type. - Methods
as_int()
,as_float()
,as_str()
,as_bool()
to convert values fromJson
. - New argument
skip_nones
fortuple
andsorted_tuple
reducers. - New argument
is_outer
forintervals_over
window. pw.schema_from_dict
andpw.schema_from_csv
for generating schema based, respectively, on provided definition as a dictionary and CSV file with sample data.generate_class
method inSchema
class for generating schema class code.
Changed
- Method
get()
and[]
to support accessing elements in Jsons. - Function
pw.assert_table_has_schema
for writing asserts checking, whether given table has the same schema as the one that is given as an argument. - BREAKING:
ix
andix_ref
operations are now standalone transformations ofpw.Table
intopw.Table
. Most of the usages remain the same, but sometimes user needs to provide a context (when e.g. using them insidejoin
orgroupby
operations).ix
andix_ref
are temporarily broken inside temporal joins.
Fixed
- Fixed a bug where new-style optional types (e.g.
int | None
) were translated toAny
dtype.
0.3.4 - 2023-09-18
Fixed
- Incompatible
beartype
version is now excluded from dependencies.
0.3.3 - 2023-09-14
Added
- Module
pathway.dt
to construct and manipulate DTypes. - New argument
keep_queries
inpw.io.http.rest_connector
.
Changed
- Internal representation of DTypes. Inputting types is compatible backwards.
- Temporal functions now accept arguments of mixed types (ints and floats). For example,
pw.temporal.interval
can use ints while columns it interacts with are floats. - Single-element arrays are now treated as arrays, not as scalars.
Fixed
to_string()
method on datetimes always prints 9 fractional digits.%f
format code instrptime()
parses fractional part of a second correctly regardless of the number of digits.
0.3.2 - 2023-09-07
Added
Table.cast_to_types()
function that can performpathway.cast
on multiple columns.intervals_over
window, which allows to get temporally close data to given times.demo.replay_csv_with_time
function that can replay a CSV file following the timestamps of a given column.
Fixed
- Static data is now copied to ensure immutability.
- Improved error tracing mechanism to work with any type of error.
0.3.1 - 2023-08-29
Added
tuple
reducer, that returns a tuple with values.ndarray
reducer, that returns an array with values.
Changed
numpy
arrays ofint32
,uint32
andfloat32
are now converted to their 64-bit variants instead of tuples.- KNNIndex interface to take columns as inputs.
- Reducers now check types of their arguments.
Fixed
- Fixed delayed reporting of output connector errors.
- Python objects are now freed more often, reducing peak memory usage.
0.3.0 - 2023-08-07
Added
@
(matrix multiplication) operator.
Changed
- Python version 3.10 or later is now required.
- Type checking is now more strict.
0.2.1 - 2023-07-31
Changed
- Immediately forget queries in REST connector.
- Make type annotations mandatory in
Schema
.
Fixed
- Fixed IDs coming from CSV source.
- Fixed indices of dataframes from pandas transformer.