Changelog
All notable changes to this project will be documented in this file.
This project adheres to Semantic Versioning.
Unreleased
Fixed
- Endpoints created by pw.io.http.rest_connector now accept requests both with and without a trailing slash. For example, /endpoint/ and /endpoint are now treated equivalently.
- Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
- Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
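
The trailing-slash equivalence described above can be illustrated with a small route normalizer. This is only a sketch of the idea, not the code used by pw.io.http.rest_connector; the helper names normalize_route and same_endpoint are hypothetical.

```python
def normalize_route(route: str) -> str:
    """Strip trailing slashes while keeping the root route "/" intact."""
    stripped = route.rstrip("/")
    return stripped or "/"


def same_endpoint(first: str, second: str) -> bool:
    """Return True when two routes differ only by trailing slashes."""
    return normalize_route(first) == normalize_route(second)
```

With this helper, same_endpoint("/endpoint/", "/endpoint") holds, while routes with different path segments stay distinct.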
0.26.4 - 2025-10-16
Added
- New external integration with Qdrant.
- pw.io.mysql.write method for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.
Changed
- pw.io.deltalake.read now accepts the start_from_timestamp_ms parameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.
- Asynchronous UDFs for connecting to API-based LLM and embedding models now use pw.udfs.ExponentialRetryStrategy() as the default retry strategy.
- pw.io.postgres.write method now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with the output_table_type parameter. The pw.io.postgres.write_snapshot method has been deprecated.
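
For readers unfamiliar with exponential retries, the general technique looks roughly like the sketch below. It is a standalone illustration using only asyncio; the function retry_with_backoff and its parameters are hypothetical and do not reproduce the internals or the signature of pw.udfs.ExponentialRetryStrategy.

```python
import asyncio


async def retry_with_backoff(func, *, max_retries=3, initial_delay=0.01, backoff_factor=2.0):
    """Await func(); on failure wait and retry, multiplying the delay each time."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except Exception:
            # Give up once the retry budget is exhausted.
            if attempt == max_retries:
                raise
            await asyncio.sleep(delay)
            delay *= backoff_factor
```

The delay grows geometrically (here 0.01 s, 0.02 s, 0.04 s), which gives a temporarily overloaded API time to recover before the next attempt.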
0.26.3 - 2025-10-03
Added
- New parser pathway.xpacks.llm.parsers.PaddleOCRParser supporting parsing of PDF, PPTX and images.
0.26.2 - 2025-10-01
Added
- pw.io.gdrive.read now supports the "only_metadata" format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading object contents.
- Detailed metrics can now be exported to SQLite. Enable this feature using the environment variable PATHWAY_DETAILED_METRICS_DIR or via pw.set_monitoring_config().
- pw.io.kinesis.read and pw.io.kinesis.write methods for reading from and writing to AWS Kinesis.
Fixed
- A bug leading to potentially unbounded memory consumption that could occur in Table.forget and Table.sort operators during multi-worker runs has been fixed.
- Improved memory efficiency during cold starts by compacting intermediary structures and reducing retained memory after backfilling.
Changed
- The frequency of background operator snapshot compression in data persistence is limited to the greater of the user-defined snapshot_interval or 30 minutes when S3 or Azure is used as the backend, in order to avoid frequent calls to potentially expensive operations.
- The Google Drive input connector performance has been improved, especially when handling directories with many nested subdirectories.
- The MCP server tool method now allows passing the optional data title, output_schema, annotations and meta to inform the LLM client.
- Relaxed boto3 dependency to <2.0.0.
0.26.1 - 2025-08-28
Added
- pw.Table.forget to remove old (in terms of event time) entries from the pipeline.
- pw.Table.buffer, a stateful buffering operator that delays entries until the time_column <= max(time_column) - threshold condition is met.
- pw.Table.ignore_late to filter out old (in terms of event time) entries.
- Rows batching for async UDFs. It can be enabled with the max_batch_size parameter.
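
The release condition of the buffering operator, time_column <= max(time_column) - threshold, can be sketched in plain Python as follows. The class ThresholdBuffer is a hypothetical stand-in written for illustration; it is not the implementation of pw.Table.buffer and ignores distribution, retractions and persistence.

```python
class ThresholdBuffer:
    """Hold entries until their time is at least `threshold` behind the maximum seen time."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.max_time = None
        self.pending = []

    def push(self, time, value):
        """Add an entry and return the entries that may now be released."""
        self.max_time = time if self.max_time is None else max(self.max_time, time)
        self.pending.append((time, value))
        cutoff = self.max_time - self.threshold
        released = [entry for entry in self.pending if entry[0] <= cutoff]
        self.pending = [entry for entry in self.pending if entry[0] > cutoff]
        return released
```

With a threshold of 5, an entry with time 1 stays buffered until an entry with time 6 or later arrives.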
Changed
- pw.io.subscribe and pw.io.python.write now work with async callbacks.
- The diff column in tables automatically created by pw.io.postgres.write and pw.io.postgres.write_snapshot in replace and create_if_not_exists initialization modes now uses the smallint type.
- optimize_transaction_log option has been removed from pw.io.deltalake.TableOptimizer.
Fixed
- pw.io.postgres.write and pw.io.postgres.write_snapshot now respect the type optionality defined in the Pathway table schema when creating a new PostgreSQL table. This applies to the replace and create_if_not_exists initialization modes.
0.26.0 - 2025-08-14
Added
- path_filter parameter in pw.io.s3.read and pw.io.minio.read functions. It enables post-filtering of object paths using a wildcard pattern (*, ?), allowing exclusion of paths that pass the main path filter but do not match path_filter.
- Input connectors now support backpressure control via max_backlog_size, which limits the number of read events in processing per connector. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
- pw.reducers.count_distinct and pw.reducers.count_distinct_approximate to count the number of distinct elements in a table. pw.reducers.count_distinct_approximate allows you to save memory by decreasing the accuracy. It is possible to control this tradeoff by using the precision parameter.
- pw.Table.join (and its variants) now has two additional parameters: left_exactly_once and right_exactly_once. If the elements from a side of a join should be joined exactly once, the *_exactly_once parameter of the side can be set to True. Then, after getting a match, an entry will be removed from the join state and the memory consumption will be reduced.
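
The post-filtering step of path_filter can be approximated with Python's standard fnmatch module, as in the sketch below. This is an assumption-laden illustration: the helper post_filter is hypothetical, and whether the connector's wildcard semantics match fnmatch exactly (for example, fnmatch lets * cross "/" and also supports [seq] classes) is not stated in this changelog.

```python
from fnmatch import fnmatchcase


def post_filter(paths, path_filter):
    """Keep only the object paths that match the wildcard pattern."""
    return [path for path in paths if fnmatchcase(path, path_filter)]


# Example: objects that already passed the main path prefix "logs/".
candidates = ["logs/a.csv", "logs/b.json", "logs/c1.csv"]
csv_only = post_filter(candidates, "*.csv")
single_char_names = post_filter(candidates, "logs/?.csv")
```

Here "*" matches any run of characters and "?" matches exactly one, so the second pattern keeps only logs/a.csv.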
Changed
- Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
- Improved initialization speed of pw.io.s3.read and pw.io.minio.read.
- pw.io.s3.read and pw.io.minio.read now limit the number and the total size of objects to be predownloaded.
- BREAKING optimized the implementation of pw.reducers.min, pw.reducers.max, pw.reducers.argmin, pw.reducers.argmax, pw.reducers.any reducers for append-only tables. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
- BREAKING optimized the implementation of pw.reducers.sum reducer on float and np.ndarray columns. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
- BREAKING the implementation of data persistence has been optimized for the case of many small objects in filesystem and S3 connectors. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
- BREAKING the data snapshot logic in persistence has been optimized for the case of big input snapshots. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
- Improved precision of pw.reducers.sum on float columns by introducing Neumaier summation.
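
Neumaier summation is a compensated summation scheme that tracks the rounding error lost at each addition. The sketch below shows the textbook algorithm in plain Python; it is not the engine's implementation of pw.reducers.sum, and the function names are illustrative.

```python
def naive_sum(values):
    """Plain left-to-right floating point summation."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Compensated summation that accumulates lost low-order bits separately."""
    total = 0.0
    compensation = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # Low-order digits of x were lost in the addition.
            compensation += (total - t) + x
        else:
            # Low-order digits of total were lost in the addition.
            compensation += (x - t) + total
        total = t
    return total + compensation
```

For the input [1.0, 1e100, 1.0, -1e100] the naive loop returns 0.0, while the compensated version recovers the exact result 2.0.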
0.25.1 - 2025-07-24
Added
- pw.xpacks.llm.mcp_server.PathwayMcp that allows serving pw.xpacks.llm.document_store.DocumentStore and pw.xpacks.llm.question_answering endpoints as MCP (Model Context Protocol) tools.
- pw.io.dynamodb.write method for writing to Dynamo DB.
0.25.0 - 2025-07-10
Added
- pw.io.questdb.write method for writing to Quest DB.
- pw.io.fs.read now supports the "only_metadata" format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading file contents.
- pw.Table.to_stream that transforms a table to a stream of changes from this table.
- pw.Table.stream_to_table, pw.Table.from_streams that transform streams of changes to tables.
- pw.Table.assert_append_only that sets the append_only property of a table and verifies at runtime if the condition is met.
Changed
- BREAKING The Elasticsearch and BigQuery connectors have been moved to the Scale license tier. You can obtain the Scale tier license for free at https://pathway.com/get-license.
- BREAKING pw.io.fs.read no longer accepts format="raw". Use format="binary" to read binary objects, format="plaintext_by_file" to read plaintext objects per file, or format="plaintext" to read plaintext objects split into lines.
- BREAKING The pw.io.s3_csv.read connector has been removed. Please use pw.io.s3.read with format="csv" instead.
Fixed
- pw.io.s3.read and pw.io.s3.write now also check the AWS_PROFILE environment variable for AWS credentials if none are explicitly provided.
0.24.1 - 2025-07-03
Added
- Confluent Schema Registry support in Kafka and Redpanda input and output connectors.
Changed
- pw.io.airbyte.read will now retry the pip install command if it fails during the installation of a connector. It only applies when using the PyPI version of the connector, not the Docker one.
- Environment variables used in YAML configuration files are no longer parsed as if they were YAML files by pw.load_yaml. Now, the value of the environment variable is only parsed if it's an integer, a float or a boolean.
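
The narrowed parsing rule for environment variables can be pictured with the sketch below: values that look like an integer, a float or a boolean are converted, and everything else stays a string. The helper parse_env_value is hypothetical, and the accepted spellings (for example, which boolean literals count) are assumptions rather than the documented behavior of pw.load_yaml.

```python
def parse_env_value(raw: str):
    """Convert a raw environment value to bool, int or float; otherwise keep the string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        # Anything else, including YAML-looking text, is left untouched.
        return raw
```

Under this rule a value such as "{a: 1}" is kept as a plain string instead of being interpreted as a nested YAML mapping.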
0.24.0 - 2025-06-26
Added
- pw.io.mqtt.read and pw.io.mqtt.write methods for reading from and writing to MQTT.
Changed
- pw.xpacks.llm.embedders.SentenceTransformerEmbedder and pw.xpacks.llm.llms.HFPipelineChat are now computed in batches. The maximum size of a single batch can be set in the constructor with the argument max_batch_size.
- BREAKING Arguments api_key and base_url for pw.xpacks.llm.llms.OpenAIChat can no longer be set in the __call__ method, and instead, if needed, should be set in the constructor.
- BREAKING Argument api_key for pw.xpacks.llm.llms.OpenAIEmbedder can no longer be set in the __call__ method, and instead, if needed, should be set in the constructor.
- pw.io.postgres.write now accepts arbitrary types for the values of the postgres_settings dict. If a value is not a string, Python's str() method will be used.
Removed
- pw.io.kafka.read_from_upstash has been removed, as the managed Kafka service in Upstash has been deprecated.
0.23.0 - 2025-06-12
Added
- pw.io.deltalake.write now accepts an optional pw.io.deltalake.TableOptimizer object that defines the settings for the runtime output table optimization.
Changed
- BREAKING: To use pw.sql you now have to install pathway[sql].
Fixed
- pw.io.deltalake.read now correctly reads data from partitioned tables in all cases.
- Added retries for all cloud-based persistence backend operations to improve reliability.
0.22.0 - 2025-06-05
Added
- Data persistence can now be configured to use Azure Blob Storage as a backend. An Azure backend instance can be created using pw.persistence.Backend.azure and included in the persistence config.
- Added batching to UDFs. It is now possible to make UDFs operate on batches of data instead of single rows. To do so, the max_batch_size argument has to be set.
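
Conceptually, batching groups incoming rows into chunks of at most max_batch_size before a function is called once per chunk. The generator below is a generic sketch of that grouping; the name batched_rows is hypothetical and the code does not show how Pathway schedules or invokes batched UDFs.

```python
def batched_rows(rows, max_batch_size):
    """Yield lists of at most max_batch_size consecutive rows."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be positive")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    # Emit the final, possibly smaller, batch.
    if batch:
        yield batch
```

A function applied per batch then pays any fixed per-call cost, such as a model invocation or network round trip, once per chunk instead of once per row.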
Changed
- BREAKING: when creating pw.DateTimeUtc it is now obligatory to pass the time zone information.
- BREAKING: when creating pw.DateTimeNaive passing time zone information is not allowed.
- BREAKING: expressions are now evaluated in batches. Generally, it speeds up the computations but might increase the memory usage if the intermediate state in the expressions is large.
Fixed
- Synchronization groups now correctly handle cases where the source file-like object is updated during the reading process.
0.21.6 - 2025-05-29
Added
- sort_by method to pw.BaseCustomAccumulator that allows sorting rows within a single batch. When sort_by is defined, the rows are reduced in the order specified by the sort_by method. It can, for example, be used to process entries in the order of event time.
Changed
- pw.Table.debug now prints a whole row in a single line instead of printing each cell separately.
- Calling functions without arguments in YAML configuration files is now deprecated in pw.load_yaml. To call the function, a mapping should be passed, e.g. an empty mapping as {}. In the future, the ! syntax without any mapping will be used to pass function objects without calling them.
- The license check error message now provides a more detailed explanation of the failure.
- When code is run using pathway spawn with multiple processes, if one process terminates with an error, all other processes will also be terminated.
- pw.xpacks.llm.vector_store.VectorStoreServer is being deprecated, and it is now a subclass of pw.xpacks.llm.document_store.DocumentStore. Public API is being kept the same, however users are encouraged to switch to using DocumentStore from now on.
- pw.xpacks.llm.vector_store.VectorStoreClient is being deprecated in favor of pw.xpacks.llm.document_store.DocumentStoreClient.
- pw.io.deltalake.write can now maintain the target table's snapshot on the output.
0.21.5 - 2025-05-09
Changed
- pw.io.deltalake.read now processes Delta table version updates atomically, applying all changes together in a single minibatch.
- The panel widget for table visualization now has a horizontal scroll bar for large tables.
- Added the possibility to return value from any column from pw.reducers.argmax and pw.reducers.argmin, not only id.
Fixed
- pw.reducers.argmax and pw.reducers.argmin work correctly with the result of pw.Table.windowby.
0.21.4 - 2025-04-24
Added
- pw.io.kafka.read and pw.io.redpanda.read now support static mode.
Changed
- The inactivity_detection function is now a method for append-only tables. It no longer relies on an event timestamp column but now uses table processing times to detect inactivity periods.
0.21.3 - 2025-04-16
Fixed
- The performance of input connectors is optimized in certain cases.
- The panel widget for table visualization now formats timestamps and missing values better. The pagination was also updated to better fit the widget, and the default sorters in snapshot mode have been fixed.
0.21.2 - 2025-04-10
Added
- Added synchronization group mechanism to align multiple data sources based on selected columns. It can be accessed with pw.io.register_input_synchronization_group.
- pw.io.register_input_synchronization_group now supports the following types of columns: pw.DateTimeUtc, pw.DateTimeNaive, pw.DateTimeDuration, and int.
Changed
- Enhanced error reporting for runtime errors across most operators, providing a trace that simplifies identifying the root cause.
Fixed
- Fixed a problem with list_documents() when no documents are present in the store.
- The append-only property of tables created by pw.io.kafka.read is now set correctly.
0.21.1 - 2025-03-28
Changed
- Input connectors now throttle parsing error messages if their share is more than 10% of the parsing attempts.
- New flag return_status for the inputs_query method in pw.xpacks.llm.DocumentStore. If set to True, DocumentStore returns the status of indexing for each file.
0.21.0 - 2025-03-19
Added
- All Pathway types can now be serialized to CSV using pw.io.csv.write and deserialized back using pw.io.csv.read.
- pw.io.csv.read now parses null-values in data when it can be done unambiguously.
Changed
- BREAKING: Updated endpoints in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer:
  - Deprecated: /v1/pw_list_documents, /v1/pw_ai_answer
  - New: /v2/list_documents, /v2/answer
- RAG methods under the pw.xpacks.llm.question_answering.RAGClient are renamed, and they now use the new endpoints. Old methods are deprecated and will be removed in the future.
  - pw_ai_summary -> summarize
  - pw_ai_answer -> answer
  - pw_list_documents -> list_documents
- When pw.io.deltalake.write creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with pw.io.deltalake.read if no schema is specified.
- The schema parameter is now optional for pw.io.deltalake.read. If the table was created by Pathway and the schema was not specified by user, it is read from the table metadata.
- pw.io.deltalake.write now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
- BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
- BREAKING: The Duration type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
- BREAKING: The tuple and np.ndarray types are now serialized and deserialized as their JSON representations when the CSV format is used.
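
The three CSV encodings listed above can be mimicked with the standard library, as in the sketch below. It only illustrates the general idea: the helper names are hypothetical, and details such as the base64 alphabet, padding, and JSON whitespace used by pw.io.csv.write are assumptions.

```python
import base64
import json
from datetime import timedelta


def encode_bytes(value: bytes) -> str:
    """Represent raw bytes as base64 text (standard alphabet assumed)."""
    return base64.b64encode(value).decode("ascii")


def decode_bytes(text: str) -> bytes:
    """Invert encode_bytes."""
    return base64.b64decode(text)


def encode_duration(value: timedelta) -> int:
    """Represent a duration as an integer number of nanoseconds."""
    whole_seconds = value.days * 86_400 + value.seconds
    return (whole_seconds * 1_000_000 + value.microseconds) * 1_000


def decode_duration(nanoseconds: int) -> timedelta:
    """Invert encode_duration (timedelta keeps microsecond resolution only)."""
    return timedelta(microseconds=nanoseconds // 1_000)


def encode_tuple(value: tuple) -> str:
    """Represent a tuple as a JSON array."""
    return json.dumps(list(value))
```

Each cell then becomes a plain text or integer field that a CSV reader can parse back unambiguously.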
Fixed
- pw.io.csv.write now correctly escapes quote characters.
- table_parsing_strategy="llm" in DoclingParser now works correctly.
0.20.1 - 2025-03-07
Added
- Added RecursiveSplitter.
- pw.io.deltalake.write now checks that the schema of the target Delta table corresponds to the schema of the Pathway table that is sent for the output. If the schemas differ, a human-readable error message is produced.
0.20.0 - 2025-02-25
Added
- Added structure-aware chunking for DoclingParser.
- Added table_parsing_strategy for DoclingParser.
- Column expressions as_int(), as_float(), as_str(), and as_bool() now accept additional arguments, unwrap and default, to simplify null handling.
- Support for Python tuples in expressions.
Changed
- BREAKING: Changed the argument in DoclingParser from parse_images (bool) into image_parsing_strategy (Literal["llm"] | None).
- BREAKING: doc_post_processors argument in the pw.xpacks.llm.document_store.DocumentStore no longer accepts pw.UDF.
- Better error messages when using pathway spawn with multiple workers. Now error messages are printed only from the worker experiencing the error directly.
Fixed
- doc_post_processors argument in the pw.xpacks.llm.document_store.DocumentStore had no effect. This is now fixed.
0.19.0 - 2025-02-20
Added
- LLMReranker now supports custom prompts as well as custom response parsers allowing for other ranking scales apart from default 1-5.
- pw.io.kafka.write and pw.io.nats.write now support ColumnReference as a topic name. When a ColumnReference is provided, each message's topic is determined by the corresponding column value.
- pw.io.python.write accepting ConnectorObserver as an alternative to pw.io.subscribe.
- pw.io.iceberg.read and pw.io.iceberg.write now support S3 as data backend and AWS Glue catalog implementations.
- All output connectors now support the sort_by field for ordering output within a single minibatch.
- A new UDF executor pw.udfs.fully_async_executor. It allows for creation of non-blocking asynchronous UDFs whose results can be returned in the future processing time.
- A Future data type to represent results of fully asynchronous UDFs.
- pw.Table.await_futures method to wait for results of fully asynchronous UDFs.
- pw.io.deltalake.write now supports partition columns specification.
Changed
- BREAKING: Changed the interface of LLMReranker, the use_logit_bias, cache_strategy, retry_strategy and kwargs arguments are no longer supported.
- BREAKING: LLMReranker no longer inherits from pw.UDF.
- BREAKING: pw.stdlib.utils.AsyncTransformer.output_table now returns a table with columns with Future data type.
- pw.io.deltalake.read can now read append-only tables without requiring explicit specification of primary key fields.
0.18.0 - 2025-02-07
Added
- pw.io.postgres.write and pw.io.postgres.write_snapshot now handle serialization of PyObjectWrapper and Timedelta properly.
- New chunking options in pathway.xpacks.llm.parsers.UnstructuredParser.
- Now all Pathway types can be serialized into JSON and consistently deserialized back.
- table.col.dt.to_duration converting an integer into a pw.Duration.
- pw.Json now supports storing datetime and duration type values in ISO format.
Changed
- BREAKING: Changed the interface of UnstructuredParser.
- BREAKING: The Pointer type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
- BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
- BREAKING: The Array type is now serialized and deserialized as an object with two fields: shape denoting the shape of the stored multi-dimensional array and elements denoting the elements of the flattened array.
- BREAKING: Marked package as py.typed to indicate support for type hints.
Removed
- BREAKING: Removed undocumented license_key argument from pw.run and pw.run_all methods. Instead, pw.set_license_key should be used.
0.17.0 - 2025-01-30
Added
- pw.io.iceberg.read method for reading Apache Iceberg tables into Pathway.
- Methods pw.io.postgres.write and pw.io.postgres.write_snapshot now accept an additional argument init_mode, which allows initializing the table before writing.
- pw.io.deltalake.read now supports serialization and deserialization for all Pathway data types.
- New parser pathway.xpacks.llm.parsers.DoclingParser supporting parsing of PDFs with tables and images.
- Output connectors now include an optional name parameter. If provided, this name will appear in logs and monitoring dashboards.
- Automatic naming for input and output connectors has been enhanced.
Changed
- BREAKING: pw.io.deltalake.read now requires explicit specification of primary key fields.
- BREAKING: pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now returns a dictionary from pw_ai_answer endpoint.
- pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows optionally returning context documents from pw_ai_answer endpoint.
- BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
- BREAKING: The Pointer type is now serialized to Delta Tables as raw bytes.
- pw.io.kafka.write now allows specifying key and headers for JSON and CSV data formats.
- persistent_id parameter in connectors has been renamed to name. This new name parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
- Changed names of parsers to be more consistent: ParseUnstructured -> UnstructuredParser, ParseUtf8 -> Utf8Parser. ParseUnstructured and ParseUtf8 are now deprecated.
Fixed
- generate_class method in Schema now correctly renders columns of UnionType and None types.
- A bug in delay in temporal behavior. It was possible to emit a single entry twice in a specific situation.
- pw.io.postgres.write_snapshot now correctly handles tables that only have primary key columns.
Removed
- BREAKING: pw.indexing.build_sorted_index, pw.indexing.retrieve_prev_next_values, pw.indexing.sort_from_index and pw.indexing.SortedIndex are removed. Sorting is now done with pw.Table.sort.
- BREAKING: Removed deprecated methods pw.Table.unsafe_promise_same_universe_as, pw.Table.unsafe_promise_universes_are_pairwise_disjoint, pw.Table.unsafe_promise_universe_is_subset_of, pw.Table.left_join, pw.Table.right_join, pw.Table.outer_join, pw.stdlib.utils.AsyncTransformer.result.
- BREAKING: Removed deprecated column _pw_shard in the result of windowby.
- BREAKING: Removed deprecated functions pw.debug.parse_to_table, pw.udf_async, pw.reducers.npsum, pw.reducers.int_sum, pw.stdlib.utils.col.flatten_column.
- BREAKING: Removed deprecated module pw.asynchronous.
- BREAKING: Removed deprecated access to functions from pw.io in pw.
- BREAKING: Removed deprecated classes pw.UDFSync, pw.UDFAsync.
- BREAKING: Removed class pw.xpacks.llm.parsers.OpenParse. Its functionality has been replaced with pw.xpacks.llm.parsers.DoclingParser.
- BREAKING: Removed deprecated arguments from input connectors: value_columns, primary_key, types, default_values. Schema should be used instead.
0.16.4 - 2025-01-09
Fixed
- Google Drive connector in static mode now correctly displays in Jupyter visualizations.
0.16.3 - 2025-01-02
Added
- pw.io.iceberg.write method for writing Pathway tables into Apache Iceberg.
Changed
- Values of non-deterministic UDFs are not stored in tables that are append_only.
- pw.Table.ix has a better runtime error message that includes the id of the missing row.
Fixed
- Temporal behaviors in temporal operators (windowby, interval_join) now consume no CPU when no data passes through them.
0.16.2 - 2024-12-19
Added
- pw.xpacks.llm.prompts.RAGPromptTemplate, set of prompt utilities that enable verifying templates and creating UDFs from prompt strings or callables.
- pw.xpacks.llm.question_answering.BaseContextProcessor streamlines development and tuning of representing retrieved context documents to the LLM.
- pw.io.kafka.read now supports with_metadata flag, which makes it possible to attach the metadata of the Kafka messages to the table entries.
- pw.io.deltalake.read can now stream the tables with deletions, if no deletion vectors were used.
Changed
- pw.io.sharepoint.read now explicitly terminates with an error if it fails to read the data the specified number of times per row (the default is 8).
- pw.xpacks.llm.prompts.prompt_qa, and other prompts expect 'context' and 'query' fields instead of 'docs'.
- Removed support for short_prompt_template and long_prompt_template in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer. These prompt variants are no longer accepted during construction or in requests.
- pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows setting user created prompts. Templates are verified to include 'context' and 'query' placeholders.
- pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer can take a BaseContextProcessor that represents context documents to the LLM. Defaults to pw.xpacks.llm.question_answering.SimpleContextProcessor which filters metadata fields and joins the documents with new lines.
Fixed
- The input of pw.io.fs.read and pw.io.s3.read is now correctly persisted in case deletions or modifications of already processed objects take place.
0.16.1 - 2024-12-12
Changed
- pw.io.s3.read now monitors object deletions and modifications in the S3 source when run in streaming mode. When an object is deleted in S3, it is also removed from the engine. Similarly, if an object is modified in S3, the engine updates its state to reflect those changes.
- pw.io.s3.read now supports with_metadata flag, which makes it possible to attach the metadata of the source object to the table entries.
Fixed
- pw.xpacks.llm.document_store.DocumentStore no longer requires _metadata column in the input table.
0.16.0 - 2024-11-29
Added
- pw.xpacks.llm.document_store.SlidesDocumentStore, which is a subclass of pw.xpacks.llm.document_store.DocumentStore customized for retrieving slides from presentations.
- pw.temporal.inactivity_detection and pw.temporal.utc_now functions allowing for alerting and other time-dependent use cases.
Changed
- pw.Table.concat, pw.Table.with_id, pw.Table.with_id_from no longer perform checks if ids are unique. It improves memory usage.
- Table operations that store values (like pw.Table.join, pw.Table.update_cells) no longer store columns that are not used downstream.
- append_only column property is now propagated better (there are more places where we can infer it).
- BREAKING: Parsers and parser utilities including OpenParse, ParseUnstructured, ParseUtf8, parse_images are now async. Parser interface in the VectorStore and DocumentStore remains unchanged.
- BREAKING: Unused arguments from the constructor pw.xpacks.llm.question_answering.DeckRetriever are no longer accepted.
Fixed
- query_as_of_now of pw.stdlib.indexing.DataIndex and pw.stdlib.indexing.HybridIndex now work in constant memory for infinite query stream (no query-related data is kept after query is answered).
0.15.4 - 2024-11-18
Added
- pw.io.kafka.read now supports reading entries starting from a specified timestamp.
- pw.io.nats.read and pw.io.nats.write methods for reading from and writing Pathway tables to NATS.
Changed
- pw.Table.diff now supports setting instance parameter that allows computing differences for multiple groups.
- pw.io.postgres.write_snapshot now keeps the Postgres table fully in sync with the current state of the table in Pathway. This means that if an entry is deleted in Pathway, the same entry will also be deleted from the Postgres table managed by the output connector.
Fixed
- pw.PyObjectWrapper is now picklable.
- query_as_of_now of pw.stdlib.indexing.DataIndex and pw.stdlib.indexing.HybridIndex now work in constant memory for infinite query stream (no query-related data is kept after query is answered).
0.15.3 - 2024-11-07
Added
- pw.io.mongodb.write connector for writing Pathway tables in MongoDB.
- pw.io.s3.read now supports downloading objects from an S3 bucket in parallel.
Changed
- pw.io.fs.read performance has been improved for directories containing a large number of files.
0.15.2 - 2024-10-24
Added
- pw.io.deltalake.read now supports custom S3 Delta Lakes with HTTP endpoints.
- pw.io.deltalake.read now supports specifying both a custom endpoint and a custom region for Delta Lakes via pw.io.s3.AwsS3Settings.
Changed
- Indices in pathway.stdlib.indexing.nearest_neighbors can now work also on numpy arrays. Previously they only accepted list[float]. Working with numpy arrays improves memory efficiency.
- pw.io.s3.read has been optimized to minimize new object requests whenever possible.
- It is now possible to set the size limit of cache in pw.udfs.DiskCache.
- State persistence now uses a single backend for both metadata and stream storage. The pw.persistence.Config.simple_config method is therefore deprecated. Now you can use the pw.persistence.Config constructor with the same parameters that were previously used in simple_config.
Fixed
- pw.io.bigquery.write connector now correctly handles pw.Json columns.
0.15.1 - 2024-10-04
Fixed
- pw.temporal.session and pw.temporal.asof_join now correctly work with multiple entries with the same time.
- Fixed an issue in pw.stdlib.indexing where filters would cause runtime errors while using HybridIndexFactory.
0.15.0 - 2024-09-12
Added
- Experimental: A pw.xpacks.llm.document_store.DocumentStore to process and index documents.
- pw.xpacks.llm.servers.DocumentStoreServer used to expose REST server for retrieving documents from pw.xpacks.llm.document_store.DocumentStore.
- pw.xpacks.stdlib.indexing.HybridIndex used for querying multiple indices and combining their results.
- pw.io.airbyte.read now also supports streams that only operate in full_refresh mode.
Changed
- Running servers for answering queries is extracted from pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer into pw.xpacks.llm.servers.QARestServer and pw.xpacks.llm.servers.QASummaryRestServer.
- BREAKING: query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now produce an empty list instead of None if no match is found.
0.14.3 - 2024-08-22
Fixed
- pw.io.deltalake.read and pw.io.deltalake.write now correctly work with lakes hosted in S3 over min.io, Wasabi and Digital Ocean.
Added
- The Pathway CLI command spawn can now execute code directly from a specified GitHub repository.
- A new CLI command, spawn-from-env, has been added. This command runs the Pathway CLI spawn command using arguments provided in the PATHWAY_SPAWN_ARGS environment variable.
0.14.2 - 2024-08-06
Fixed
- Switched pw.xpacks.llm.embedders.GeminiEmbedder to be sync to resolve compatibility issues with the Google Colab runs.
- Pinned surya-ocr module version for stability.
0.14.1 - 2024-08-05
Added
- pw.xpacks.llm.embedders.GeminiEmbedder which is a wrapper for Google Gemini Embedding services.
0.14.0 - 2024-07-25
Fixed
- pw.debug.table_to_pandas now exports int | None columns correctly.
Changed
- pw.io.airbyte.read can now be used with Airbyte connectors implemented in Python without requiring Docker.
- BREAKING: UDFs now verify the type of returned values at runtime. If it is possible to cast a returned value to a proper type, the value is cast. If the value does not match the expected type and can't be cast, an error is raised.
- BREAKING: pw.reducers.ndarray reducer requires input column to either have type float, int or Array.
- pw.xpacks.llm.parsers.OpenParse can now extract and parse images & diagrams from PDFs. This can be enabled by setting parse_images. processing_pipeline can also be set to customize the post-processing of doc elements.
0.13.2 - 2024-07-08
Added
- pw.io.deltalake.read now supports S3 data sources.
- pw.xpacks.llm.parsers.ImageParser which allows parsing images with the vision LMs.
- pw.xpacks.llm.parsers.SlideParser that enables parsing PDF and PPTX slides with the vision LMs.
- pw.xpacks.llm.parsers.question_answering.RAGClient, Python client for Pathway hosted RAG apps.
- pw.xpacks.llm.parsers.question_answering.DeckRetriever, a RAG app that enables searching through slide decks with visual-heavy elements.
Fixed
- pw.xpacks.llm.vector_store.VectorStoreServer now uses new indexes.
Changed
- pw.xpacks.llm.parsers.OpenParse now supports any vision Language model including local and proprietary models via LiteLLM.
0.13.1 - 2024-06-27
Added
- pw.io.kafka.read now accepts an autogenerate_key flag. This flag determines the primary key generation policy to apply when reading raw data from the source. You can either use the key from the Kafka message or have Pathway autogenerate one.
- pw.io.deltalake.read input connector that fetches changes from DeltaLake into a Pathway table.
- pw.xpacks.llm.parsers.OpenParse which allows parsing tables and images in PDFs.
Fixed
- All S3 input connectors (including S3, Min.io, Digital Ocean, and Wasabi) now automatically retry network operations if a failure occurs.
- The issue where the connection to the S3 source fails after partially ingesting an object has been resolved by downloading the object in full first.
0.13.0 - 2024-06-13
Added
- pw.io.deltalake.write now supports S3 destinations.
Changed
- pw.debug.compute_and_print now allows passing more than one table.
- BREAKING: path parameter in pw.io.deltalake.write renamed to uri.
Fixed
- A bug in pw.Table.deduplicate. If persistent_id is not set, it is no longer generated in pw.PersistenceMode.SELECTIVE_PERSISTING mode.
0.12.0 - 2024-06-08
Added
- pw.PyObjectWrapper that enables passing Python objects of any type to the engine.
- cache_strategy option added for pw.io.http.rest_connector. It enables cache configuration, which is useful for duplicated requests.
- allow_misses argument to Table.ix and Table.ix_ref methods, which allows filling rows with missing keys with None values.
- pw.io.deltalake.write output connector that streams the changes of a given table into a DeltaLake storage.
- pw.io.airbyte.read now supports data extraction with Google Cloud Runs.
Removed
- BREAKING: Removed Table.having method.
- BREAKING: Removed pw.DATE_TIME_UTC, pw.DATE_TIME_NAIVE and pw.DURATION as dtype markers. Instead, pw.DateTimeUtc, pw.DateTimeNaive and pw.Duration should be used, which are wrappers for corresponding pandas types.
- BREAKING: Removed class transformers from public API: pw.ClassArg, pw.attribute, pw.input_attribute, pw.input_method, pw.method, pw.output_attribute and pw.transformer.
- BREAKING: Removed several methods from pw.indexing module: binsearch_oracle, filter_cmp_helper, filter_smallest_k and prefix_sum_oracle.
0.11.2 - 2024-05-27
Added
- pathway.assert_table_has_schema and pathway.table_transformer now accept allow_subtype argument, which, if True, allows column types in the Table to be subtypes of types in the Schema.
- next method to pw.io.python.ConnectorSubject (python connector) that enables passing values of any type to the engine, not only values that are json-serializable. The next method should be the preferred way of passing values from the python connector.
Changed
- The format argument of pw.io.python.read is deprecated. A data format is inferred from the method used (next_json, next_str, next_bytes) and the provided schema.
Removed
- Removed pw.numba_apply and numba dependency.
Fixed
- Fixed pw.this desugaring bug, where __getitem__ in .ix context was not working properly.
- pw.io.sqlite.read now checks if the data matches the passed schema.
0.11.1 - 2024-05-16
Added
- query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now accept in metadata_column parameter a column with data of type str | None.
- pathway.xpacks.connectors.sharepoint module, available with Pathway Scale License.
0.11.0 - 2024-05-10
Added
- Embedders in the LLM xpack now have method get_embedding_dimension that returns the number of dimensions used by the chosen embedder.
- pathway.stdlib.indexing.nearest_neighbors, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on k-NN via LSH (implemented in Pathway), and k-NN provided by USearch library.
- pathway.stdlib.indexing.vector_document_index, with a few predefined instances of pathway.stdlib.indexing.data_index.DataIndex.
- pathway.stdlib.indexing.bm25, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on BM25 index provided by Tantivy.
- pathway.stdlib.indexing.full_text_document_index, with a predefined instance of pathway.stdlib.indexing.data_index.DataIndex.
- Introduced the reranker module under llm.xpacks. Includes a few re-ranking strategies and utility functions for RAG applications.
Changed
- BREAKING: windowby generates IDs of produced rows differently than in the previous version.
- BREAKING: pw.io.csv.write prints printable non-ascii characters as regular text, not \u{xxxx}.
- BREAKING: Connector methods pw.io.elasticsearch.read, pw.io.debezium.read, pw.io.fs.read, pw.io.jsonlines.read, pw.io.kafka.read, pw.io.python.read, pw.io.redpanda.read, pw.io.s3.read now check the type of the input data. Previously it was not checked if the provided format was "json"/"jsonlines". If the data is inconsistent with the provided schema, the row is skipped and an error message is emitted.
- BREAKING: query and query_as_of_now methods of pathway.stdlib.indexing.data_index.DataIndex now return pathway.JoinResult, to allow resolving column name conflicts (between columns in the table with queries and table with index data).
- BREAKING: DataIndex methods query and query_as_of_now now return score in a column named _pw_index_reply_score (defined as _SCORE variable in pathway.stdlib.indexing.colnames.py).
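The skip-on-mismatch behavior of the input connectors can be sketched in plain Python. This is only an illustration, not the connectors' code: the function name validate_rows and the representation of a schema as a dict from column name to Python type are assumptions.

```python
def validate_rows(rows, schema):
    """Split rows into those matching the schema and error messages for the rest."""
    kept, errors = [], []
    for index, row in enumerate(rows):
        mismatched = [
            name
            for name, expected in schema.items()
            if not isinstance(row.get(name), expected)
        ]
        if mismatched:
            # The inconsistent row is skipped and an error message is emitted.
            errors.append(
                f"row {index}: wrong type in column(s) {', '.join(mismatched)}"
            )
        else:
            kept.append(row)
    return kept, errors
```

With a schema of {"id": int, "name": str}, a row such as {"id": "2", "name": "b"} is dropped and reported, while well-typed rows pass through unchanged.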
Removed
- BREAKING: pathway.stdlib.indexing.data_index.VectorDocumentIndex class, some predefined instances are now meant to be obtained via methods provided in pathway.stdlib.indexing.vector_document_index.
- BREAKING: with_distances parameter of query and query_as_of_now methods in pathway.stdlib.indexing.data_index.DataIndex. Instead of 'distance', we now operate with a more general term 'score' (higher = better). For distance-based indices, score is usually defined as negative distance. Score is now always included in the answer, as long as the underlying index returns something that indicates the quality of a match.
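The relation between distance and score can be shown with a short pure-Python sketch. It assumes Euclidean distance and a hypothetical helper named rank_by_score; it does not use or reproduce the DataIndex API.

```python
import math


def rank_by_score(query, points, k):
    """Return the k best (score, point) pairs, where score = -distance."""
    scored = [(-math.dist(query, point), point) for point in points]
    # Higher score is better, so sort in descending order of score.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because the score is the negated distance, the nearest point gets the highest score, so distance-based and similarity-based indices can share the same "higher = better" ordering.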
0.10.1 - 2024-04-30
Added
- query method to VectorStoreServer to enable compatible API with DataIndex.
- AdaptiveRAGQuestionAnswerer to xpacks.question_answering. End-to-end pipeline and accompanying code for Private RAG showcase.
0.10.0 - 2024-04-24
Added
- Pathway now warns when unintentionally creating Table with empty universe.
- pw.io.kafka.write in raw and plaintext formats now supports output for tables with multiple columns. For such tables, it requires specifying the column to be used as the value of the produced Kafka messages and makes it possible to provide a column to be used as the key.
- pw.io.kafka.write can now output values from the table using Kafka message headers in 'raw' and 'plaintext' output format.
Changed
- instance arguments to groupby, join, with_id_from now determine how entries are distributed between machines.
- flatten results remain on the same machine as their source entries.
- join sends each record between machines at most once.
- BREAKING: flatten, join, groupby (if used with instance), with_id_from (if used with instance) generate IDs of the produced rows differently than in the previous versions.
- pathway spawn with multiple workers prints only output from the first worker.
0.9.0 - 2024-04-18
Added
- pw.reducers.latest and pw.reducers.earliest that return the value with respectively maximal and minimal processing time assigned.
- pw.io.kafka.write can now produce messages containing raw bytes in case the table consists of a single binary column and raw mode is specified. Similarly, this method will provide plaintext messages if plaintext mode is chosen and the table consists of a single string-typed column.
- pw.io.pubsub.write connector for publishing Pathway tables into Google PubSub.
- Argument strict_prompt to answer_with_geometric_rag_strategy and answer_with_geometric_rag_strategy_from_index that allows optimizing prompts for smaller open-source LLM models.
- Temporarily switch LiteLLMChat's generation method to sync version due to a bug while using json mode with Ollama.
Changed
- BREAKING: pw.io.kafka.read will not parse the messages from UTF-8 in case raw mode was specified. To preserve this behavior you can use the plaintext mode.
- BREAKING: Table.flatten now flattens one column and spreads every other column of the table, instead of taking other columns from the argument list.
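The new flatten semantics (one column is flattened, every other column is copied to each produced row) can be illustrated without Pathway. In this sketch rows are plain dicts and flatten_rows is a hypothetical helper, not the Table.flatten implementation.

```python
def flatten_rows(rows, column):
    """Emit one row per element of row[column], copying all other columns."""
    flattened = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)     # spread every other column unchanged
            new_row[column] = item  # replace the sequence with a single element
            flattened.append(new_row)
    return flattened
```

A row with an empty sequence in the flattened column produces no output rows in this sketch.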
0.8.6 - 2024-04-10
Added
- pw.io.bigquery.write connector for writing Pathway tables into Google BigQuery.
- parameter filepath_globpattern to query method in VectorStoreClient for specifying which files should be considered in the query.
- Improved compatibility of pw.Json with standard methods such as len(), int(), float(), bool(), iter(), reversed() when feasible.
Changed
- pw.io.postgres.write can now parallelize writes to several threads if several workers are configured.
- Pathway now checks types of pointers rigorously. Indexing table with mismatched number/types of columns vs what was used to create index will now result in a TypeError.
- pw.Json.as_float() method now supports integer JSON values.
0.8.5 - 2024-03-27
Added
- New function answer_with_geometric_rag_strategy_from_index, which allows using answer_with_geometric_rag_strategy without the need to first retrieve documents from index.
- Added support for custom state serialization to udf_reducer.
- Introduced instance parameter in AsyncTransformer. All calls with a given (instance, processing_time) pair are returned at the same processing time. Ordering is preserved within a single instance.
- Added successful, failed, finished properties to AsyncTransformer. They return tables with successful calls, failed calls and all finished calls, respectively.
Changed
- Property result of AsyncTransformer is deprecated. Property successful should be used instead.
- pw.io.csv.read, pw.io.jsonlines.read, pw.io.fs.read, pw.io.plaintext.read now handle path as a glob pattern and read all matched files and directories recursively.
0.8.4 - 2024-03-18
Fixed
- Pathway will only require LiteLLM package, if you use one of the wrappers for LiteLLM.
- Retries are implemented in pw.io.airbyte.read.
- State processing protocol is updated in pw.io.airbyte.read.
0.8.3 - 2024-03-13
Added
- New parameters of pw.UDF class and pw.udf decorator: return_type, deterministic, propagate_none, executor, cache_strategy.
- The LLM Xpack now provides integrations with LlamaIndex and LangChain for running the Pathway VectorStore server.
Changed
- Subclassing UDFSync and UDFAsync is deprecated. UDF should be subclassed to create a new UDF.
- Passing keyword arguments to pw.apply, pw.apply_with_type, pw.apply_async is deprecated. In the future, they'll be used for configuration, not passing data to the function.
Fixed
- Fixed a minor bug with Table.groupby() method which sometimes prevented accessing certain columns in the following reduce().
- Fixed warnings from using OpenAI Async embedding model in the VectorStore in Colab.
0.8.2 - 2024-02-28
Added
- %:z timezone format code to strptime.
- Support for Airbyte connectors pw.io.airbyte.
0.8.1 - 2024-02-15
Added
- Introduced the send_alerts function in the pw.io.slack namespace, enabling users to send messages from a specified column directly to a Slack channel.
- Enhanced the pw.io.http.rest_connector by introducing an additional argument called request_validator. This feature empowers users to validate payloads and raise an HTTP 400 error if necessary.
Fixed
- Addressed an issue in pw.io.xpacks.llm.VectorStoreServer where the computation of the last modification timestamp for an indexed document was incorrect.
Changed
- Improved the behavior of pw.io.kafka.write. It now includes retries when sending data to the output topic encounters failures.
0.8.0 - 2024-02-01
Added
- pw.io.http.rest_connector now supports multiple HTTP request types.
- pw.io.http.PathwayWebserver now allows Cross-Origin Resource Sharing (CORS) to be enabled on newly added endpoints.
- Wrappers for LiteLLM and HuggingFace chat services and SentenceTransformers embedding service are now added to Pathway xpack for LLMs.
Changed
- pw.run now includes an additional parameter runtime_typechecking that enables strict type checking at runtime.
- Embedders in pathway.xpacks.llm.embedders now correctly process empty strings as queries.
- BREAKING: pw.run and pw.run_all now only accept keyword arguments.
Fixed
- pw.Duration can now be returned from User-Defined Functions (UDFs) or used as a constant value without resulting in errors.
- pw.io.debezium.read now correctly handles tables that do not have a primary key.
0.7.10 - 2024-01-26
Added
- pw.io.http.rest_connector can now generate Open API 3.0.3 schema that will be returned by the route /_schema.
- Wrappers for OpenAI Chat and Embedding services are now added to Pathway xpack for LLMs.
- A vector indexing pipeline that allows querying for the most similar documents. It is available as class VectorStore as part of Pathway xpack for LLMs.
Fixed
- pw.debug.table_from_markdown now uses schema parameter (when set) to properly assign simple types (int, bool, float, str, bytes) and optional simple types to columns.
0.7.9 - 2024-01-18
Changed
- pw.io.http.rest_connector now also accepts port as a string for backwards compatibility.
- pw.stdlib.ml.index.KNNIndex now sorts by distance by default.
0.7.8 - 2024-01-18
Added
- Support for comparisons of tuples has been added.
- Standalone versions of methods such as pw.groupby, pw.join, pw.join_inner, pw.join_left, pw.join_right, and pw.join_outer are now available.
- The abs function from Python can now be used on Pathway expressions.
- The asof_join method now has configurable temporal behavior. The behavior parameter can be used to pass the configuration.
- The state of the deduplicate operator can now be persisted.
Changed
- interval_join can now work with intervals of zero length.
- The pw.io.http.rest_connector can now open multiple endpoints on the same port using a new pw.io.http.PathwayWebserver class.
- The pw.xpacks.connectors.sharepoint.read and pw.io.gdrive.read methods now support a size limit for a single object. If set, files exceeding the limit are excluded and not read.
0.7.7 - 2023-12-27
Added
- pathway.xpacks.llm.splitter.TokenCountSplitter.
0.7.6 - 2023-12-22
New Features
Conversion Methods in pw.Json
- Introducing new methods for strict conversion of pw.Json to desired types within a UDF body: as_int(), as_float(), as_str(), as_bool(), as_list(), as_dict().
DateTime Functionality
- Added table.col.dt.utc_from_timestamp method: Creates DateTimeUtc from timestamps represented as ints or floats.
- Enhanced the table.col.dt.timestamp method with a new unit argument to specify the unit of the returned timestamp.
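The role of a unit when converting between numeric timestamps and UTC datetimes can be sketched with the standard library. This is an analogy, not the Pathway dt namespace: the function names below only mirror the changelog entries, and the unit strings "s", "ms", "us" and "ns" are assumptions.

```python
from datetime import datetime, timezone

# Assumed unit names; the number of units in one second.
_UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def utc_from_timestamp(value, unit="s"):
    """Build a timezone-aware UTC datetime from an int or float timestamp."""
    return datetime.fromtimestamp(value / _UNITS_PER_SECOND[unit], tz=timezone.utc)


def timestamp(moment, unit="s"):
    """Return the timestamp of an aware datetime, expressed in the given unit."""
    return moment.timestamp() * _UNITS_PER_SECOND[unit]
```

Making the unit explicit avoids guessing whether a value such as 1_700_000_000_000 means seconds or milliseconds.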
Experimental Features
- Introduced an experimental xpack with a Microsoft SharePoint input connector.
Enhancements
Improved JSON Handling
- Index operator ([]) can now be directly applied to pw.Json within UDFs to access elements of JSON objects, arrays, and strings.
Expanded Timestamp Functionality
- Enhanced the table.col.dt.from_timestamp method to create DateTimeNaive from timestamps represented as ints or floats.
- Deprecated not specifying the unit argument of the table.col.dt.timestamp method.
KNNIndex Enhancements
- KNNIndex now supports returning computed distances.
- Added support for cosine similarity in KNNIndex.
Deprecated Features
- The offset argument of pw.stdlib.temporal.sliding and pw.stdlib.temporal.tumbling is deprecated. Use origin instead, as it represents a point in time, not a duration.
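Why origin is a point in time rather than a duration can be seen in a small sketch of tumbling-window assignment. The helper tumbling_window is hypothetical and uses plain numbers for time; it is not the pw.stdlib.temporal implementation.

```python
def tumbling_window(t, duration, origin=0):
    """Return the [start, end) tumbling window of length duration containing t."""
    # Windows are aligned so that one of them starts exactly at origin.
    index = (t - origin) // duration
    start = origin + index * duration
    return start, start + duration
```

With duration 10 and origin 3, window boundaries fall at ..., -7, 3, 13, 23, ..., so time 17 lands in the window starting at 13.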
Bug Fixes
DateTime Fixes
- Sliding window now works correctly with UTC Datetimes.
asof_join Improvements
- Temporal column in asof_join no longer has to be named t.
- asof_join includes rows with equal times for all values of the direction parameter.
Fixed Issues
- Fixed an issue with pw.io.gdrive.read: Shared folders support is now working seamlessly.
0.7.5 - 2023-12-15
Added
- Added Table.split() method for splitting table based on an expression into two tables.
- Columns with datatype duration can now be multiplied and divided by floats.
- Columns with datatype duration now support both true and floor division (/ and //) by integers.
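The difference between the arithmetic operations listed above can be illustrated with Python's datetime.timedelta. This is only an analogy: Pathway durations are a separate type, and their resolution and rounding may differ from the standard library's.

```python
from datetime import timedelta

ten_seconds = timedelta(seconds=10)

# Multiplication and division by floats.
scaled = ten_seconds * 1.5     # 15 seconds
shrunk = ten_seconds / 2.5     # 4 seconds

# True vs floor division by an integer differ only at the resolution limit.
tiny = timedelta(microseconds=7)
true_div = tiny / 2            # rounded to the nearest microsecond
floor_div = tiny // 2          # rounded down to a whole microsecond
```

Floor division always truncates toward the lower multiple of the resolution, while true division rounds to the nearest representable value.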
Changed
- Pathway is better at typing if_else expressions when optional types are involved.
- table.flatten() operator now supports Json array.
- Buffers (used to delay outputs, configured via delay in common_behavior) now flush the data when the computation is finished. The effect of this change can be seen when run in bounded (batch / multi-revision) mode.
- pw.io.subscribe() takes additional argument on_time_end - the callback function to be called on each closed time of computation.
- pw.io.subscribe() is now a single-worker operator, guaranteeing that on_end is triggered at most once.
- KNNIndex now supports metadata filtering. Each query can specify its own filter in the JMESPath format.
Fixed
- Resolved an optimization bug causing pw.iterate to malfunction when handling columns effectively pointing to the same data.
0.7.4 - 2023-12-05
Changed
- Pathway now keeps track of array column type better - it is able to keep track of Array dtype and number of dimensions, wherever applicable.
Fixed
- Fixed issues with standalone panel+Bokeh dashboards to ensure optimal functionality and performance.
0.7.3 - 2023-11-30
Added
- A method weekday has been added to the dt namespace, that can be called on column expressions containing datetime data. This method returns an integer that represents the day of the week.
- EXPERIMENTAL: Methods show and plot on Tables, providing visualizations of data using HoloViz Panel.
- Added support for instance parameter to groupby, join, windowby and temporal join methods.
- pw.PersistenceMode.UDF_CACHING persistence mode enabling automatic caching of AsyncTransformer invocations.
Changed
- Methods round and floor on columns with datetimes now accept a string as the duration argument.
- pw.debug.compute_and_print and pw.debug.compute_and_print_update_stream have a new argument n_rows that limits the number of rows printed.
- pw.debug.table_to_pandas has a new argument include_id (by default True). If set to False, creates a new index for the Pandas DataFrame, rather than using the keys of the Pathway Table.
- The shard argument of the windowby function is now deprecated and instance should be used.
- Special column name _pw_shard is now deprecated, and _pw_instance should be used.
- pw.ReplayMode now can be accessed as pw.PersistenceMode, while the SPEEDRUN and REALTIME variants are now accessible as SPEEDRUN_REPLAY and REALTIME_REPLAY.
- EXPERIMENTAL: pw.io.gdrive.read has a new argument with_metadata (by default False). If set to True, adds a _metadata column containing file metadata to the resulting table.
- Methods get_nearest_items and get_nearest_items_asof_now of KNNIndex allow specifying k (number of returned elements) separately in each query.
0.7.2 - 2023-11-24
Added
- Added the ability to create custom reducers using pw.reducers.udf_reducer decorator. Use pw.BaseCustomAccumulator as a base class for creating accumulators. Decorating an accumulator returns a reducer following custom logic.
- A function pw.debug.compute_and_print_update_stream that computes and prints the update stream of the table.
- SQLite input connector (pw.io.sqlite).
Changed
- pw.debug.parse_to_table is now deprecated, pw.debug.table_from_markdown should be used instead.
- pw.schema_from_csv now has quote and double_quote_escapes arguments.
Fixed
- Schema returned from pw.schema_from_csv will have quotes removed from column names, so it will now work properly with pw.io.csv.read.
0.7.1 - 2023-11-17
Added
- Experimental Google Drive input connector.
- Stateful deduplication function (pw.stateful.deduplicate) allowing alerting on significant changes.
- The ability to split data into batches in pw.debug.table_from_markdown and pw.debug.table_from_pandas.
0.7.0 - 2023-11-16
Added
- class Behavior, a superclass of all behavior classes.
- class ExactlyOnceBehavior indicating we want to create a CommonBehavior that results in each window producing exactly one output (shifted in time by an optional shift parameter).
- function exactly_once_behavior creating an instance of ExactlyOnceBehavior.
Changed
- BREAKING: WindowBehavior is now called CommonBehavior, as it can be also used with interval joins.
- BREAKING: window_behavior is now called common_behavior, as it can be also used with interval joins.
- Deprecating parameter keep_queries in pw.io.http.rest_connector. Now delete_completed_queries with an opposite meaning should be used instead. The default is still delete_completed_queries=True (equivalent to keep_queries=False) but it will soon be required to be set explicitly.
0.6.0 - 2023-11-10
Added
- A flag with_metadata for the filesystem-based connectors to attach the source file metadata to the table entries.
- Methods pw.debug.table_from_list_of_batches and pw.debug.table_from_list_of_batches_by_workers for creating tables with defined data being inserted over time.
Changed
- BREAKING: pw.debug.table_from_pandas and pw.debug.table_from_markdown now will create tables in the streaming mode, instead of static, if given table definition contains _time column.
- BREAKING: Renamed the parameter keep_queries in pw.io.http.rest_connector to delete_queries with the opposite meaning. It changes the default behavior - it was keep_queries=False, now it is delete_queries=False.
0.5.3 - 2023-10-27
Added
- A method get_nearest_items_asof_now in KNNIndex that allows getting nearest neighbors without updating old queries in the future.
- A method asof_now_join in Table to join rows from left side of the join with right side of the join at their processing time. Past rows from left side are not used when new data appears on the right side.
0.5.2 - 2023-10-19
Added
- interval_join now supports forgetting old entries. The configuration can be passed using behavior parameter of interval_join method.
- Decorator @table_transformer for marking that functions take Tables as arguments.
- Namespace for all columns Table.C.*.
- Output connectors now provide logs about the number of entries written and time taken.
- Filesystem connectors now support reading whole files as rows.
- Command line option for pathway spawn to record data and pathway replay command to replay data.
0.5.1 - 2023-10-04
Fixed
- select operates only on consistent states.
0.5.0 - 2023-10-04
Added
- Schema method typehints that returns dict of mypy-compatible typehints.
- Support for JSON parsing from CSV sources.
- restrict method in Table to restrict table universe to the universe of the other table.
- Better support for postgresql types in the output connector.
Changed
- BREAKING: renamed Table method dtypes to typehints. It now returns a dict of mypy-compatible typehints.
- BREAKING: Schema.__getitem__ returns a data class ColumnSchema containing all related information on particular column.
- BREAKING: tuple reducer used after intervals_over window now sorts values by time.
- BREAKING: expressions used in select, filter, flatten, with_columns, with_id, with_id_from have to have the same universe as the table. Earlier it was possible to use an expression from a superset of a table universe. To use expressions from wider universes, one can use restrict on the expression source table.
- BREAKING: pw.universes.promise_are_equal(t1, t2) no longer allows using references from t1 and t2 in a single expression. To change the universe of a table, use with_universe_of.
- BREAKING: ix and ix_ref are temporarily broken inside joins (both temporal and ordinary).
- select, filter, concat keep columns as a single stream. The work for other operators is ongoing.
Fixed
- Optional types other than string correctly output to PostgreSQL.
0.4.1 - 2023-09-25
Added
- Support for messages compressed with zstd in the Kafka connector.
0.4.0 - 2023-09-21
Added
- Support for JSON data format, including pw.Json type.
- Methods as_int(), as_float(), as_str(), as_bool() to convert values from Json.
- New argument skip_nones for tuple and sorted_tuple reducers.
- New argument is_outer for intervals_over window.
- pw.schema_from_dict and pw.schema_from_csv for generating schema based, respectively, on provided definition as a dictionary and CSV file with sample data.
- generate_class method in Schema class for generating schema class code.
Changed
- Method get() and [] to support accessing elements in Jsons.
- Function pw.assert_table_has_schema for writing asserts checking, whether given table has the same schema as the one that is given as an argument.
- BREAKING: ix and ix_ref operations are now standalone transformations of pw.Table into pw.Table. Most of the usages remain the same, but sometimes user needs to provide a context (when e.g. using them inside join or groupby operations). ix and ix_ref are temporarily broken inside temporal joins.
Fixed
- Fixed a bug where new-style optional types (e.g. int | None) were translated to Any dtype.
0.3.4 - 2023-09-18
Fixed
- Incompatible beartype version is now excluded from dependencies.
0.3.3 - 2023-09-14
Added
- Module pathway.dt to construct and manipulate DTypes.
- New argument keep_queries in pw.io.http.rest_connector.
Changed
- Internal representation of DTypes. Inputting types is compatible backwards.
- Temporal functions now accept arguments of mixed types (ints and floats). For example, pw.temporal.interval can use ints while columns it interacts with are floats.
- Single-element arrays are now treated as arrays, not as scalars.
Fixed
- to_string() method on datetimes always prints 9 fractional digits.
- %f format code in strptime() parses fractional part of a second correctly regardless of the number of digits.
0.3.2 - 2023-09-07
Added
- Table.cast_to_types() function that can perform pathway.cast on multiple columns.
- intervals_over window, which allows getting data temporally close to given times.
- demo.replay_csv_with_time function that can replay a CSV file following the timestamps of a given column.
Fixed
- Static data is now copied to ensure immutability.
- Improved error tracing mechanism to work with any type of error.
0.3.1 - 2023-08-29
Added
- tuple reducer, that returns a tuple with values.
- ndarray reducer, that returns an array with values.
Changed
- numpy arrays of int32, uint32 and float32 are now converted to their 64-bit variants instead of tuples.
- KNNIndex interface to take columns as inputs.
- Reducers now check types of their arguments.
Fixed
- Fixed delayed reporting of output connector errors.
- Python objects are now freed more often, reducing peak memory usage.
0.3.0 - 2023-08-07
Added
- @ (matrix multiplication) operator.
Changed
- Python version 3.10 or later is now required.
- Type checking is now more strict.
0.2.1 - 2023-07-31
Changed
- Immediately forget queries in REST connector.
- Make type annotations mandatory in Schema.
Fixed
- Fixed IDs coming from CSV source.
- Fixed indices of dataframes from pandas transformer.