pathway.io.plaintext package

Functions

pw.io.plaintext.read(path, *, mode='streaming', persistent_id=None, autocommit_duration_ms=1500, debug_data=None)

Reads a table from a text file or a directory of text files. The resulting table has a single column, `data`, and one row per line of input. Each cell contains a single line from the file.

If a folder is specified and it contains several files, they are ordered by modification time: the earlier a file was last modified, the earlier it is passed to the engine.
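The ordering rule and the one-row-per-line shape can be illustrated with a minimal plain-Python sketch. It does not use Pathway and is not the connector's implementation; the helper names `files_in_ingestion_order` and `rows_from_folder`, the file-name tie-breaker, and the stripping of trailing newlines are all assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def files_in_ingestion_order(folder):
    """Return the regular files in `folder`, oldest modification time first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    # Sort by mtime; the file name is only a tie-breaker assumed for this sketch.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


def rows_from_folder(folder):
    """Yield one {"data": line} row per line, walking files in mtime order."""
    for path in files_in_ingestion_order(folder):
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Assumption: the trailing newline is not part of the cell.
                yield {"data": line.rstrip("\n")}


# Demo: the file names sort one way, the modification times the other.
with tempfile.TemporaryDirectory() as tmp:
    newer = Path(tmp) / "a_newer.txt"
    older = Path(tmp) / "b_older.txt"
    newer.write_text("third\n", encoding="utf-8")
    older.write_text("first\nsecond\n", encoding="utf-8")
    # Set explicit modification times so the order is deterministic.
    os.utime(older, (1_000, 1_000))
    os.utime(newer, (2_000, 2_000))
    demo_rows = [row["data"] for row in rows_from_folder(tmp)]

print(demo_rows)
```

In the demo, `b_older.txt` is read before `a_newer.txt` even though its name sorts later, because only the modification time decides the order.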

  • Parameters
    • path (Union[str, PathLike]) – Path to a file or to a folder.
    • mode (str) – If set to “streaming”, the engine waits for new input files to appear in the directory. If set to “static”, it considers only the data already available and ingests all of it in one commit. Default value is “streaming”.
    • persistent_id (Optional[str]) – (unstable) An identifier under which the state of the table will be persisted, or None if there is no need to persist the state of this table. When a program restarts, it restores the state of all input tables according to what was saved for their persistent_id. This makes it possible to resume computations from the point at which they were last terminated.
    • autocommit_duration_ms (Optional[int]) – The maximum time between two commits, in milliseconds. Every autocommit_duration_ms milliseconds, the updates received by the connector are committed and pushed into Pathway’s computation graph.
    • debug_data – Static data that replaces the original data when debug mode is active.
  • Returns
    Table – The table read.
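To make the difference between the two modes and the role of autocommit_duration_ms more concrete, here is a simplified plain-Python model of how incoming lines could be grouped into commits. It is only a sketch under stated assumptions: `group_into_commits` is a hypothetical helper, the fixed, non-overlapping time windows are an assumed simplification, and the real connector's scheduling may differ.

```python
def group_into_commits(events, mode="streaming", autocommit_duration_ms=1500):
    """Group (arrival_ms, line) pairs into commits.

    Simplified model, not Pathway's internals:
    - "static": everything available is ingested in a single commit.
    - "streaming": lines arriving within the same window of
      `autocommit_duration_ms` milliseconds are committed together.
    """
    if mode == "static":
        return [[line for _, line in events]]
    if mode != "streaming":
        raise ValueError(f"unknown mode: {mode!r}")

    commits = {}
    for arrival_ms, line in events:
        # Index of the assumed fixed window this arrival falls into.
        window = arrival_ms // autocommit_duration_ms
        commits.setdefault(window, []).append(line)
    # Emit the commits in chronological order of their windows.
    return [commits[w] for w in sorted(commits)]


# Hypothetical arrival times (in milliseconds) for five lines.
events = [(0, "a"), (400, "b"), (1600, "c"), (2900, "d"), (3100, "e")]

streaming_commits = group_into_commits(events, mode="streaming")
static_commits = group_into_commits(events, mode="static")

print(streaming_commits)
print(static_commits)
```

With the default 1500 ms window, the five lines fall into three commits in streaming mode, while static mode places all of them in one commit. A smaller autocommit_duration_ms lowers latency at the cost of more, smaller commits.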

Example:

import pathway as pw
t = pw.io.plaintext.read("raw_dataset/lines.txt")