pathway.io.plaintext package


pathway.io.plaintext.read(path, mode='streaming', persistent_id=None, debug_data=None)

Reads a table from a text file or a directory of text files. The resulting table has a single column, data, and one row per line of the file. Each cell contains a single line from the file.
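The line-to-row mapping described above can be pictured with a short plain-Python sketch. It does not use Pathway and is not how the connector is implemented; the helper name lines_as_rows is hypothetical, and stripping the trailing newline from each line is an assumption made for illustration.

```python
import tempfile
from pathlib import Path


def lines_as_rows(path):
    """Hypothetical helper: one {"data": line} record per line of the file."""
    text = Path(path).read_text(encoding="utf-8")
    # Assumption: line terminators are not kept in the cell value.
    return [{"data": line} for line in text.splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "lines.txt"
    sample.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")
    rows = lines_as_rows(sample)

# Three input lines give three rows, each with a single "data" field.
for row in rows:
    print(row)
```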

If a folder is specified and it contains several files, their order is determined by their modification times: the earlier a file was modified, the earlier it is passed to the engine.
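The ordering rule above can be sketched with the standard library alone. This is only an illustration of sorting by modification time, not Pathway's actual implementation; the helper files_by_mtime and the file names are hypothetical, and the timestamps are set explicitly with os.utime so the result does not depend on creation order.

```python
import os
import tempfile
from pathlib import Path


def files_by_mtime(folder):
    """Hypothetical helper: regular files in folder, oldest modification first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    older = Path(tmp) / "b_older.txt"
    newer = Path(tmp) / "a_newer.txt"
    older.write_text("x\n", encoding="utf-8")
    newer.write_text("y\n", encoding="utf-8")
    # Set (access time, modification time) explicitly: older precedes newer.
    os.utime(older, (1_000_000_000, 1_000_000_000))
    os.utime(newer, (1_000_000_100, 1_000_000_100))
    ordered = [p.name for p in files_by_mtime(tmp)]

# The file name does not matter; only the modification time does.
print(ordered)
```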

  • Parameters
    • path (str) – Path to a file or to a folder.
    • mode (str) – If set to “streaming”, the engine will wait for new input files in the directory. If set to “static”, it will only consider the available data and ingest all of it in one commit. Default value is “streaming”.
    • persistent_id (Optional[int]) – (unstable) An identifier under which the state of the table will be persisted, or None if there is no need to persist the state of this table. When a program restarts, it restores the state for all input tables according to what was saved for their persistent_id. This way it’s possible to configure the start of computations from the moment they were terminated last time.
    • debug_data – Static data that replaces the original data when debug mode is active.
  • Returns
    The table read.
  • Return type
    Table

Example:

import pathway as pw
t = pw.io.plaintext.read("raw_dataset/lines.txt")