pathway.io.plaintext package
pathway.io.plaintext.read(path, mode='streaming', persistent_id=None, debug_data=None)
Reads a table from a text file or a directory of text files. The resulting table consists of a single column, data, and has one row per line of input: each cell contains a single line from the file.
If a folder is specified and it contains several files, the files are ordered by modification time: the earlier a file was modified, the earlier it is passed to the engine.
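The row and ordering semantics described above can be illustrated with a small plain-Python sketch. This is not Pathway's implementation. The helper name read_plaintext_rows is hypothetical, and stripping the trailing newline from each line is an assumption made here for readability.

```python
import os
from typing import List


def read_plaintext_rows(path: str) -> List[str]:
    """Hypothetical helper: one row per line, files ordered by mtime.

    Only illustrates the documented semantics; not Pathway's code.
    """
    if os.path.isdir(path):
        files = [
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        ]
        # Earlier modification time -> the file is processed first.
        files.sort(key=os.path.getmtime)
    else:
        files = [path]

    rows: List[str] = []
    for file_path in files:
        with open(file_path, encoding="utf-8") as f:
            # Each line becomes one value of the single "data" column.
            # Assumption: the trailing newline is stripped.
            rows.extend(line.rstrip("\n") for line in f)
    return rows
```

For a folder holding b.txt (modified first) and a.txt (modified later), the rows from b.txt come before the rows from a.txt, regardless of the file names.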
- Parameters
  - path (str) – Path to a file or to a folder.
  - mode (str) – If set to “streaming”, the engine waits for new input files in the directory. If set to “static”, it only considers the currently available data and ingests all of it in one commit. Default value is “streaming”.
  - persistent_id (Optional[int]) – (unstable) An identifier under which the state of the table will be persisted, or None if there is no need to persist the state of this table. When a program restarts, it restores the state of all input tables according to what was saved for their persistent_id. This makes it possible to resume computations from the point where they were last terminated.
  - debug_data – Static data that replaces the original data when debug mode is active.
- Returns
  The table read.
- Return type
  Table
Example:
import pathway as pw

t = pw.io.plaintext.read("raw_dataset/lines.txt")
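The difference between the “static” and “streaming” values of mode can be pictured as one directory scan versus repeated scans. The sketch below is a hypothetical, plain-Python illustration, not Pathway's mechanism. The helper poll_new_files is invented for this example, and it only detects newly added files, following the description of “streaming” mode above.

```python
import os
from typing import List, Set


def poll_new_files(folder: str, seen: Set[str]) -> List[str]:
    """Hypothetical helper: return files not seen before, oldest mtime first.

    Illustration only; not how Pathway watches directories.
    """
    new_files = [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and os.path.join(folder, name) not in seen
    ]
    # Keep the documented ordering: earlier modification time first.
    new_files.sort(key=os.path.getmtime)
    # Remember these files so the next poll only reports newer arrivals.
    seen.update(new_files)
    return new_files
```

A “static” read roughly corresponds to calling poll_new_files once and ingesting that single batch. A “streaming” read roughly corresponds to calling it repeatedly (for example in a loop with a short sleep) and forwarding each non-empty batch to the engine.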