pathway.io.s3_csv package


pathway.io.s3_csv.read(path, aws_s3_settings, value_columns, id_columns=None, csv_settings=None, poll_new_objects=False, debug_data=None)

Reads a table from one or several objects in an Amazon S3 bucket.

If a prefix is specified and several objects lie under it, they are ordered by modification time: the earlier an object was modified, the earlier it is passed to the engine.
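The ordering rule can be illustrated with a small, self-contained sketch. It does not call Pathway or S3; `S3Object`, `processing_order`, and the sample keys are hypothetical names introduced only for this illustration, and the sketch assumes that matching is a plain string-prefix test.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one object listed in a bucket."""
    key: str
    last_modified: datetime


def processing_order(objects, prefix):
    """Return the keys under `prefix`, earliest modification time first."""
    matching = [obj for obj in objects if obj.key.startswith(prefix)]
    return [obj.key for obj in sorted(matching, key=lambda o: o.last_modified)]


# Made-up listing: three objects under "animals/", one outside the prefix.
listing = [
    S3Object("animals/2023-03.csv", datetime(2023, 3, 1, tzinfo=timezone.utc)),
    S3Object("animals/2023-01.csv", datetime(2023, 1, 1, tzinfo=timezone.utc)),
    S3Object("plants/2023-02.csv", datetime(2023, 2, 1, tzinfo=timezone.utc)),
    S3Object("animals/2023-02.csv", datetime(2023, 2, 1, tzinfo=timezone.utc)),
]

order = processing_order(listing, "animals/")
print(order)
```

Under these assumptions, the object modified in January comes first and the one modified in March comes last; the object under `plants/` is ignored.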

  • Parameters
    • path (str) – Path to an object or to a folder of objects in an Amazon S3 bucket.
    • aws_s3_settings (AwsS3Settings) – Connection parameters for the S3 account and the bucket.
    • value_columns (List[str]) – Names of the columns to be extracted from the files.
    • id_columns (Optional[List[str]]) – If the primary key of the table should be generated from a subset of its columns, specify that subset in this field. Otherwise, the primary key is generated as uuid4.
    • csv_settings (Optional[CsvParserSettings]) – The settings for the CSV parser.
    • poll_new_objects (Optional[bool]) – If set to True, the engine waits for new input files in the bucket that fall under the path prefix.
    • debug_data – Static data that replaces the original data when debug mode is active.
  • Returns
    The table read.
  • Return type
    Table
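The difference between the two primary-key modes described for id_columns can be sketched in plain Python. This is only an illustration of the documented behaviour, not Pathway's actual key derivation: `row_key` and `SKETCH_NAMESPACE` are hypothetical names, and uuid5 is used here purely as a stand-in for "a key derived deterministically from the chosen columns".

```python
import uuid

# Hypothetical namespace, used only to make this sketch deterministic.
SKETCH_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def row_key(row, id_columns=None):
    """Return a key for `row`.

    With id_columns, the key depends only on the values of those columns,
    so rows that agree on them get the same key. Without id_columns, a
    random uuid4 is generated, as the parameter description states.
    """
    if id_columns is None:
        return uuid.uuid4()
    # Assumption: join the selected values and hash them with uuid5.
    key_material = "\x1f".join(str(row[name]) for name in id_columns)
    return uuid.uuid5(SKETCH_NAMESPACE, key_material)


row = {"owner": "Alice", "pet": "dog"}
same_owner = {"owner": "Alice", "pet": "cat"}

keyed_a = row_key(row, id_columns=["owner"])
keyed_b = row_key(same_owner, id_columns=["owner"])
random_a = row_key(row)
random_b = row_key(row)

print(keyed_a == keyed_b)    # rows that agree on "owner" share a key
print(random_a == random_b)  # independent uuid4 keys differ
```

In this sketch, choosing id_columns makes the key reproducible across runs, while omitting it yields a fresh random key each time.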

Example:

>>> import os
>>> import pathway as pw
>>> t = pw.io.s3_csv.read(
...     "animals/",
...     aws_s3_settings=pw.io.s3_csv.AwsS3Settings.AwsS3Settings(
...         bucket_name="datasets",
...         region="eu-west-3",
...         access_key=os.environ["S3_ACCESS_KEY"],
...         secret_access_key=os.environ["S3_SECRET_ACCESS_KEY"],
...     ),
...     value_columns=["owner", "pet"],
... )