pathway.io.s3_csv package
pathway.io.s3_csv.read(path, aws_s3_settings, value_columns, id_columns=None, csv_settings=None, poll_new_objects=False, debug_data=None)
Reads a table from one or several objects in an Amazon S3 bucket.
If path is a prefix and several objects lie under it, they are ordered by modification time: the earlier an object was modified, the sooner it is passed to the engine.
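The ordering rule can be pictured with a small standalone sketch. It does not call Pathway or S3: the `S3Object` tuple and the `objects_in_read_order` helper are hypothetical names used only for illustration, and the keys and timestamps are made up.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class S3Object(NamedTuple):
    """Hypothetical stand-in for one entry of a bucket listing (not a Pathway type)."""
    key: str
    last_modified: datetime


def objects_in_read_order(objects: List[S3Object], prefix: str) -> List[str]:
    """Return the keys under `prefix`, earliest modification time first."""
    matching = [obj for obj in objects if obj.key.startswith(prefix)]
    matching.sort(key=lambda obj: obj.last_modified)
    return [obj.key for obj in matching]


# Made-up listing: two objects under "animals/", one outside the prefix.
listing = [
    S3Object("animals/2.csv", datetime(2023, 1, 3, tzinfo=timezone.utc)),
    S3Object("animals/1.csv", datetime(2023, 1, 1, tzinfo=timezone.utc)),
    S3Object("plants/1.csv", datetime(2023, 1, 2, tzinfo=timezone.utc)),
]
read_order = objects_in_read_order(listing, "animals/")
print(read_order)
```

Under this assumption, the object modified first is the one handed over first, regardless of its name or its position in the listing.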
- Parameters
  - path (str) – Path to an object or to a folder of objects in an Amazon S3 bucket.
  - aws_s3_settings (AwsS3Settings) – Connection parameters for the S3 account and the bucket.
  - value_columns (List[str]) – Names of the columns to be extracted from the files.
  - id_columns (Optional[List[str]]) – If the primary key should be generated from a subset of the table's columns, specify those columns here. Otherwise, the primary key is generated as uuid4.
  - csv_settings (Optional[CsvParserSettings]) – The settings for the CSV parser.
  - poll_new_objects (Optional[bool]) – If set to true, the engine waits for new input files in the bucket that fall under the path prefix.
  - debug_data – Static data that replaces the original data when debug mode is active.
- Returns
  The table read.
- Return type
  Table
Example:
>>> import os
>>> import pathway as pw
>>> t = pw.io.s3_csv.read(
...     "animals/",
...     aws_s3_settings=pw.io.s3_csv.AwsS3Settings.AwsS3Settings(
...         bucket_name="datasets",
...         region="eu-west-3",
...         access_key=os.environ["S3_ACCESS_KEY"],
...         secret_access_key=os.environ["S3_SECRET_ACCESS_KEY"],
...     ),
...     value_columns=["owner", "pet"],
... )
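
The effect of id_columns on primary keys can be illustrated without Pathway. The sketch below is only an assumption about the general idea: `row_key` is a hypothetical helper, and the use of `uuid5` for column-derived keys is a stand-in chosen for illustration, since this page does not describe how Pathway actually computes them.

```python
import uuid
from typing import Dict, List, Optional


def row_key(row: Dict[str, str], id_columns: Optional[List[str]] = None) -> str:
    """Illustrative key derivation only; not Pathway's actual key scheme."""
    if id_columns is None:
        # No id columns given: every row gets a fresh random uuid4 key.
        return str(uuid.uuid4())
    # Id columns given: the key depends only on the values in those columns,
    # so two rows that agree on them receive the same key.
    joined = "\x1f".join(row[name] for name in id_columns)
    return str(uuid.uuid5(uuid.NAMESPACE_OID, joined))


# Made-up rows matching the "owner" and "pet" columns of the example above.
first = {"owner": "Alice", "pet": "dog"}
second = {"owner": "Alice", "pet": "cat"}

same_owner_key = row_key(first, ["owner"]) == row_key(second, ["owner"])
random_keys_differ = row_key(first) != row_key(first)
print(same_owner_key, random_keys_differ)
```

In this sketch, rows that share the same values in the chosen columns collapse onto one key, while the default gives every row a distinct random key.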