pw.io.gdrive
pw.io.gdrive.read(object_id, *, mode='streaming', object_size_limit=None, refresh_interval=30, service_user_credentials_file, with_metadata=False, file_name_pattern=None)
Reads a table from a Google Drive directory or file.
Returns a table with a single column, data, containing each file in binary format.
- Parameters
  - object_id (str) – ID of a directory or file. Directories are scanned recursively.
  - mode (str) – how the engine polls the source for new data. Currently “streaming” and “static” are supported. In “streaming” mode, the connector checks for updates, deletions, and new files every refresh_interval seconds. In “static” mode, it only considers the available data and ingests all of it in one commit. The default value is “streaming”.
  - object_size_limit (int | None) – maximum size (in bytes) of a file that will be processed by this connector, or None if no filtering by size should be done.
  - refresh_interval (int) – time in seconds between scans. Applicable only if mode is set to “streaming”.
  - service_user_credentials_file (str) – Google API service user JSON file. Please follow the instructions provided in the developer’s user guide to obtain it.
  - with_metadata (bool) – when set to True, the connector adds an additional column named _metadata to the table. This column contains file metadata, such as: id, name, mimeType, parents, modifiedTime, thumbnailLink, lastModifyingUser.
  - file_name_pattern (list | str | None) – glob pattern (or list of patterns) used to filter files by name. Defaults to None, which doesn’t filter anything. Doesn’t apply to folder names. For example, *.pdf will only return files that have the .pdf extension.
- Returns
The table read.
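In “streaming” mode, each scan looks for new files, updates, and deletions. As a rough illustration only, and not Pathway's actual implementation, one scan can be modeled as a comparison of two directory snapshots. The function name diff_snapshots and the snapshot layout (file id mapped to modifiedTime) are hypothetical and chosen just for this sketch.

```python
def diff_snapshots(previous, current):
    """Compare two listings that map file id -> modifiedTime.

    Hypothetical helper: it only illustrates the kind of change
    detection that a periodic scan has to perform.
    """
    # Files present now but not in the previous scan.
    added = sorted(fid for fid in current if fid not in previous)
    # Files seen in the previous scan that have disappeared.
    deleted = sorted(fid for fid in previous if fid not in current)
    # Files present in both scans whose modification time changed.
    updated = sorted(
        fid
        for fid in current
        if fid in previous and current[fid] != previous[fid]
    )
    return added, updated, deleted


previous = {"a": "2024-01-01T00:00:00Z", "b": "2024-01-01T00:00:00Z"}
current = {"b": "2024-01-02T00:00:00Z", "c": "2024-01-02T00:00:00Z"}

added, updated, deleted = diff_snapshots(previous, current)
print(added, updated, deleted)  # ['c'] ['b'] ['a']
```

In “static” mode no such comparison is needed, since the available data is ingested once in a single commit.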
Example:
import pathway as pw
table = pw.io.gdrive.read(
object_id="0BzDTMZY18pgfcGg4ZXFRTDFBX0j",
service_user_credentials_file="credentials.json"
)
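The combined effect of file_name_pattern and object_size_limit can be sketched in plain Python. This is a minimal model under stated assumptions, not the connector's own code: the helper keep_file is hypothetical, matching is done with the standard library's fnmatchcase (so it is case-sensitive here, which the connector may or may not be), and a file whose size equals the limit is assumed to be kept.

```python
from fnmatch import fnmatchcase


def keep_file(name, size, file_name_pattern=None, object_size_limit=None):
    """Return True if a file passes both filters in this sketch.

    Hypothetical helper mirroring the parameter names of
    pw.io.gdrive.read; the real connector may behave differently.
    """
    # Size filter: None disables it (assumption: the limit is inclusive).
    if object_size_limit is not None and size > object_size_limit:
        return False
    # Name filter: None disables it.
    if file_name_pattern is None:
        return True
    # Accept either a single glob pattern or a list of patterns.
    if isinstance(file_name_pattern, str):
        patterns = [file_name_pattern]
    else:
        patterns = list(file_name_pattern)
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# (file name, size in bytes) pairs invented for the illustration.
files = [("report.pdf", 1200), ("notes.txt", 300), ("scan.pdf", 5_000_000)]

kept = [
    name
    for name, size in files
    if keep_file(name, size, file_name_pattern="*.pdf", object_size_limit=1_000_000)
]
print(kept)  # ['report.pdf']
```

Here notes.txt is dropped by the name pattern and scan.pdf by the size limit, so only report.pdf would be read.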