apache_beam.io.aws.s3io module
AWS S3 client
- apache_beam.io.aws.s3io.parse_s3_path(s3_path, object_optional=False)[source]
Return the bucket and object names of the given s3:// path.
- class apache_beam.io.aws.s3io.S3IO(client=None, options=None)[source]
Bases:
object
S3 I/O client.
- open(filename, mode='r', read_buffer_size=16777216, mime_type='application/octet-stream')[source]
Open an S3 file path for reading or writing.
- Parameters:
- Returns:
S3 file object.
- Raises:
ValueError – Invalid open file mode.
- list_prefix(path, with_metadata=False)[source]
Lists files matching the prefix.
list_prefix
has been deprecated. Use list_files instead, which returns a generator of file information instead of a dict.- Parameters:
path – S3 file path pattern in the form s3://<bucket>/[name].
with_metadata – Experimental. Specify whether returns file metadata.
- Returns:
- dict of file name -> size; if
with_metadata
is True: dict of file name -> tuple(size, timestamp).
- Return type:
If
with_metadata
is False
- list_files(path, with_metadata=False)[source]
Lists files matching the prefix.
- Parameters:
path – S3 file path pattern in the form s3://<bucket>/[name].
with_metadata – Experimental. Specify whether returns file metadata.
- Returns:
generator of tuple(file name, size); if
with_metadata
is True: generator of tuple(file name, tuple(size, timestamp)).- Return type:
If
with_metadata
is False
- checksum(path)[source]
Looks up the checksum of an S3 object.
- Parameters:
path – S3 file path pattern in the form s3://<bucket>/<name>.
- copy(src, dest)[source]
Copies a single S3 file object from src to dest.
- Parameters:
src – S3 file path pattern in the form s3://<bucket>/<name>.
dest – S3 file path pattern in the form s3://<bucket>/<name>.
- Raises:
TimeoutError – on timeout.
- copy_paths(src_dest_pairs)[source]
Copies the given S3 objects from src to dest. This can handle directory or file paths.
- Parameters:
src_dest_pairs – list of (src, dest) tuples of s3://<bucket>/<name> file paths to copy from src to dest
- Returns: List of tuples of (src, dest, exception) in the same order as the
src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
- copy_tree(src, dest)[source]
Renames the given S3 directory and it’s contents recursively from src to dest.
- Parameters:
src – S3 file path pattern in the form s3://<bucket>/<name>/.
dest – S3 file path pattern in the form s3://<bucket>/<name>/.
- Returns:
List of tuples of (src, dest, exception) where exception is None if the operation succeeded or the relevant exception if the operation failed.
- delete(path)[source]
Deletes a single S3 file object from src to dest.
- Parameters:
src – S3 file path pattern in the form s3://<bucket>/<name>/.
dest – S3 file path pattern in the form s3://<bucket>/<name>/.
- Returns:
List of tuples of (src, dest, exception) in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
- delete_paths(paths)[source]
Deletes the given S3 objects from src to dest. This can handle directory or file paths.
- Parameters:
src – S3 file path pattern in the form s3://<bucket>/<name>/.
dest – S3 file path pattern in the form s3://<bucket>/<name>/.
- Returns:
List of tuples of (src, dest, exception) in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
- delete_files(paths, max_batch_size=1000)[source]
Deletes the given S3 file object from src to dest.
- Parameters:
paths – List of S3 file paths in the form s3://<bucket>/<name>
max_batch_size – Largest number of keys to send to the client to be deleted
simultaneously
- Returns: List of tuples of (path, exception) in the same order as the paths
argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
- delete_tree(root)[source]
Deletes all objects under the given S3 directory.
- Parameters:
path – S3 root path in the form s3://<bucket>/<name>/ (ending with a “/”)
- Returns: List of tuples of (path, exception), where each path is an object
under the given root. exception is None if the operation succeeded or the relevant exception if the operation failed.
- size(path)[source]
Returns the size of a single S3 object.
This method does not perform glob expansion. Hence the given path must be for a single S3 object.
Returns: size of the S3 object in bytes.
- rename(src, dest)[source]
Renames the given S3 object from src to dest.
- Parameters:
src – S3 file path pattern in the form s3://<bucket>/<name>.
dest – S3 file path pattern in the form s3://<bucket>/<name>.
- last_updated(path)[source]
Returns the last updated epoch time of a single S3 object.
This method does not perform glob expansion. Hence the given path must be for a single S3 object.
Returns: last updated time of the S3 object in second.
- exists(path)[source]
Returns whether the given S3 object exists.
- Parameters:
path – S3 file path pattern in the form s3://<bucket>/<name>.
- rename_files(src_dest_pairs)[source]
Renames the given S3 objects from src to dest.
- Parameters:
src_dest_pairs – list of (src, dest) tuples of s3://<bucket>/<name> file paths to rename from src to dest
- Returns: List of tuples of (src, dest, exception) in the same order as the
src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
- class apache_beam.io.aws.s3io.S3Downloader(client, path, buffer_size)[source]
Bases:
Downloader
- property size