apache_beam.io.filesystemio module

Utilities for FileSystem implementations.

class apache_beam.io.filesystemio.Downloader[source]

Bases: object

Download interface for a single file.

Implementations should support random access reads.

abstract property size

Size of file to download.

abstract get_range(start, end)[source]

Retrieve a given byte range [start, end) from this download.

Range must be in this form:

0 <= start < end: Fetch the bytes from start to end.

Parameters:
  • start – (int) Initial byte offset.

  • end – (int) Final byte offset, exclusive.

Returns:

(string) A buffer containing the requested data.

class apache_beam.io.filesystemio.Uploader[source]

Bases: object

Upload interface for a single file.

abstract put(data)[source]

Write data to file sequentially.

Parameters:

data – (memoryview) Data to write.

abstract finish()[source]

Signal to upload any remaining data and close the file.

File should be fully written upon return from this method.

Raises:

Any error encountered during the upload.

class apache_beam.io.filesystemio.DownloaderStream(downloader, read_buffer_size=8192, mode='rb')[source]

Bases: RawIOBase

Provides a stream interface for Downloader objects.

Initializes the stream.

Parameters:
  • downloader – (Downloader) Filesystem dependent implementation.

  • read_buffer_size – (int) Buffer size to use during read operations.

  • mode – (string) Python mode attribute for this stream.

readinto(b)[source]

Read up to len(b) bytes into b.

Returns number of bytes read (0 for EOF).

Parameters:

b – (bytearray/memoryview) Buffer to read into.

seek(offset, whence=0)[source]

Set the stream’s current offset.

Note if the new offset is out of bound, it is adjusted to either 0 or EOF.

Parameters:
  • offset – seek offset as number.

  • whence – seek mode. Supported modes are os.SEEK_SET (absolute seek), os.SEEK_CUR (seek relative to the current position), and os.SEEK_END (seek relative to the end, offset should be negative).

Raises:

ValueError – When this stream is closed or if whence is invalid.

tell()[source]

Tell the stream’s current offset.

Returns:

current offset in reading this stream.

Raises:

ValueError – When this stream is closed.

seekable()[source]
readable()[source]
readall()[source]

Read until EOF, using multiple read() call.

class apache_beam.io.filesystemio.UploaderStream(uploader, mode='wb')[source]

Bases: RawIOBase

Provides a stream interface for Uploader objects.

Initializes the stream.

Parameters:
  • uploader – (Uploader) Filesystem dependent implementation.

  • mode – (string) Python mode attribute for this stream.

tell()[source]
write(b)[source]

Write bytes from b.

Returns number of bytes written (<= len(b)).

Parameters:

b – (memoryview) Buffer with data to write.

close()[source]

Complete the upload and close this stream.

This method has no effect if the stream is already closed.

Raises:

Any error encountered by the uploader.

writable()[source]
class apache_beam.io.filesystemio.PipeStream(recv_pipe)[source]

Bases: object

A class that presents a pipe connection as a readable stream.

Not thread-safe.

Remembers the last size bytes read and allows rewinding the stream by that amount exactly. See BEAM-6380 for more.

read(size)[source]

Read data from the wrapped pipe connection.

Parameters:

size – Number of bytes to read. Actual number of bytes read is always equal to size unless EOF is reached.

Returns:

data read as str.

tell()[source]

Tell the file’s current offset.

Returns:

current offset in reading this file.

Raises:

ValueError – When this stream is closed.

seek(offset, whence=0)[source]