apache_beam.io.gcp.bigtableio module

BigTable connector

This module implements reading from and writing to Bigtable tables. By default, the write path sets row data (cells) in the target table. The supported syntax is described here: https://cloud.google.com/bigtable/docs/quickstart-cbt

The Bigtable connector can be used as a main output. A main output (the common case) is expected to be large, so it is split into manageable chunks that are processed in parallel. In the example below, self._generate() produces a list of rows whose cells have already been set. beam.Create turns that list into a PCollection, and WriteToBigTable inserts those rows into the table.

import apache_beam as beam
from apache_beam.io.gcp.bigtableio import WriteToBigTable

main_table = (p
              | beam.Create(self._generate())
              | WriteToBigTable(project_id, instance_id, table_id))

class apache_beam.io.gcp.bigtableio.WriteToBigTable(project_id, instance_id, table_id, use_cross_language=False, expansion_service=None, flush_count=1000, max_row_bytes=5242880)

Bases: PTransform

A transform that writes rows to a Bigtable table.

Takes an input PCollection of DirectRow objects containing uncommitted mutations. For more information about this row object, visit https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowdirectrowrowkey-tablenone

If use_cross_language is set to True, this transform uses the multi-language transforms framework to inject the Java native write transform into the pipeline.

Initialize a WriteToBigTable transform.

Parameters:

project_id – The GCP project ID.
instance_id – The ID of the instance where the table resides.
table_id – The ID of the table to write to.
use_cross_language – If set to True, uses the Java native transform via cross-language.
expansion_service – The address of the expansion service when using cross-language. If no expansion service is provided, the transform attempts to run the default GCP expansion service.
flush_count (int) – (Optional) Maximum number of rows to buffer before flushing. Default is FLUSH_COUNT (1000 rows).
max_row_bytes (int) – (Optional) Maximum total size, in bytes, of buffered row mutations before flushing. Default is MAX_ROW_BYTES (5242880 bytes, i.e. 5 MiB).
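flush_count and max_row_bytes act as two independent limits on a batch of buffered rows. The pure-Python model below is a hypothetical illustration of that idea and is not the connector's implementation. It buffers rows and flushes as soon as either the row count or the accumulated mutation size reaches its limit. The class name `ThresholdBatcher` and the `(row_key, size)` row representation are invented for this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a buffered row is (row_key, mutation_size_in_bytes).
Row = Tuple[bytes, int]


class ThresholdBatcher:
    """Buffer rows; flush when the row count or byte total hits a limit."""

    def __init__(self, flush_count: int, max_row_bytes: int,
                 on_flush: Callable[[List[Row]], None]) -> None:
        self.flush_count = flush_count
        self.max_row_bytes = max_row_bytes
        self.on_flush = on_flush
        self._rows: List[Row] = []
        self._bytes = 0

    def mutate(self, row: Row) -> None:
        self._rows.append(row)
        self._bytes += row[1]
        # Flush as soon as either limit is reached.
        if (len(self._rows) >= self.flush_count
                or self._bytes >= self.max_row_bytes):
            self.flush()

    def flush(self) -> None:
        # Hand the current batch to the callback and start a new one.
        if self._rows:
            self.on_flush(self._rows)
        self._rows = []
        self._bytes = 0


# Small limits so that both thresholds trigger in a short run.
batches: List[List[Row]] = []
batcher = ThresholdBatcher(flush_count=3, max_row_bytes=100,
                           on_flush=batches.append)
for i, size in enumerate([10, 10, 10, 90, 20]):
    batcher.mutate((b'row-%d' % i, size))
batcher.flush()  # flush whatever remains, e.g. at the end of a bundle
```

In this run the first batch is closed by the row limit (3 rows), and the second is closed by the byte limit (90 + 20 = 110 bytes ≥ 100). With the defaults of 1000 rows and 5242880 bytes, this model would flush on whichever limit is reached first.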

URN = 'beam:schematransform:org.apache.beam:bigtable_write:v1'

expand(input)

class apache_beam.io.gcp.bigtableio.ReadFromBigtable(project_id, instance_id, table_id, expansion_service=None)

Bases: PTransform

Reads rows from Bigtable.

Returns a PCollection of PartialRowData objects, each representing a Bigtable row. For more information about this row object, visit https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey

Initialize a ReadFromBigtable transform.

Parameters:

project_id – The GCP project ID.
instance_id – The ID of the instance where the table resides.
table_id – The ID of the table to read from.
expansion_service – The address of the expansion service. If no expansion service is provided, the transform attempts to run the default GCP expansion service.

URN = 'beam:schematransform:org.apache.beam:bigtable_read:v1'

expand(input)