apache_beam.io.gcp.bigtableio module
BigTable connector
This module implements writing to BigTable tables. By default, each write sets the cell data of a row in a BigTable table. The supported syntax is described here: https://cloud.google.com/bigtable/docs/quickstart-cbt
The BigTable connector can be used as a main output. A main output (the common case) is expected to be massive and is split into manageable chunks that are processed in parallel. The example below creates a list of rows, passes it to the GeneratedDirectRows DoFn to set the cells, and then calls BigTableWriteFn to insert the generated rows into the table.
main_table = (p
              | beam.Create(self._generate())
              | WriteToBigTable(project_id, instance_id, table_id))
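The snippet above depends on a `self._generate()` helper that is not shown. As a minimal sketch of what such a step might look like, the code below builds plain cell tuples, converts each into a `DirectRow` carrying an un-committed `set_cell` mutation, and writes the result with `WriteToBigTable`. The helper names (`make_cells`, `to_direct_row`, `run`), the column family `cf1`, and the column `field1` are assumptions made for illustration; the column family must already exist in the target table.

```python
import datetime


def make_cells(num_rows, column_family_id="cf1", column=b"field1"):
    """Return (row_key, family, column, value) tuples; no Bigtable client needed."""
    return [
        (
            f"key-{i:05d}".encode("utf-8"),
            column_family_id,
            column,
            f"value-{i}".encode("utf-8"),
        )
        for i in range(num_rows)
    ]


def to_direct_row(cell):
    """Turn one tuple into a DirectRow holding an un-committed set_cell mutation."""
    # Imported lazily so this module loads without google-cloud-bigtable installed.
    from google.cloud.bigtable import row

    row_key, family, column, value = cell
    direct_row = row.DirectRow(row_key=row_key)
    direct_row.set_cell(
        family,
        column,
        value,
        timestamp=datetime.datetime.now(datetime.timezone.utc),
    )
    return direct_row


def run(project_id, instance_id, table_id, num_rows=100):
    """Hypothetical entry point: create cells, build rows, write them to the table."""
    import apache_beam as beam
    from apache_beam.io.gcp.bigtableio import WriteToBigTable

    with beam.Pipeline() as p:
        _ = (p
             | "CreateCells" >> beam.Create(make_cells(num_rows))
             | "ToDirectRow" >> beam.Map(to_direct_row)
             | "Write" >> WriteToBigTable(project_id, instance_id, table_id))
```

Keeping the cell data as plain tuples until the `beam.Map` step means only simple Python values are handed to `beam.Create`, and the `DirectRow` objects are constructed on the workers.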
- class apache_beam.io.gcp.bigtableio.WriteToBigTable(project_id, instance_id, table_id, use_cross_language=False, expansion_service=None)
Bases: PTransform
A transform that writes rows to a Bigtable table.
Takes an input PCollection of DirectRow objects containing un-committed mutations. For more information about this row object, visit https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowdirectrowrowkey-tablenone
If the use_cross_language flag is set to True, this transform uses the multi-language transforms framework to inject the Java native write transform into the pipeline.
Initialize a WriteToBigTable transform.
- Parameters:
table_id – The ID of the table to write to.
instance_id – The ID of the instance where the table resides.
project_id – The GCP project ID.
use_cross_language – If set to True, will use the Java native transform via cross-language.
expansion_service – The address of the expansion service in the case of using cross-language. If no expansion service is provided, will attempt to run the default GCP expansion service.
- URN = 'beam:schematransform:org.apache.beam:bigtable_write:v1'
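To illustrate how the constructor parameters above fit together, here is a small sketch that assembles the keyword arguments for `WriteToBigTable` and only passes `expansion_service` when cross-language mode is requested. The helpers `build_write_kwargs` and `make_write` are hypothetical and not part of the module, and an address such as `localhost:8097` is a placeholder. Because the cross-language path uses the Java transform, it presumably needs a Java runtime to be available.

```python
def build_write_kwargs(project_id, instance_id, table_id,
                       cross_language=False, expansion_service=None):
    """Collect WriteToBigTable keyword arguments (hypothetical helper)."""
    kwargs = {
        "project_id": project_id,
        "instance_id": instance_id,
        "table_id": table_id,
        "use_cross_language": cross_language,
    }
    # expansion_service only matters in cross-language mode; when it is left
    # out, the transform attempts to run the default GCP expansion service.
    if cross_language and expansion_service is not None:
        kwargs["expansion_service"] = expansion_service
    return kwargs


def make_write(project_id, instance_id, table_id, **options):
    """Build the transform; imported lazily so the helper above stays testable."""
    from apache_beam.io.gcp.bigtableio import WriteToBigTable

    return WriteToBigTable(
        **build_write_kwargs(project_id, instance_id, table_id, **options))
```

A call such as `make_write(project_id, instance_id, table_id, cross_language=True, expansion_service="localhost:8097")` would then select the Java write transform through an already running expansion service at that (assumed) address.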
- class apache_beam.io.gcp.bigtableio.ReadFromBigtable(project_id, instance_id, table_id, expansion_service=None)
Bases: PTransform
Reads rows from Bigtable.
Returns a PCollection of PartialRowData objects, each representing a Bigtable row. For more information about this row object, visit https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey
Initialize a ReadFromBigtable transform.
- Parameters:
table_id – The ID of the table to read from.
instance_id – The ID of the instance where the table resides.
project_id – The GCP project ID.
expansion_service – The address of the expansion service. If no expansion service is provided, will attempt to run the default GCP expansion service.
- URN = 'beam:schematransform:org.apache.beam:bigtable_read:v1'
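As a rough sketch of consuming the `PartialRowData` objects that `ReadFromBigtable` returns, the code below flattens each row into its key plus a dictionary of column values. It relies on the `row_key` and `cells` attributes of `PartialRowData`, where `cells` maps a column family name to a mapping of column qualifier (bytes) to a list of cells. The helper names `first_values` and `run` are assumptions for illustration, and the first cell in each list is taken on the assumption that it is the one of interest (typically the most recent).

```python
def first_values(row):
    """Return (row_key, {"family:qualifier": value}) using each column's first cell."""
    values = {}
    for family, columns in row.cells.items():
        for qualifier, cells in columns.items():
            # Qualifiers are bytes; decode them to build a readable key.
            values[f"{family}:{qualifier.decode('utf-8')}"] = cells[0].value
    return row.row_key, values


def run(project_id, instance_id, table_id):
    """Hypothetical entry point: read every row and print its flattened form."""
    import apache_beam as beam
    from apache_beam.io.gcp.bigtableio import ReadFromBigtable

    with beam.Pipeline() as p:
        _ = (p
             | "Read" >> ReadFromBigtable(project_id, instance_id, table_id)
             | "Flatten" >> beam.Map(first_values)
             | "Print" >> beam.Map(print))
```

Since `first_values` touches only `row_key` and `cells`, it can be exercised in unit tests with a simple stand-in object instead of a live Bigtable instance.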