apache_beam.ml.transforms.embeddings.vertex_ai module
- class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAITextEmbeddings(model_name: str, columns: list[str], title: str | None = None, task_type: str = 'RETRIEVAL_DOCUMENT', project: str | None = None, location: str | None = None, credentials: Credentials | None = None, **kwargs)[source]
Bases: EmbeddingsManager
Embedding Config for Vertex AI Text Embedding models following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings. Text embeddings are generated for a batch of text using the Vertex AI SDK and are returned as a list, one embedding per text in the batch. See https://cloud.google.com/vertex-ai/docs/generative-ai/learn/model-versioning#stable-versions-available.md for more information on model versions and lifecycle. A usage sketch is shown after the method listing below.
- Parameters:
model_name – The name of the Vertex AI Text Embedding model.
columns – The columns containing the text to be embedded.
task_type – The downstream task for the embeddings. Valid values are RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING. For more information on task types, see https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings
title – Identifier of the text content.
project – The default GCP project for API calls.
location – The default location for API calls.
credentials – Custom credentials for API calls. Defaults to environment credentials.
- get_model_handler() → ModelHandler[source]
- get_ptransform_for_processing(**kwargs) → PTransform[source]
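Example: a minimal sketch of applying this transform with MLTransform. The model name, artifact location, and column name below are illustrative assumptions rather than values mandated by the API; remote Vertex AI calls may incur costs.

```python
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.embeddings.vertex_ai import VertexAITextEmbeddings

# Assumed local artifact location; any writable path or GCS location works.
artifact_location = tempfile.mkdtemp(prefix="vertex_text_embeddings_")

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateRows" >> beam.Create([{"text": "hello world"}, {"text": "apache beam"}])
        | "EmbedText" >> MLTransform(
            write_artifact_location=artifact_location
        ).with_transform(
            VertexAITextEmbeddings(
                model_name="textembedding-gecko@003",  # assumed model version
                columns=["text"],
                task_type="RETRIEVAL_DOCUMENT",
            )
        )
        | "Print" >> beam.Map(print)
    )
```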
- class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAIImageEmbeddings(model_name: str, columns: list[str], dimension: int | None, project: str | None = None, location: str | None = None, credentials: Credentials | None = None, **kwargs)[source]
Bases: EmbeddingsManager
Embedding Config for Vertex AI Image Embedding models following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-multimodal-embeddings. Image embeddings are generated for a batch of images using the Vertex AI API and are returned as a list, one embedding per image in the batch. This transform makes remote calls to the Vertex AI service and may incur costs for use. A usage sketch is shown after the method listing below.
- Parameters:
model_name – The name of the Vertex AI Multi-Modal Embedding model.
columns – The columns containing the image to be embedded.
dimension – The length of the embedding vector to generate. Must be one of 128, 256, 512, or 1408. If not set, Vertex AI’s default value is 1408.
project – The default GCP project for API calls.
location – The default location for API calls.
credentials – Custom credentials for API calls. Defaults to environment credentials.
- get_model_handler() → ModelHandler[source]
- get_ptransform_for_processing(**kwargs) → PTransform[source]
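Example: a minimal sketch of applying this transform with MLTransform. The model name, image path, and column name are illustrative assumptions; the column is assumed to hold vertexai.vision_models.Image values.

```python
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.embeddings.vertex_ai import VertexAIImageEmbeddings
from vertexai.vision_models import Image

# Assumed local artifact location; any writable path or GCS location works.
artifact_location = tempfile.mkdtemp(prefix="vertex_image_embeddings_")

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        # "cat.jpg" is a placeholder image path.
        | "CreateRows" >> beam.Create([{"image": Image.load_from_file("cat.jpg")}])
        | "EmbedImages" >> MLTransform(
            write_artifact_location=artifact_location
        ).with_transform(
            VertexAIImageEmbeddings(
                model_name="multimodalembedding@001",  # assumed model version
                columns=["image"],
                dimension=512,  # one of 128, 256, 512, or 1408
            )
        )
        | "Print" >> beam.Map(print)
    )
```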
- class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAIMultiModalEmbeddings(model_name: str, image_column: str | None = None, video_column: str | None = None, text_column: str | None = None, dimension: int | None = None, project: str | None = None, location: str | None = None, credentials: Credentials | None = None, **kwargs)[source]
Bases: EmbeddingsManager
Embedding Config for Vertex AI Multi-Modal Embedding models following https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-multimodal-embeddings. Multi-modal embeddings are generated for a batch of image, video, and text groupings using the Vertex AI API, and a MultiModalEmbeddingResponse is returned for each grouping in the batch. This transform makes remote calls to the Vertex AI service and may incur costs for use. A usage sketch is shown after the method listing below.
- Parameters:
model_name – The name of the Vertex AI Multi-Modal Embedding model.
image_column – The column containing image data to be embedded. This data is expected to be formatted as VertexImage objects, each wrapping a Vertex AI Image object.
video_column – The column containing video data to be embedded. This data is expected to be formatted as VertexVideo objects, each wrapping a Vertex AI Video object and a VideoSegmentConfig object.
text_column – The column containing text data to be embedded. This data is expected to be formatted as Chunk objects, containing the string to be embedded in the Chunk’s content field.
dimension – The length of the embedding vector to generate. Must be one of 128, 256, 512, or 1408. If not set, Vertex AI’s default value is 1408. If submitting video content, dimension must be 1408.
project – The default GCP project for API calls.
location – The default location for API calls.
credentials – Custom credentials for API calls. Defaults to environment credentials.
- get_model_handler() → ModelHandler[source]
- get_ptransform_for_processing(**kwargs) → PTransform[source]
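Example: a minimal sketch of applying this transform with MLTransform for text-only input. The model name, column name, and Chunk construction are illustrative assumptions; image_column and video_column would instead be populated with VertexImage and VertexVideo wrappers from this module.

```python
import tempfile

import apache_beam as beam
from apache_beam.ml.rag.types import Chunk, Content
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.embeddings.vertex_ai import VertexAIMultiModalEmbeddings

# Assumed local artifact location; any writable path or GCS location works.
artifact_location = tempfile.mkdtemp(prefix="vertex_multimodal_embeddings_")

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        # Text to embed is carried in Chunk.content.text (assumed Chunk layout).
        | "CreateRows" >> beam.Create(
            [{"text": Chunk(content=Content(text="a photo of a cat"))}]
        )
        | "EmbedMultiModal" >> MLTransform(
            write_artifact_location=artifact_location
        ).with_transform(
            VertexAIMultiModalEmbeddings(
                model_name="multimodalembedding@001",  # assumed model version
                text_column="text",
                dimension=1408,  # must be 1408 if video content is submitted
            )
        )
        | "Print" >> beam.Map(print)
    )
```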
- class apache_beam.ml.transforms.embeddings.vertex_ai.VertexAIMultiModalInput(image: apache_beam.ml.transforms.embeddings.vertex_ai.VertexImage | None = None, video: apache_beam.ml.transforms.embeddings.vertex_ai.VertexVideo | None = None, contextual_text: apache_beam.ml.rag.types.Chunk | None = None)[source]
Bases: object