Get started with AI/ML pipelines
| Javadoc |
Being productive and successful as a machine learning practitioner is often dependent on your ability to efficiently leverage large volumes of data in a way that is uniquely tailored to your resources, requirements, and budget. Whether starting your next AI/ML project or upscaling an existing project, consider adding Apache Beam to your project.
Use Beam ML
I want use Beam ML to do:
Prediction and inference
Beam provides different ways to implement inference as part of your pipeline. You can run your ML model directly in your pipeline and apply it on big scale datasets, both in batch and streaming pipelines.
RunInference
The RunInfernce API is available with the Beam Python SDK versions 2.40.0 and later. You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. Tensorflow models are supported through tfx-bsl
. For more deatils about using RunInference, see About Beam ML.
The RunInference API is available with the Beam Java SDK versions 2.41.0 and later through Apache Beam’s Multi-language Pipelines framework. For information about the Java wrapper transform, see RunInference.java. To try it out, see the Java Sklearn Mnist Classification example.
You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
Task | Example |
---|---|
I want to use the RunInference transform | Modify a Python pipeline to use an ML model |
I want to use RunInference with PyTorch | Use RunInference with PyTorch |
I want to use RunInference with Sklearn | Use RunInference with Sklearn |
I want to use pre-trained models (PyTorch, Scikit-learn, or TensorFlow) | Use pre-trained models |
I want to update my model in production | Use WatchFilePattern to auto-update ML models in RunInference |
Prediction and inference examples
Task | Example |
---|---|
I want to build a pipeline with multiple models | Multi-Model Pipelines |
I want to build a custom model handler with TensorRT | Use TensorRT with RunInference |
I want to use LLM inference | Large Language Model Inference |
I want to build a multi-language inference pipeline | Using RunInference from Java SDK |
Data processing
You can use Apache Beam for data validation and preprocessing by setting up data pipelines that transform your data and output metrics computed from your data. Beam has a rich set of I/O connectors for ingesting and writing data, which allows you to integrate it with your existing file system, database, or messaging queue.
Task | Example |
---|---|
I want to transform my data for preprocessing | Preprocess data with MLTransform |
I want to explore my data | Data exploration workflow and example |
I want to enrich my data | Data enrichment wth Enrichment transform |
Workflow orchestration
In order to automate and track the AI/ML workflows throughout your project, you can use orchestrators such as Kubeflow pipelines (KFP) or TensorFlow Extended (TFX). These orchestrators automate your different building blocks and handle the transitions between them.
Task | Example |
---|---|
I want to use ML-OPS workflow orchestrators | Workflow Orchestration |
I want to use Beam with TensorFlow Extended (TFX) | Tensorflow Extended (TFX) |
I want to use Beam with Kubeflow | Kubeflow pipelines (KFP) |
When you use Apache Beam as one of the building blocks in your project, these orchestrators are able to launch your Apache Beam job and to keep track of the input and output of your pipeline. These tasks are essential when moving your AI/ML solution into production, because they allow you to handle your model and data over time and improve the quality and reproducibility of results.
Model training
Task | Example |
---|---|
I want to use per-entity training | Per Entity Training |
I want to cluster text | Online Clustering Example |
I want to benchmark model performance | ML Model Evaluation |
Use cases
Task | Example |
---|---|
I want to build an anomaly detection pipeline | Anomaly Detection Example |
Reference
Last updated on 2025/01/20
Have you found everything you were looking for?
Was it all useful and clear? Is there anything that you would like to change? Let us know!