Beam SDK for Python dependencies
This page provides information about the Apache Beam Python SDK dependencies.
If your pipeline requires additional dependencies, see Managing Python Pipeline Dependencies.
Dependencies of the Apache Beam Python SDK are defined in the setup.py
file in the Beam repository. To view them, take the following steps:
Open setup.py: https://github.com/apache/beam/blob/release-<VERSION_NUMBER>/sdks/python/setup.py
Replace `<VERSION_NUMBER>` with the major.minor.patch version of the SDK. For example, https://github.com/apache/beam/blob/release-2.60.0/sdks/python/setup.py provides the dependencies for the 2.60.0 release.
Review the core dependency list under REQUIRED_PACKAGES.
Note: If you need extra features, such as gcp or dataframe, review the lists in extras_require for additional dependencies.
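The URL substitution described in the steps above can be sketched in Python. This is only an illustration: `setup_py_url` and `SETUP_PY_URL` are hypothetical names, not part of the Beam SDK, and the version check simply enforces the major.minor.patch form mentioned above.

```python
import re

# URL pattern from the steps above, with the version left as a placeholder.
SETUP_PY_URL = (
    "https://github.com/apache/beam/blob/"
    "release-{version}/sdks/python/setup.py"
)


def setup_py_url(version: str) -> str:
    """Return the setup.py URL for a major.minor.patch Beam release.

    Hypothetical helper for illustration; it only formats a string and
    does not check that the release branch actually exists.
    """
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected major.minor.patch, got {version!r}")
    return SETUP_PY_URL.format(version=version)


print(setup_py_url("2.60.0"))
```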
You can also retrieve the dependency list from the command line using the following process:
Create a clean virtual environment on your local machine using a supported Python version.
$ python3 -m venv env && source ./env/bin/activate && pip install --upgrade pip setuptools wheel
Retrieve the list of dependencies.
$ pip install pipdeptree && pipdeptree -p apache-beam
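If you prefer to stay inside Python, the standard library's `importlib.metadata` can read the requirements that an installed distribution declares. The sketch below is an alternative to pipdeptree, not an equivalent: it lists only the direct requirements of `apache-beam`, not the full tree. The helper names are hypothetical, and the code assumes that requirements belonging to an extra carry an `extra == "..."` marker, as is usual in package metadata.

```python
import re
from importlib import metadata


def split_requirements(requirements):
    """Split requirement strings into core ones and ones tied to an extra."""
    core, extras = [], []
    for req in requirements:
        # Everything after the first ';' is the environment marker, if any.
        _, _, marker = req.partition(";")
        if re.search(r"extra\s*==", marker):
            extras.append(req)
        else:
            core.append(req)
    return core, extras


def declared_requirements(dist_name="apache-beam"):
    """Return the declared requirements, or [] if the package is absent."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


core, extras = split_requirements(declared_requirements())
print(f"{len(core)} core requirements, {len(extras)} tied to extras")
for req in core:
    print(" ", req)
```

Run it inside the virtual environment after installing `apache-beam`; if the package is not installed, the script reports zero requirements.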
If you have a Docker installation, you can inspect the dependencies preinstalled in Beam Python SDK container images by creating a container from an image, for example: docker run --rm -it --entrypoint=/bin/sh apache/beam_python3.10_sdk:2.55.0 -c "pip list".
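To run the same inspection for other releases, the command can be assembled from the Python minor version and the Beam version. This is a sketch only: `pip_list_command` is a hypothetical helper, and the image naming pattern is inferred from the single example above, so check that the image tag exists before relying on it.

```python
import shlex


def pip_list_command(python_version: str, beam_version: str) -> list[str]:
    """Build the docker command that lists packages in a Beam SDK image.

    Assumes images are named apache/beam_python<X.Y>_sdk:<beam version>,
    following the example in the text.
    """
    image = f"apache/beam_python{python_version}_sdk:{beam_version}"
    return [
        "docker", "run", "--rm", "-it",
        "--entrypoint=/bin/sh", image,
        "-c", "pip list",
    ]


# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(pip_list_command("3.10", "2.55.0")))
```

The helper only builds and prints the command. Running it from a script without a terminal attached would likely require dropping the `-it` flags.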
You can also find the list of dependencies installed in Beam containers in the base_image_requirements.txt files in the Beam repository, for the corresponding Beam release branch and Python minor version.
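Once you have a copy of such a file, a short script can turn it into a lookup table of pinned versions. The sketch below assumes the file uses pip's `name==version` line format with `#` comments; that is an assumption about the file layout, so verify it against the actual file. `parse_pinned` is a hypothetical helper, and the sample text uses made-up package names rather than real Beam pins.

```python
def parse_pinned(text: str) -> dict[str, str]:
    """Map lowercased package names to pinned versions.

    Assumes lines of the form name==version; comments and any line
    without '==' are skipped.
    """
    pins = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


# Illustrative input only; these are not real Beam dependencies.
sample = """\
# Autogenerated header comment
example-pkg==1.2.3

Other_Pkg==0.4  # pinned
"""
for name, version in sorted(parse_pinned(sample).items()):
    print(name, version)
```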