Beam SDK for Python dependencies
This page describes the dependencies of the Apache Beam Python SDK.
If your pipeline requires additional dependencies, see Managing Python Pipeline Dependencies.
Dependencies of the Apache Beam Python SDK are defined in the setup.py file in the Beam repository. To view them, take the following steps:
1. Open setup.py:
   https://github.com/apache/beam/blob/release-<VERSION_NUMBER>/sdks/python/setup.py
   Replace `<VERSION_NUMBER>` with the major.minor.patch version of the SDK. For example, https://github.com/apache/beam/blob/release-2.68.0/sdks/python/setup.py provides the dependencies for the 2.68.0 release.
2. Review the core dependency list under `REQUIRED_PACKAGES`.
   Note: If you need extra features, such as `gcp` or `dataframe`, review the lists in `extras_require` for additional dependencies.
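If you look up several releases, building the link by hand gets repetitive. The following is a minimal sketch that fills in the URL pattern shown above; the function name `setup_py_url` is hypothetical and not part of Beam, and the version check only enforces the major.minor.patch shape, not that the release exists.

```python
import re

# Matches a plain major.minor.patch version such as "2.68.0".
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def setup_py_url(version):
    """Return the setup.py URL for a Beam release branch.

    Hypothetical helper: it only substitutes the version into the
    URL pattern and does not check that the release branch exists.
    """
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"expected major.minor.patch, got {version!r}")
    return (
        "https://github.com/apache/beam/blob/"
        f"release-{version}/sdks/python/setup.py"
    )


if __name__ == "__main__":
    print(setup_py_url("2.68.0"))
```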
You can also retrieve the dependency list from the command line using the following process:
1. Create a clean virtual environment on your local machine using a supported Python version.
   $ python3 -m venv env && source ./env/bin/activate && pip install --upgrade pip setuptools wheel
2. Install the Beam Python SDK.
   $ pip install apache-beam
3. Retrieve the list of dependencies.
   $ pip install pipdeptree && pipdeptree -p apache-beam
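As an alternative to pipdeptree, the requirements declared by an installed package can be read with the standard library. The sketch below assumes apache-beam is already installed in the active environment (it returns an empty list otherwise); the helper names `declared_requirements` and `group_requirements` are hypothetical. It lists only the direct requirements, not the full tree, and separates core requirements from those tied to an extra such as gcp.

```python
import re
from importlib import metadata

# Finds the extra name in a marker such as: extra == "gcp"
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def declared_requirements(dist_name="apache-beam"):
    """Return the requirement strings declared by an installed distribution."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []


def group_requirements(requirements):
    """Split requirement strings into core ones and per-extra lists.

    For requirements tied to an extra, everything after ';' (the marker)
    is dropped, so any additional environment markers are lost.
    """
    core, extras = [], {}
    for req in requirements:
        match = _EXTRA_RE.search(req)
        if match:
            name = req.split(";")[0].strip()
            extras.setdefault(match.group(1), []).append(name)
        else:
            core.append(req)
    return core, extras


if __name__ == "__main__":
    core, extras = group_requirements(declared_requirements())
    print("Core requirements:")
    for req in core:
        print("  " + req)
    for extra, reqs in sorted(extras.items()):
        print(f"Extra [{extra}]: {len(reqs)} requirement(s)")
```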
If you have Docker installed, you can inspect the dependencies preinstalled in the Beam Python SDK container images by creating a container from an image, for example:
$ docker run --rm -it --entrypoint=/bin/sh apache/beam_python3.10_sdk:2.55.0 -c "pip list"
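To run the same check for several Python or Beam versions, the command can be assembled programmatically. This is a sketch only: the image naming pattern is inferred from the single example above and is an assumption, the helper names are hypothetical, and the `-it` flags are dropped because a scripted run usually has no terminal attached.

```python
import shlex


def sdk_image(python_version, beam_version):
    """Build an SDK image name, assuming the pattern of the example image."""
    return f"apache/beam_python{python_version}_sdk:{beam_version}"


def pip_list_command(python_version, beam_version):
    """Return the docker argv that runs `pip list` inside the SDK image."""
    return [
        "docker", "run", "--rm",
        "--entrypoint=/bin/sh",
        sdk_image(python_version, beam_version),
        "-c", "pip list",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) if Docker is available on the machine.
    print(shlex.join(pip_list_command("3.10", "2.55.0")))
```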
You can also find the list of dependencies installed in Beam containers in the base_image_requirements.txt files in the Beam repository, for the corresponding Beam release branch and Python minor version.
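Once you have the contents of two such files, for example from two release branches, a small script can show which pinned versions differ. The sketch below assumes the files contain `name==version` lines in the usual requirements format; `parse_pins` and `changed_pins` are hypothetical helper names, and lines without an exact `==` pin are skipped.

```python
def parse_pins(text):
    """Parse `name==version` lines into a dict keyed by lowercased name."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # blank line or a requirement without an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def changed_pins(old, new):
    """Return {name: (old_version, new_version)} for pins that differ.

    A version of None means the package is absent on that side.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes
```

Read each file's text yourself (for example with `pathlib.Path(...).read_text()`), pass it to `parse_pins`, and hand the two resulting dicts to `changed_pins`.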

