blog & release
2024/02/14
Apache Beam 2.54.0Robert Burke [@lostluck]
We are happy to present the new 2.54.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information on changes in 2.54.0, check out the detailed release notes.
Highlights
- Enrichment Transform along with GCP BigTable handler added to Python SDK (#30001).
- Beam Java Batch pipelines run on Google Cloud Dataflow will default to the Portable Runner (v2) starting with this version. (All other languages are already on Runner V2.) See Runner V2 documentation for how to enable or disable it intentionally.
I/Os
- Added support for writing to BigQuery dynamic destinations with Python’s Storage Write API (#30045)
- Adding support for Tuples DataType in ClickHouse (Java) (#29715).
- Added support for handling bad records to FileIO, TextIO, AvroIO (#29670).
- Added support for handling bad records to BigtableIO (#29885).
New Features / Improvements
- Enrichment Transform along with GCP BigTable handler added to Python SDK (#30001).
Breaking Changes
- N/A
Deprecations
- N/A
Bugfixes
- Fixed a memory leak affecting some Go SDK since 2.46.0. (#28142)
Security Fixes
- N/A
Known Issues
- Some Python pipelines that run with 2.52.0-2.54.0 SDKs and use large materialized side inputs might be affected by a performance regression. To restore the prior behavior on these SDK versions, supply the
--max_cache_memory_usage_mb=0
pipeline option. (#30360). - Python pipelines that run with 2.53.0-2.54.0 SDKs and perform file operations on GCS might be affected by excess HTTP requests. This could lead to a performance regression or a permission issue. (#28398)
- In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 (#30679).
- Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.54.0 release. Thank you to all contributors!
Ahmed Abualsaud
Alexey Romanenko
Anand Inguva
Andrew Crites
Arun Pandian
Bruno Volpato
caneff
Chamikara Jayalath
Changyu Li
Cheskel Twersky
Claire McGinty
clmccart
Damon
Danny McCormick
dependabot[bot]
Edward Cheng
Ferran Fernández Garrido
Hai Joey Tran
hugo-syn
Issac
Jack McCluskey
Jan Lukavský
JayajP
Jeffrey Kinard
Jerry Wang
Jing
Joey Tran
johnjcasey
Kenneth Knowles
Knut Olav Løite
liferoad
Marc
Mark Zitnik
martin trieu
Mattie Fu
Naireen Hussain
Neeraj Bansal
Niel Markwick
Oleh Borysevych
pablo rodriguez defino
Rebecca Szper
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Sam Whittle
Shunping Huang
Svetak Sundhar
S. Veyrié
Talat UYARER
tvalentyn
Vlado Djerek
Yi Hu
Zechen Jian