blog & release
2023/05/10
Apache Beam 2.47.0Jack R. McCluskey [@jrmccluskey]
We are happy to present the new 2.47.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information on changes in 2.47.0, check out the detailed release notes.
Highlights
- Apache Beam adds Python 3.11 support (#23848).
I/Os
- BigQuery Storage Write API is now available in Python SDK via cross-language (#21961).
- Added HbaseIO support for writing RowMutations (ordered by rowkey) to Hbase (Java) (#25830).
- Added fileio transforms MatchFiles, MatchAll and ReadMatches (Go) (#25779).
- Add integration test for JmsIO + fix issue with multiple connections (Java) (#25887).
New Features / Improvements
- The Flink runner now supports Flink 1.16.x (#25046).
- Schema’d PTransforms can now be directly applied to Beam dataframes just like PCollections.
(Note that when doing multiple operations, it may be more efficient to explicitly chain the operations
like
df | (Transform1 | Transform2 | ...)
to avoid excessive conversions.) - The Go SDK adds new transforms periodic.Impulse and periodic.Sequence that extends support for slowly updating side input patterns. (#23106)
- Several Google client libraries in Python SDK dependency chain were updated to latest available major versions. (#24599)
Breaking Changes
- If a main session fails to load, the pipeline will now fail at worker startup. (#25401).
- Python pipeline options will now ignore unparsed command line flags prefixed with a single dash. (#25943).
- The SmallestPerKey combiner now requires keyword-only arguments for specifying optional parameters, such as
key
andreverse
. (#25888).
Deprecations
- Cloud Debugger support and its pipeline options are deprecated and will be removed in the next Beam version, in response to the Google Cloud Debugger service turning down. (Java) (#25959).
Bugfixes
- BigQuery sink in STORAGE_WRITE_API mode in batch pipelines might result in data consistency issues during the handling of other unrelated transient errors for Beam SDKs 2.35.0 - 2.46.0 (inclusive). For more details see: https://github.com/apache/beam/issues/26521
Known Issues
- BigQueryIO Storage API write with autoUpdateSchema may cause data corruption for Beam SDKs 2.45.0 - 2.47.0 (inclusive) (#26789)
- Long-running Python pipelines might experience a memory leak: #28246.
List of Contributors
According to git shortlog, the following people contributed to the 2.47.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Alexey Romanenko
Amir Fayazi
Amrane Ait Zeouay
Anand Inguva
Andrew Pilloud
Andrey Kot
Bjorn Pedersen
Bruno Volpato
Buqian Zheng
Chamikara Jayalath
ChangyuLi28
Damon
Danny McCormick
Dmitry Repin
George Ma
Jack Dingilian
Jack McCluskey
Jasper Van den Bossche
Jeremy Edwards
Jiangjie (Becket) Qin
Johanna Öjeling
Juta Staes
Kenneth Knowles
Kyle Weaver
Mattie Fu
Moritz Mack
Nick Li
Oleh Borysevych
Pablo Estrada
Rebecca Szper
Reuven Lax
Reza Rokni
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Saadat Su
Saifuddin53
Sam Rohde
Shubham Krishna
Svetak Sundhar
Theodore Ni
Thomas Gaddy
Timur Sultanov
Udi Meiri
Valentyn Tymofieiev
Xinyu Liu
Yanan Hao
Yi Hu
Yuvi Panda
andres-vv
bochap
dannikay
darshan-sj
dependabot[bot]
harrisonlimh
hnnsgstfssn
jrmccluskey
liferoad
tvalentyn
xianhualiu
zhangskz