2022/04/20
Apache Beam 2.38.0
Daniel Oliveira
We are happy to present the new 2.38.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information on changes in 2.38.0, check out the detailed release notes.
I/Os
- Introduced a projection pushdown optimizer to the Java SDK (BEAM-12976). The optimizer currently only works with the BigQuery Storage API, but support for more I/Os will be added in future releases. If you encounter a bug with the optimizer, please file a JIRA and disable the optimizer using the pipeline option --experiments=disable_projection_pushdown.
- A new IO for Neo4j graph databases was added (BEAM-1857). It can update nodes and relationships using UNWIND statements and read data using Cypher statements with parameters.
- amazon-web-services2 has reached feature parity and is finally recommended over the earlier amazon-web-services and kinesis modules (Java). These will be deprecated in one of the next releases (BEAM-13174).
  - Long-outstanding write support for Kinesis was added (BEAM-13175).
  - Configuration was simplified and made consistent across all IOs, including the usage of AwsOptions (BEAM-13563, BEAM-13663, BEAM-13587).
  - Additionally, there’s a long list of recent improvements and fixes to the S3 filesystem (BEAM-13245, BEAM-13246, BEAM-13441, BEAM-13445, BEAM-14011), DynamoDB IO (BEAM-13209), SQS IO (BEAM-13631, BEAM-13510) and others.
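The idea behind the projection pushdown optimizer can be illustrated with a small, self-contained Python sketch. It does not use Beam or the BigQuery Storage API: the toy TABLE, read_source, pushdown_enabled and run are hypothetical stand-ins. It only shows the principle that, when pushdown is enabled, the fields a pipeline actually uses are passed down to the source so unused columns are never read, while the disable_projection_pushdown experiment falls back to reading whole rows and projecting afterwards.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, object]

# Toy in-memory "table" standing in for a BigQuery table (hypothetical data).
TABLE: List[Row] = [
    {"user_id": 1, "name": "ada", "country": "UK", "bio": "..."},
    {"user_id": 2, "name": "linus", "country": "FI", "bio": "..."},
]


def read_source(columns: Optional[List[str]] = None) -> Iterator[Row]:
    """Hypothetical source: materializes only `columns` when a projection is pushed down."""
    for row in TABLE:
        if columns is None:
            yield dict(row)  # no pushdown: every column is read
        else:
            yield {c: row[c] for c in columns}  # pushdown: only requested columns are read


def pushdown_enabled(experiments: Iterable[str]) -> bool:
    """Mirrors the opt-out experiment --experiments=disable_projection_pushdown."""
    return "disable_projection_pushdown" not in set(experiments)


def run(needed: List[str], experiments: Iterable[str] = ()) -> Tuple[List[Row], int]:
    """Return the projected rows and how many field values were read from the source."""
    rows = list(read_source(needed if pushdown_enabled(experiments) else None))
    fields_read = sum(len(r) for r in rows)
    # Downstream logic only ever sees the fields it asked for.
    projected = [{c: r[c] for c in needed} for r in rows]
    return projected, fields_read
```

Both modes return the same rows for run(["user_id", "country"]); with pushdown the toy source reads 4 field values instead of 8. In Beam, the optimizer applies this kind of rewrite automatically; user code does not call anything like run().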
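The Neo4j IO is a Java transform, so the following Python sketch does not call it or any Neo4j driver. It only illustrates the UNWIND pattern mentioned above: a single parameterized Cypher statement iterates over a list parameter, so a whole batch of elements can be written with one statement execution. The statement text, the Person label and the batch_parameters helper are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical Cypher statement: UNWIND expands the $rows list parameter into one row per element.
UNWIND_MERGE = (
    "UNWIND $rows AS row "
    "MERGE (p:Person {id: row.id}) "
    "SET p.name = row.name"
)


def batch_parameters(
    records: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[Dict[str, List[Dict[str, Any]]]]:
    """Group records into parameter maps {"rows": [...]}, one map per statement execution."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield {"rows": batch}
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield {"rows": batch}
```

Batching trades per-element round trips for fewer, larger statement executions; each yielded map would be sent together with UNWIND_MERGE as its parameters.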
New Features / Improvements
- Pipeline dependencies supplied through --requirements_file will now be staged to the runner using binary distributions (wheels) of the PyPI packages for the linux_x86_64 platform (BEAM-4032). To restore the behavior of using source distributions, set the pipeline option --requirements_cache_only_sources. To skip staging the packages at submission time, set the pipeline option --requirements_cache=skip (Python).
- The Flink runner now supports Flink 1.14.x (BEAM-13106).
- Interactive Beam now supports remotely executing Flink pipelines on Dataproc (Python) (BEAM-14071).
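The three dependency-staging behaviors described above differ only in which pipeline options are passed. The following is a minimal sketch that builds the argument list for each; the helper requirements_staging_args and its mode names are hypothetical, while the flag strings are the ones named in these notes.

```python
from typing import List


def requirements_staging_args(requirements_file: str, mode: str = "wheels") -> List[str]:
    """Build pipeline arguments for one of the staging behaviors (hypothetical helper).

    mode:
      "wheels"  - default in 2.38.0: stage binary wheels for linux_x86_64
      "sources" - restore the previous behavior of staging source distributions
      "skip"    - do not stage the packages at submission time
    """
    args = [f"--requirements_file={requirements_file}"]
    if mode == "sources":
        args.append("--requirements_cache_only_sources")
    elif mode == "skip":
        args.append("--requirements_cache=skip")
    elif mode != "wheels":
        raise ValueError(f"unknown mode: {mode!r}")
    return args
```

The resulting list could then be handed to the Python SDK's PipelineOptions together with the rest of the pipeline's flags; the default mode adds nothing beyond --requirements_file, since wheel staging is the new default behavior.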
Breaking Changes
- (Python) Previously, DoFn.infer_output_type was expected to return Iterable[element_type], where element_type is the PCollection element type. It is now expected to return element_type. Take care if you have overridden infer_output_type in a DoFn (this is not common). See BEAM-13860.
- (amazon-web-services2) The types of awsRegion / endpoint in AwsOptions changed from String to Region / URI (BEAM-13563).
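For the DoFn output-type change, an override that used to return Iterable[element_type] now needs to return element_type itself. The sketch below uses only the standard typing module; unwrap_iterable_hint is a hypothetical migration helper, and OldStyle / NewStyle are plain classes standing in for DoFn subclasses so that the example runs without Beam installed.

```python
import collections.abc
import typing
from typing import Any, Iterable


def unwrap_iterable_hint(hint: Any) -> Any:
    """Turn an old-style result Iterable[T] into T (hypothetical helper).

    Hints that are not a parameterized Iterable[...] are returned unchanged.
    """
    args = typing.get_args(hint)
    if typing.get_origin(hint) is collections.abc.Iterable and len(args) == 1:
        return args[0]
    return hint


class OldStyle:
    # Stand-in for a DoFn override written against the pre-2.38.0 contract.
    def infer_output_type(self, input_type):
        return Iterable[input_type]


class NewStyle:
    # Stand-in for the same override under the 2.38.0 contract: return the element type.
    def infer_output_type(self, input_type):
        return input_type
```

In a real pipeline the fix is simply to return the element type directly, as NewStyle does; the helper only makes the difference between the two contracts explicit.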
Deprecations
- Beam 2.38.0 will be the last minor release to support Flink 1.11.
- (amazon-web-services2) Client providers (withXYZClientProvider()) as well as IO-specific RetryConfigurations are deprecated; instead, use withClientConfiguration() or AwsOptions to configure AWS IOs / clients. Custom implementations of client providers should be replaced with a respective ClientBuilderFactory and configured through AwsOptions (BEAM-13563).
Bugfixes
- Fix S3 copy for large objects (Java) (BEAM-14011)
- Fix quadratic behavior of pipeline canonicalization (Go) (BEAM-14128)
  - This caused unnecessarily long pre-processing times before job submission for large, complex pipelines.
- Fix pyarrow version parsing (Python) (BEAM-14235)
Known Issues
- See a full list of open issues that affect this version.
List of Contributors
According to git shortlog, the following people contributed to the 2.38.0 release. Thank you to all contributors!
abhijeet-lele Ahmet Altay akustov Alexander Alexander Zhuravlev Alexey Romanenko AlikRodriguez Anand Inguva andoni-guzman andreukus Andy Ye Ankur Goenka ansh0l Artur Khanin Aydar Farrakhov Aydar Zainutdinov Benjamin Gonzalez Brian Hulette brucearctor bulat safiullin bullet03 Carl Mastrangelo Chamikara Jayalath Chun Yang Daniela Martín Daniel Oliveira Danny McCormick daria.malkova David Cavazos David Huntsperger dmitryor Dmytro Sadovnychyi dpcollins-google egalpin Elias Segundo Antonio emily Etienne Chauchot Hengfeng Li Ismaël Mejía Israel Herraiz Jack McCluskey Jakub Kukul Janek Bevendorff Jeff Klukas Johan Sternby Kamil Breguła Kenneth Knowles Ke Wu Kiley Kyle Weaver laraschmidt Lara Schmidt LE QUELLEC Olivier Luka Kalinovcic Luke Cwik Marcin Kuthan masahitojp Masato Nakamura Matt Casters Melissa Pashniak Michael Li Miguel Hernandez Moritz Mack mosche nancyxu123 Nathan J Mehl Niel Markwick Ning Kang Pablo Estrada paul-tlh Pavel Avilov Rahul Iyer Reuven Lax Ritesh Ghorse Robert Bradshaw Robert Burke Ryan Skraba Ryan Thompson Sam Whittle Seth Vargo sp029619 Steven Niemitz Thiago Nunes Udi Meiri Valentyn Tymofieiev Victor vitaly.terentyev Yichi Zhang Yi Hu yirutang Zachary Houfek Zoe