Post-commit tests policies
Post-commit tests validate that Beam works correctly in a live environment. The tests also catch errors that are hard to predict in the design and implementation stages.
Even though post-commit tests run after the code is merged into the repository, it is important that the tests pass reliably. Jenkins executes post-commit tests against the HEAD of the master branch. If post-commit tests fail, there is a problem with the HEAD build. In addition, post-commit tests are time consuming to run, and it is often hard to triage test failures.
Policies
To ensure that Beam’s post-commit tests are reliable and healthy, the Beam community follows these post-commit test policies:
- Rollback first
- A failing test is a critical bug
- A flaky test is a critical bug
- Flaky tests must either be fixed or removed
- Fixes for post-commit failures should include a corresponding new pre-commit test
Post-commit test failure scenarios
When a post-commit test fails, follow the steps below that match your situation.
I found a test failure
- Create a GitHub issue and assign it to yourself.
  - Components: testing, anything else relevant
  - Label: precommit
  - Reference this page in the description.
- Do a high-level triage of the failure.
- Assign the issue to a relevant person.
I was assigned an issue for a test failure
- Roll back the culprit change.
- If you determine that the rollback will take longer than 8 hours, disable the test temporarily while you roll back or create a fix.
Note: Rollback is always the first course of action. If a fix is trivial, open a pull request with the proposed fix while doing the rollback.
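The rule above can be read as a small decision procedure. The following is a minimal sketch only: the function name `plan_actions`, its parameters, and the action strings are hypothetical and not part of any Beam tooling. Only the 8-hour threshold comes from the policy itself.

```python
# Hypothetical helper illustrating the policy; not part of Beam tooling.
ROLLBACK_LIMIT_HOURS = 8  # threshold stated in the policy


def plan_actions(estimated_rollback_hours, fix_is_trivial):
    """Return the actions the policy calls for, given a triage estimate."""
    actions = []
    # A slow rollback means the test is disabled temporarily in the meantime.
    if estimated_rollback_hours > ROLLBACK_LIMIT_HOURS:
        actions.append("disable the test temporarily")
    # Rollback is always performed, whatever the other conditions are.
    actions.append("roll back the culprit change")
    # A trivial fix can be proposed in parallel with the rollback.
    if fix_is_trivial:
        actions.append("open a pull request with the fix")
    return actions
```

The list order is not a strict sequence: the policy describes these actions as happening alongside the rollback, which is never skipped.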
My change was rolled back due to a test failure
After rollback there is time for deeper investigation. Start by looking at the GitHub issue to see the background information for the rollback. These scenarios are all common:
- Your change contained a bug.
- Your change exposed an existing bug.
- Your change exposed a bad test (flaky, overspecified, etc.).
These are all valid reasons for rollback. Maintaining clear signal is the highest priority.
In each case, the high-level steps are the same:
- Create a fix and re-run the post-commit tests.
- Implement new pre-commit tests that will catch similar failures before future code is merged into the repository.
- Open a new PR that contains your fix and the new pre-commit tests.
If the bug is not in your code, here is how to “create a fix”:
- File a ticket for the existing bug, if it does not already exist. Remember that a flaky test is a critical bug. Other bad tests are similar: they may fail for arbitrary reasons having nothing to do with what is being tested, making our signal unreliable.
- Mark the problematic test to be skipped, with a link to the GitHub issue.
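As an illustration of the last step, a test can be skipped with a reason that links to the tracking issue. This is a minimal sketch using Python's standard `unittest.skip` decorator. The test class, test names, and `run_example` helper are hypothetical, and the issue URL is a placeholder whose number must be filled in with the real issue. Beam's Java tests would use a different mechanism.

```python
import unittest

# Placeholder: replace <issue-number> with the real GitHub issue number.
ISSUE_URL = "https://github.com/apache/beam/issues/<issue-number>"


class ExamplePipelineTest(unittest.TestCase):  # hypothetical test class
    # The skip reason carries the issue link, so the test report shows
    # why the test is disabled and where the fix is tracked.
    @unittest.skip(f"Flaky test, tracked in {ISSUE_URL}")
    def test_known_flaky(self):
        self.fail("not executed while the skip is in place")

    def test_healthy(self):
        self.assertEqual(sum([1, 2, 3]), 6)


def run_example():
    """Run the example tests and return the collected result."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ExamplePipelineTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Keeping the link in the skip reason, rather than only in a comment, means it appears in the test output wherever the skipped test is listed.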
Useful links
References
- Keeping post-commit tests green mailing list proposal thread.
Last updated on 2024/11/14