Skip to content

[ESM] Fix flaky discarding record age test #12552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 25, 2025

Conversation

gregfurman
Copy link
Contributor

@gregfurman gregfurman commented Apr 24, 2025

Motivation

Fixes a flaky test that was triggerd by TCP connections to Kinesis Mock timing out/going stale.

Changes

  • Patches the maximum record age to allow tests against LS to complete faster, in turn reducing likelihood of the connection issues.

@gregfurman gregfurman added semver: patch Non-breaking changes which can be included in patch releases aws:lambda:event-source-mapping AWS Lambda Event Source Mapping (ESM) labels Apr 24, 2025
@gregfurman gregfurman added this to the 4.4 milestone Apr 24, 2025
@gregfurman gregfurman self-assigned this Apr 24, 2025
Copy link

github-actions bot commented Apr 24, 2025

LocalStack Community integration with Pro

    2 files  ±    0      2 suites  ±0   1h 23m 59s ⏱️ - 34m 24s
3 196 tests  - 1 184  2 937 ✅  - 1 086  259 💤  - 98  0 ❌ ±0 
3 198 runs   - 1 184  2 937 ✅  - 1 086  261 💤  - 98  0 ❌ ±0 

Results for commit cf95416. ± Comparison against base commit 49dcd93.

This pull request removes 1188 and adds 4 tests. Note that renamed tests count towards both.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_default_value
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_default_value_with_dynamic_overrides
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_with_added_default_value
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_with_removed_default_value
This pull request removes 102 skipped tests and adds 4 skipped tests. Note that renamed tests count towards both.
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input4-FAILED]
tests.aws.scenario.mythical_mysfits.test_mythical_misfits.TestMythicalMisfitsScenario ‑ test_deployed_infra_state
tests.aws.scenario.mythical_mysfits.test_mythical_misfits.TestMythicalMisfitsScenario ‑ test_populate_data
tests.aws.scenario.mythical_mysfits.test_mythical_misfits.TestMythicalMisfitsScenario ‑ test_user_clicks_are_stored
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_api_exceptions
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_create_exceptions
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_create_invalid_desiredstate
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_double_create_with_client_token
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_lifecycle
…
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_default_value
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_default_value_with_dynamic_overrides
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_with_added_default_value
tests.aws.services.cloudformation.v2.test_change_set_parameters.TestChangeSetParameters ‑ test_update_parameter_with_removed_default_value

♻️ This comment has been updated with latest results.

@gregfurman gregfurman marked this pull request as ready for review April 24, 2025 16:22
Copy link
Member

@joe4dev joe4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling test flakiness @gregfurman

TCP connections to Kinesis Mock timing out/going stale
question: Is there a way/need to solve the underlying problem of expiring TCP connections? The workaround mitigates test flakiness, but wouldn't users experience the root cause without patching these internals?

StreamName=stream_name,
)
# The first record in the batch will have expired with the remaining batch not exceeding any age-limits.
aws_client.kinesis.put_records(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good batching 👍

):
# snapshot setup
snapshot.add_transformer(snapshot.transform.key_value("MD5OfBody"))
snapshot.add_transformer(snapshot.transform.key_value("ReceiptHandle"))
snapshot.add_transformer(snapshot.transform.key_value("startSequenceNumber"))

snapshot.add_transformer(
SortingTransformer(
"Records", lambda x: base64.b64decode(x["kinesis"]["data"]).decode("utf-8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Does the same partition key not enforce ordering? Maybe, I miss something why this is needed 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test now writes the data in as a batch request, which could mess with the ordering 🫠

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth a comment since it's not obvious

@@ -1107,15 +1110,26 @@ def test_kinesis_maximum_record_age_exceeded(
"StreamDescription"
]["StreamARN"]

if not is_aws_cloud():
# Override the MaximumRecordAgeInSeconds to make test shorter in LS
# which would otherwise fail validations (where MaximumRecordAgeInSeconds >= 60s).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarification: I didn't quite understand the second line of this comment. I guess it's to byepass AWS validations, which wouldn't allow setting it as low, right?

Idea: Maybe we can clearly label this as LS test optimization and say that we are patching the implementation directly to byepass AWS API validation 💭

Copy link
Contributor Author

@gregfurman gregfurman Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaximumRecordAgeInSeconds can only ever be greater than 60s. This means that if we were properly validating this field (which we are not) then the below would fail:

        create_event_source_mapping_response = create_event_source_mapping(
            FunctionName=function_name,
            MaximumRecordAgeInSeconds=60 if is_aws_cloud() else 5, # <-- should raise ValidationException
            # other params
        )

Hence, we set the value (as normal) to 60 seconds and then need to monkeypatch it at the provider level to ensure these tests don't last 60+ seconds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something along these lines be clearer and more specific?

LocalStack test optimization: Override MaximumRecordAgeInSeconds directly in the poller to bypass the AWS API validation (where MaximumRecordAgeInSeconds >= 60s). This saves 55s waiting time.

@gregfurman
Copy link
Contributor Author

@joe4dev regarding this question:

question: Is there a way/need to solve the underlying problem of expiring TCP connections? The workaround mitigates test flakiness, but wouldn't users experience the root cause without patching these internals?

Yeah there is a way we can do this. I've started extending the utils.net to include some keepalive functionality when creating sockets. I'll add these in a follow-up with some tests.

@gregfurman gregfurman merged commit 7a3666f into master Apr 25, 2025
31 checks passed
@gregfurman gregfurman deleted the fix/flaky/kinesis-esm-test branch April 25, 2025 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws:lambda:event-source-mapping AWS Lambda Event Source Mapping (ESM) semver: patch Non-breaking changes which can be included in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants