Skip to content

S3: implement new checksum algorithm CRC64NVME add ChecksumType for PutObject #12182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 27, 2025

Conversation

bentsku
Copy link
Contributor

@bentsku bentsku commented Jan 24, 2025

Motivation

This PR implements the new checksum algorithm for S3, CRC64NVME, and add the ChecksumType field for object created with PutObject. The ChecksumType is not too relevant for PutObject, as this is a feature mostly related to Multipart Uploads.

More information about the new S3 data integrity push AWS has been doing can be read about here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

This PR also fix a bit of parity reported in #12187, where we would report the wrong exception if the checksum was wrong, as the previous tests were sending values that would not correspond to a valid checksum.

Changes

  • implement the new checksum algorithm CRC64NVME
  • implement the ChecksumType field for PutObject and operations retrieving objects
  • add validation for the checksum value (valid checksum or not)
  • remove skip markers previously added

fixes #12187

@bentsku bentsku added aws:s3 Amazon Simple Storage Service semver: patch Non-breaking changes which can be included in patch releases labels Jan 24, 2025
@bentsku bentsku added this to the 4.1 milestone Jan 24, 2025
@bentsku bentsku self-assigned this Jan 24, 2025
Copy link

github-actions bot commented Jan 25, 2025

S3 Image Test Results (AMD64 / ARM64)

  2 files    2 suites   5m 28s ⏱️
446 tests 394 ✅  52 💤 0 ❌
892 runs  788 ✅ 104 💤 0 ❌

Results for commit 13a19bce.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Jan 25, 2025

LocalStack Community integration with Pro

    2 files  ±0      2 suites  ±0   1h 51m 41s ⏱️ + 1m 11s
4 024 tests +2  3 709 ✅ +3  315 💤  - 1  0 ❌ ±0 
4 026 runs  +2  3 709 ✅ +3  317 💤  - 1  0 ❌ ±0 

Results for commit 13a19bce. ± Comparison against base commit 1790318.

This pull request removes 15 and adds 17 tests. Note that renamed tests count towards both.
tests.aws.services.s3.test_s3.TestS3 ‑ test_complete_multipart_parts_checksum
tests.aws.services.s3.test_s3.TestS3 ‑ test_multipart_parts_checksum_exceptions
tests.aws.services.s3.test_s3.TestS3 ‑ test_put_object_checksum[CRC32C]
tests.aws.services.s3.test_s3.TestS3 ‑ test_put_object_checksum[CRC32]
tests.aws.services.s3.test_s3.TestS3 ‑ test_put_object_checksum[SHA1]
tests.aws.services.s3.test_s3.TestS3 ‑ test_put_object_checksum[SHA256]
tests.aws.services.s3.test_s3.TestS3 ‑ test_s3_checksum_no_algorithm
tests.aws.services.s3.test_s3.TestS3 ‑ test_s3_checksum_no_automatic_sdk_calculation
tests.aws.services.s3.test_s3.TestS3 ‑ test_s3_checksum_with_content_encoding
tests.aws.services.s3.test_s3.TestS3 ‑ test_s3_get_object_checksum[CRC32C]
…
tests.aws.services.s3.test_s3.TestS3 ‑ test_s3_copy_object_with_checksum[CRC64NVME]
tests.aws.services.s3.test_s3.TestS3MultipartUploadChecksum ‑ test_complete_multipart_parts_checksum
tests.aws.services.s3.test_s3.TestS3MultipartUploadChecksum ‑ test_multipart_parts_checksum_exceptions
tests.aws.services.s3.test_s3.TestS3PutObjectChecksum ‑ test_put_object_checksum[CRC32C]
tests.aws.services.s3.test_s3.TestS3PutObjectChecksum ‑ test_put_object_checksum[CRC32]
tests.aws.services.s3.test_s3.TestS3PutObjectChecksum ‑ test_put_object_checksum[CRC64NVME]
tests.aws.services.s3.test_s3.TestS3PutObjectChecksum ‑ test_put_object_checksum[SHA1]
tests.aws.services.s3.test_s3.TestS3PutObjectChecksum ‑ test_put_object_checksum[SHA256]
tests.aws.services.s3.test_s3.TestS3PutObjectChecksum ‑ test_s3_checksum_no_algorithm
tests.aws.services.s3.test_s3.TestS3PutObjectChecksum ‑ test_s3_checksum_no_automatic_sdk_calculation
…

♻️ This comment has been updated with latest results.

@bentsku bentsku added semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases and removed semver: patch Non-breaking changes which can be included in patch releases labels Jan 25, 2025
@bentsku bentsku marked this pull request as ready for review January 27, 2025 00:25
@bentsku bentsku requested a review from k-a-il January 27, 2025 00:26
@@ -261,7 +261,7 @@ def test_sfn_start_execution_implicit_json_serialisation(
)
# it seems the SFn internal client does not return the checksum values from the object yet, maybe it hasn't
# been updated to parse those fields?
@markers.snapshot.skip_snapshot_verify(paths=["$..ChecksumCrc32"])
@markers.snapshot.skip_snapshot_verify(paths=["$..ChecksumCrc32", "$..ChecksumType"])
def test_s3_get_object(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this added to LocalStack already 🚀!

I’m leaving this comment to figure out whether this is something we should fix in the SFN provider, though I’m otherwise fine with skipping these response fields in the tests.

My understanding is that, contrary to the previous comment, LocalStack’s S3 client returns two additional checksum fields in its getObject response compared to AWS. Because this response is dispatched via the AWS SDK integration, which primarily normalizes response keys, I can see the extra fields even before normalization in localstack.services.stepfunctions.asl.component.state.state_execution.state_task.service.state_task_service_aws_sdk.StateTaskServiceAwsSdk._eval_service_task. The same applies for the test case below aws.services.stepfunctions.v2.services.test_aws_sdk_task_service.TestTaskServiceAwsSdk.test_s3_put_object.

In these scenarios, should s3 exclude the checksum values in its responses, or should we look into modifying the aws-sdk Step Functions integration to remove these fields from the responses instead? Perhaps this S3 change hasn’t yet been reflected in Step Functions through the aws-sdk, so we might consider keeping an eye on the state of this test case going forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the difference is coming from how the internal SDK handles the request. I believe this is due to how the default values for Boto changed:

  • when creating an object, boto will automatically adds the checksum algorithm, thus the object being created (which will show up in the response). This will be for PutObject. I'll update the snapshot right now to snapshot the full GetObject call to see how the object is being created. Update: It seems I've updated the snapshot again and now they are setting the Checksum 😂 the only issue we now have is a casing issue, but it seems SFN is now creating the object with checksums ✅
  • when getting an object, boto will automatically set the ChecksumMode field set to ENABLED, but I'm not sure we can disable that. That way, it returns the 2 fields, which I don't think the current SFN SDK does. For GetObject, we can be sure the object is created with checksums, as the call is done from boto, but it seems the SDK does not return those, probably because it does not set ChecksumMode. We could manually remove those fields from the response for now.

TLDR;
For PutObject, after refreshing the snapshots again, it seems like the only issue remaining is casing
For GetObject, we may need to remove the fields for now

@bentsku bentsku merged commit faef710 into master Jan 27, 2025
40 checks passed
@bentsku bentsku deleted the s3-feat-checksum-1 branch January 27, 2025 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws:s3 Amazon Simple Storage Service semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: S3 put object returns different code for invalid checksum from actual S3 API
3 participants