Cleanup, document and fortify test.support.hashlib_helper #137371

@picnixz

Proposal:

Cryptographic modules are among the most annoying components to maintain because they are deeply interconnected yet spread across multiple C modules. Most CPython modules are well-contained, but everything related to cryptography is scattered. At some point, I would like to suggest a single package holding the cryptographic primitives, similar to the compression package we now have. Currently we have:

  • hashlib: high-level API for getting message digests and others
  • hmac: high-level API for HMAC
  • _hmac: C implementation of HMAC, backend is HACL*
  • _hashlib: C implementation of hash functions and HMAC, backend is OpenSSL.
  • _md5, _sha1, _sha2, _sha3: C implementation of hash functions, backend is HACL*.
  • _blake2: C implementation of BLAKE-2, backend is HACL*.
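As a rough illustration of how scattered this is, the sketch below probes which of the modules listed above can be imported on a given build. The `BACKEND_MODULES` table and the `probe_backends` helper are hypothetical names introduced for this example; they are not part of `test.support.hashlib_helper`.

```python
import importlib

# Module names and backends as listed above. Which of the private modules
# exist depends on the Python version and on the build configuration.
BACKEND_MODULES = {
    "hashlib": "high-level API",
    "hmac": "high-level API",
    "_hmac": "HACL*",
    "_hashlib": "OpenSSL",
    "_md5": "HACL*",
    "_sha1": "HACL*",
    "_sha2": "HACL*",
    "_sha3": "HACL*",
    "_blake2": "HACL*",
}


def probe_backends(names=BACKEND_MODULES):
    """Return {module_name: bool} telling whether each module imports."""
    available = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            available[name] = False
        else:
            available[name] = True
    return available


if __name__ == "__main__":
    for name, ok in probe_backends().items():
        status = "available" if ok else "missing"
        print(f"{name:10} {BACKEND_MODULES[name]:15} {status}")
```

The result differs between builds (for example, a build without OpenSSL has no `_hashlib`), which is one reason the tests need to know which backend they are exercising.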

I list BLAKE-2 separately because we always prefer our own implementation: it is more versatile than OpenSSL's, which lacks personalization & co. In addition, HACL* BLAKE-2 supports SIMD instructions (as does HACL* HMAC), so it has different configuration options as well.
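To make the versatility point concrete, here is a minimal sketch using the documented `person` parameter of `hashlib.blake2b`. The `personalized_digest` helper and the sample values are made up for this example.

```python
import hashlib


def personalized_digest(data: bytes, person: bytes) -> str:
    """Hash `data` with BLAKE2b under a personalization string.

    `person` may be at most hashlib.blake2b.PERSON_SIZE bytes long.
    """
    return hashlib.blake2b(data, person=person).hexdigest()


# Same payload, different personalization strings: the digests differ,
# and both differ from the digest computed without personalization.
plain = hashlib.blake2b(b"payload").hexdigest()
tagged_a = personalized_digest(b"payload", b"app-A")
tagged_b = personalized_digest(b"payload", b"app-B")
```

Parameters such as `person`, `salt` and `key` are accepted by the `hashlib.blake2b` and `hashlib.blake2s` constructors, which is why tests for BLAKE-2 cannot simply be routed through a generic backend.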

For testing, this becomes quite messy. I recently added an interface for blocking message digests for specific backends; it works well but is still incomplete. I'm now thinking about how to configure the GIL_MINSIZE of hashes (see #91331), but this also requires identifying hash functions by their family in order to distinguish them by module. For instance, SHA-224 and SHA-256 are both implemented in _sha2, as _sha2.SHA224Type and _sha2.SHA256Type respectively, whereas in _hashlib both are _hashlib.HASH instances.
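One possible shape for "identifying hash functions by their family" is sketched below. `HashFamily`, `FAMILY_OF`, `builtin_module_for` and `group_by_family` are hypothetical names for this illustration and do not describe the actual helper; the enum values are the built-in module names listed earlier.

```python
from enum import Enum


class HashFamily(Enum):
    """Hypothetical families; each value is the built-in module name."""
    MD5 = "_md5"
    SHA1 = "_sha1"
    SHA2 = "_sha2"
    SHA3 = "_sha3"
    BLAKE2 = "_blake2"


# Hash names as accepted by hashlib.new(), mapped to their family.
FAMILY_OF = {
    "md5": HashFamily.MD5,
    "sha1": HashFamily.SHA1,
    "sha224": HashFamily.SHA2,
    "sha256": HashFamily.SHA2,
    "sha384": HashFamily.SHA2,
    "sha512": HashFamily.SHA2,
    "sha3_224": HashFamily.SHA3,
    "sha3_256": HashFamily.SHA3,
    "sha3_384": HashFamily.SHA3,
    "sha3_512": HashFamily.SHA3,
    "shake_128": HashFamily.SHA3,
    "shake_256": HashFamily.SHA3,
    "blake2b": HashFamily.BLAKE2,
    "blake2s": HashFamily.BLAKE2,
}


def builtin_module_for(name: str) -> str:
    """Return the built-in module name implementing hash `name`."""
    return FAMILY_OF[name].value


def group_by_family(names):
    """Group hash names by family, preserving the input order."""
    groups = {}
    for name in names:
        groups.setdefault(FAMILY_OF[name], []).append(name)
    return groups
```

With such a table, a per-module setting only needs to be resolved once per family rather than once per hash name.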

Because of that, I end up changing hashlib_helper every so often, and it tires me out. So I am trying to plan ahead and introduce helpers and a more extensible structure. The provided interface is essentially based on imported objects. For instance, to create a SHA224 object, I can use hashlib.sha224, _sha2.sha224, _hashlib.openssl_sha224, _hashlib.new("sha224") or hashlib.new("sha224"). When using new(), the string used is what I call a "canonical hash name"; when using named constructor functions, I just need to know <module_name>.<method_name>, so I can directly import the functions.
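The five spellings above can be exercised with a small resolver, sketched here under the assumption that a missing module or attribute should simply be skipped. `resolve` and `sha224_digests` are hypothetical helpers for this example, not the helper's real interface.

```python
import importlib

# Named constructors from the paragraph above, as <module_name>.<method_name>.
SHA224_CONSTRUCTORS = [
    "hashlib.sha224",
    "_sha2.sha224",
    "_hashlib.openssl_sha224",
]
# Modules whose new() accepts the canonical hash name "sha224".
NEW_MODULES = ["hashlib", "_hashlib"]


def resolve(fullname):
    """Return the object named '<module_name>.<method_name>', or None."""
    module_name, _, method_name = fullname.rpartition(".")
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # backend not available on this build
    return getattr(module, method_name, None)


def sha224_digests(data: bytes):
    """Map each available way of building SHA224 to its hex digest."""
    digests = {}
    for fullname in SHA224_CONSTRUCTORS:
        constructor = resolve(fullname)
        if constructor is not None:
            digests[fullname] = constructor(data).hexdigest()
    for module_name in NEW_MODULES:
        new = resolve(f"{module_name}.new")
        if new is not None:
            label = f'{module_name}.new("sha224")'
            digests[label] = new("sha224", data).hexdigest()
    return digests


if __name__ == "__main__":
    for label, digest in sha224_digests(b"abc").items():
        print(f"{label:28} {digest}")
```

Whatever subset is available, every entry should report the same digest; only the object behind it (a `_sha2` type or a `_hashlib.HASH` instance) differs.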

Anyway, this issue serves as a tracker for improving test.support.hashlib_helper. I couldn't find any usage in the wild, and it's not documented, so I don't think I need to strive for maintainability. I will still make a NEWS entry though just in case.

Finally, the new helpers I introduced, and plan to introduce, are 3.15+ only, so I'm already in full conflict with 3.14. The good news is that cryptographic modules don't evolve fast. Usually, if there's a bug, either it's a security issue and I'll get conflicts down to the oldest security-only branch, or there are only 3 branches to maintain, which is fine for me (we already have conflicts between 3.14 and 3.13 for those modules...).

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Labels

  • tests: Tests in the Lib/test dir
  • type-refactor: Code refactoring (with no changes in behavior)
