Add schemas to stores #12931

Closed
giograno wants to merge 10 commits into main from json/schema

@giograno (Member) commented Jul 30, 2025

Motivation

This PR introduces a heuristic to extract a schema representation for a given store class.

Context

Service providers implemented in LocalStack store their data in stores, unless they are moto-based or rely on third-party services (see #6444 for more details).

LocalStack serializes these classes to implement persistence. Therefore, we should treat them as public APIs and avoid breaking them. As stores evolve within the LocalStack codebase, a few issues can occur when deserializing a previous state:

  • A class gets renamed, and we can no longer load that module and class;
  • An attribute in the store is removed, so the information stored there is lost.

Unfortunately, we don't have much visibility into how the structure of the stores changes over time.
For this reason, this PR introduces a heuristic to extract a schema representation from a store definition.
It allows us to reason about store changes. For instance:

  • If a type can't be loaded anymore, we can compare an old and a new schema version to understand which new type should be used instead.
  • By comparing the schema across two versions, we can detect if an attribute has been removed and come up with a migration path.
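
The two comparisons above can be sketched as a plain dictionary diff. This is only an illustration: it assumes a schema is a flat mapping from attribute name to a serialized type hint, and the `diff_schemas` helper and the `app.models.*` names are hypothetical, not part of this PR.

```python
# Minimal sketch, assuming a schema is a flat mapping of
# attribute name -> serialized type hint (hypothetical shape).
def diff_schemas(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report attributes removed, added, or re-typed between two schema versions."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
    return {"removed": removed, "added": added, "changed": changed}


# Hypothetical schemas extracted from two versions of the same store.
old_schema = {"queues": "dict[str, app.models.Queue]", "deleted": "dict[str, float]"}
new_schema = {"queues": "dict[str, app.models.SqsQueue]"}

report = diff_schemas(old_schema, new_schema)
```

A "removed" entry points at data that would be lost on load, while a "changed" entry is a hint that a type was renamed or replaced and may need a migration path.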

Changes

  • Introduce a new localstack.state.schema module that derives a schema definition from a BaseStore subclass. It relies heavily on the type hints of the store attributes. The docstring of the StoreSchemaBuilder class gives a few examples.
  • A few unit tests.
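
As a rough idea of what a type-hint-driven extraction could look like, here is a minimal sketch. It is not the StoreSchemaBuilder from this PR: `DemoStore`, `Queue`, `fqn`, `serialize_hint`, and `build_schema` are hypothetical names, and only the `LS/TYPE` and `LS/ARGS` tag values are taken from the reviewed code further down.

```python
import typing


class Queue:  # hypothetical model class
    name: str


class DemoStore:  # hypothetical stand-in for a BaseStore subclass
    queues: dict[str, Queue]
    region: str


def fqn(obj) -> str:
    """Fully qualified name of a class, e.g. 'builtins.str'."""
    return f"{obj.__module__}.{obj.__qualname__}"


def serialize_hint(hint):
    """Serialize a plain class or a subscripted generic such as dict[str, Queue].

    Other hint kinds are not handled in this sketch.
    """
    origin = typing.get_origin(hint)
    if origin is None:
        return fqn(hint)
    args = [serialize_hint(arg) for arg in typing.get_args(hint)]
    return {"LS/TYPE": fqn(origin), "LS/ARGS": args}


def build_schema(cls) -> dict:
    """Map every annotated attribute of cls to its serialized hint."""
    return {name: serialize_hint(hint) for name, hint in typing.get_type_hints(cls).items()}


schema = build_schema(DemoStore)
```

Because `typing.get_type_hints` only returns annotated attributes, methods defined on the class never show up in the result.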

Closes PNX-46.

@giograno added this to the Playground milestone Jul 30, 2025
@giograno self-assigned this Jul 30, 2025
@giograno added the "area: persistence" (Retain state between LocalStack runs) and "semver: patch" (Non-breaking changes which can be included in patch releases) labels Jul 30, 2025

github-actions bot commented Jul 30, 2025

Test Results - Preflight, Unit

22 067 tests  +4   20 333 ✅ +4   6m 18s ⏱️ -2s
     1 suites ±0    1 734 💤 ±0 
     1 files   ±0        0 ❌ ±0 

Results for commit 607c522. ± Comparison against base commit 2d08a27.

♻️ This comment has been updated with latest results.

github-actions bot commented Jul 30, 2025

Test Results (amd64) - Acceptance

7 tests  ±0   5 ✅ ±0   3m 23s ⏱️ +18s
1 suites ±0   2 💤 ±0 
1 files   ±0   0 ❌ ±0 

Results for commit 607c522. ± Comparison against base commit 2d08a27.

♻️ This comment has been updated with latest results.

github-actions bot commented Jul 30, 2025

Test Results (amd64) - Integration, Bootstrap

    5 files      5 suites   2h 21m 20s ⏱️
4 980 tests 4 393 ✅ 587 💤 0 ❌
4 986 runs  4 393 ✅ 593 💤 0 ❌

Results for commit 607c522.

♻️ This comment has been updated with latest results.

@coveralls

Coverage Status

coverage: 83.13% (+31.7%) from 51.39%
when pulling a4de739 on json/schema
into 8e93d8a on main.

github-actions bot commented Jul 30, 2025

LocalStack Community integration with Pro

    2 files  ±0      2 suites  ±0   1h 42m 55s ⏱️ - 1m 27s
4 621 tests ±0  4 186 ✅ ±0  435 💤 ±0  0 ❌ ±0 
4 623 runs  ±0  4 186 ✅ ±0  437 💤 ±0  0 ❌ ±0 

Results for commit 607c522. ± Comparison against base commit 2d08a27.

♻️ This comment has been updated with latest results.

@giograno force-pushed the json/schema branch 3 times, most recently from 62ff9b1 to 5838002, July 31, 2025 11:32
@giograno requested review from viren-nadkarni and thrau July 31, 2025 13:12
@giograno marked this pull request as ready for review July 31, 2025 13:17
@viren-nadkarni (Member) left a comment

This is a good foundation to start with. This approach builds the schema from how stores are declared, but unfortunately not from what goes in them. Arguably that's a quality of dynamically typed languages. One could declare an attribute as one type but keep a different type in it, and this cannot be detected without running an exhaustive static type check.

Overall this tackles one aspect of schema comparison -- the store side. The other aspect is the state side. I imagine each persisted state will have its own derivable schema which describes the actual data in it. This could then be checked against the store schema for load compatibility.
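
To make the store-side versus state-side distinction concrete, here is a minimal sketch of a runtime check that compares the values actually held by a store instance against its declared hints. Everything in it (`DemoStore`, `find_mismatches`) is hypothetical, and it only inspects the outermost type of each attribute, so it is far from the exhaustive check described above.

```python
import typing


class DemoStore:  # hypothetical store with declared attribute types
    counters: dict[str, int]
    region: str

    def __init__(self):
        self.counters = {}
        self.region = "us-east-1"


def find_mismatches(obj) -> dict[str, str]:
    """Return {attribute: actual type name} where the value disagrees with the hint.

    Only the outermost type is checked: dict[str, int] is treated as plain dict.
    """
    mismatches = {}
    for name, hint in typing.get_type_hints(type(obj)).items():
        expected = typing.get_origin(hint) or hint
        value = getattr(obj, name, None)
        if isinstance(expected, type) and not isinstance(value, expected):
            mismatches[name] = type(value).__name__
    return mismatches


store = DemoStore()
store.region = ["us-east-1"]  # declared as str, but a list is stored at runtime
mismatches = find_mismatches(store)
```

The declaration-based schema would still report `region` as a string here; only a check against the live or persisted state reveals the list.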

I'm curious to hear @thrau's thoughts.

module = getattr(obj, "__module__", None)
qualname = getattr(obj, "__qualname__", None)
if module and qualname:
    return f"{module}.{qualname}"
Member

You could use a different separator than . in case the operation ever needs to be reversed.

Member Author

Using :: now (see 607c522)
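
A short sketch of why a distinct separator makes the name reversible: with a dot, `a.b.C.D` could mean class `C.D` in module `a.b` or class `D` in module `a.b.C`, whereas `a.b::C.D` splits unambiguously. The `to_fqn` and `from_fqn` helpers below are hypothetical and not taken from the commit.

```python
import importlib
from logging.handlers import RotatingFileHandler

SEPARATOR = "::"


def to_fqn(obj) -> str:
    """Join module and qualified name with a separator that appears in neither part."""
    return f"{obj.__module__}{SEPARATOR}{obj.__qualname__}"


def from_fqn(fqn: str):
    """Reverse to_fqn: import the module, then walk the (possibly nested) qualname."""
    module_name, qualname = fqn.split(SEPARATOR, 1)
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


name = to_fqn(RotatingFileHandler)
resolved = from_fqn(name)
```

Here `name` is `"logging.handlers::RotatingFileHandler"`, and `from_fqn` resolves it back to the same class object.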

        if args:
            _hint[TAG_ARGS] = [self._serialize_hint(_arg) for _arg in args]
        return _hint
    case _:
Member

Any thoughts on how to handle cases where stores have member functions, e.g. SqsStore.expire_deleted()? How could such callables be represented in the schema, given that changes within them can also affect store compatibility?

Member Author

Nice observation. My idea was to not include these functions in the schema definition, as they won't be serialized by a JSON schema backend. Our goal with the project is indeed to serialize only data and not code anymore.

Comment on lines 10 to 29
TypeHint = types.GenericAlias | type

INTERNAL_MODULE_PREFIXES = ["localstack", "moto"]
"""Modules that start with one of these prefixes are considered internal, and their classes are evaluated"""


AttributeName = str
FQN = str
SerializedHint = str | dict[str, typing.Any]

AttributeSchema = dict[AttributeName, SerializedHint]
"""Maps an attribute name to its serialized hint"""

AdditionalClasses = dict[FQN, AttributeSchema]
"""Maps the FQN of a class to its attribute schema"""

TAG_TYPE = "LS/TYPE"
TAG_ARGS = "LS/ARGS"
"""Tags for subscripted types and their args. See ``StoreSchemaBuilder`` for examples."""
Member

I think this can be simplified. Can TAG_TYPE and TAG_ARGS be statically defined instead of constants? This will also allow defining SerializedHint recursively.

Member Author

I am not sure I fully understood the suggestion. In 7e0ba59 I made the SerializedHint recursive and switched to a StrEnum for the tags.

@giograno requested a review from viren-nadkarni August 7, 2025 09:26
@thrau (Member) left a comment

As discussed in today's meeting with Gio and Viren, we agreed that generating a schema from stores is the right direction, but that using an existing standard (like Avro, Protobuf, ...) for the schema definition would be better, so we can leverage the ecosystem around those tools.

@giograno (Member Author) commented Aug 7, 2025

Good learning here. Closing for further investigation with the other standards mentioned above.

@giograno closed this Aug 7, 2025