-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Add schemas to stores #12931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schemas to stores #12931
Conversation
Test Results (amd64) - Integration, Bootstrap 5 files 5 suites 2h 21m 20s ⏱️ Results for commit 607c522. ♻️ This comment has been updated with latest results. |
62ff9b1
to
5838002
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good foundation to start with. This approach will build the schema out of how stores are declared, but unfortunately not what goes in them. Arguably that's a quality of weakly typed languages. One could say an attribute is of a certain type, but keep a different type in there, and this cannot be detected without running an exhaustive static type check.
Overall this tackles one aspect of schema comparison -- the store side. The other aspect is state side. I imagine each persisted state will have its own derivable schema which describes the actual data in it. This could then be checked against the store schema for load compatibility.
I'm curious to hear @thrau's thoughts.
module = getattr(obj, "__module__", None) | ||
qualname = getattr(obj, "__qualname__", None) | ||
if module and qualname: | ||
return f"{module}.{qualname}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use a different separator than .
incase there is a need to reverse the operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ::
not (see 607c522)
if args: | ||
_hint[TAG_ARGS] = [self._serialize_hint(_arg) for _arg in args] | ||
return _hint | ||
case _: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts how cases where stores have member functions e.g. SqsStore.expire_deleted()
be handled? How could such callables be represented in the schema as changes within them can also affect store compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice observation. My idea was to not include these functions in the schema definition, as they won't be serialized by a JSON schema backend. Our goal with the project is indeed to serialize only data and not code anymore.
TypeHint = types.GenericAlias | type | ||
|
||
INTERNAL_MODULE_PREFIXES = ["localstack", "moto"] | ||
"""Modules that starts with this prefix are considered internal classes and are evaluated""" | ||
|
||
|
||
AttributeName = str | ||
FQN = str | ||
SerializedHint = str | dict[str, typing.Any] | ||
|
||
AttributeSchema = dict[AttributeName, SerializedHint] | ||
"""Maps an attribute name its serialized hints""" | ||
|
||
AdditionalClasses = dict[FQN, AttributeSchema] | ||
"""Maps the a FQN of a class to its Attribute Schema""" | ||
|
||
TAG_TYPE = "LS/TYPE" | ||
TAG_ARGS = "LS/ARGS" | ||
"""Tags for subscribed types and their args. See ``StoreSchemaBuilder`` for examples.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simplified. Can TAG_TYPE
and TAG_ARGS
be statically defined instead of constants? This will also allow defining SerializedHint
recursively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I fully understood the suggestion. In 7e0ba59 I made the SerializedHint
recursive and switched to a StrEnum
for the tags.
(cherry picked from commit aa92061)
Co-authored-by: Viren Nadkarni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Today's meeting with Gio and Viren, we've come to the agreement that the direction of generating a schema from stores is the correct approach, but using some existing standard (like avro, protobuf, ...) for the schema definition would be better, so we can leverage the ecosystem around the tools.
Good learning here. Closing for further investigation with the other standards mentioned above. |
Motivation
This PR introduces a heuristic to extract a schema representation for a given store class.
Context
Service providers implemented in LocalStack store their data (if they are not
moto
based or rely on 3rd party services) instores
(see #6444 for more details).LocalStack serializes these classes to implement persistence. Therefore, we should treat them as public APIs and avoid breaking them. As stores evolve within the LocalStack codebase, a few issues can occur when deserializing a previous state:
Unfortunately, we don't have much visibility on how the structure of the stores changes over time.
For this reason, this PR introduces a heuristic to extract a schema representation from a store definition.
It allows us to reason about store changes. For instance:
Changes
localstack.state.schema
module that returns a schema definition from aBaseStore
subclass. To achieve this goal, we heavily rely on the type hints of the store attributes. The docstring of theStoreSchemaBuilder
class reports a few examples.Closes PNX-46.