HHH-18455 Strict XML validation compliance #9558


Open: jrenaat wants to merge 3 commits into main from HHH-18455_strictxmlcompliance
@jrenaat (Member) commented Jan 3, 2025


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license
and can be relicensed under the terms of the LGPL v2.1 license in the future at the maintainers' discretion.
For more information on licensing, please check here.


https://hibernate.atlassian.net/browse/HHH-18455

@jrenaat force-pushed the HHH-18455_strictxmlcompliance branch 2 times, most recently from 1f4b365 to 551447f on January 6, 2025 13:05
@jrenaat requested a review from sebersole on January 6, 2025 13:57
@jrenaat marked this pull request as ready for review on January 6, 2025 13:57
Member Author

Perhaps InputstreamRepeatableAccess would be a better name for this...
I also wonder what to do (if anything) with the chunk of memory the byte[] occupies once this repeatable access is no longer needed; should there be some mechanism to release it?
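Here is a minimal sketch of the byte[]-backed repeatable access being discussed. It assumes the access object owns the buffer and releases it on `close()`. `BufferedStreamAccess` is a hypothetical name, not the class in this PR. It mirrors the `accessInputStream()` method seen in the PR code and adds an assumed `getStreamName()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch (not the PR's class): buffers a one-shot InputStream once so that
 * several consumers (e.g. a strict validator, then the binder) can each open an
 * independent stream over the same bytes.
 */
public final class BufferedStreamAccess implements AutoCloseable {
    private final String streamName;
    // null once close() has been called
    private byte[] content;

    public BufferedStreamAccess(String streamName, InputStream source) {
        this.streamName = streamName;
        // Reads the whole source into memory and closes it (readAllBytes needs Java 9+)
        try (InputStream in = source) {
            this.content = in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException("Unable to buffer " + streamName, e);
        }
    }

    public String getStreamName() {
        return streamName;
    }

    /** Every call returns a fresh stream positioned at the first byte. */
    public InputStream accessInputStream() {
        final byte[] bytes = content;
        if (bytes == null) {
            throw new IllegalStateException("Content of " + streamName + " has already been released");
        }
        return new ByteArrayInputStream(bytes);
    }

    /**
     * Drops this object's reference to the buffer. The array only becomes eligible for
     * garbage collection once streams already handed out are unreachable as well,
     * because each of them still references it.
     */
    @Override
    public void close() {
        content = null;
    }
}
```

Wrapping the validate-then-bind sequence in try-with-resources would limit how long the buffer stays reachable. Whether that matches the binder's lifecycle is still an open question.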

Member

Hey @jrenaat 👋🏻 😃

I remembered that we have something for XML validation in Validator code here: https://github.com/hibernate/hibernate-validator/blob/b14c620a3f7d93a322fa122b52e1b9e83c376c35/engine/src/main/java/org/hibernate/validator/internal/xml/mapping/MappingXmlParser.java#L98-L143

You'll see that that code relies on InputStream#mark()/InputStream#reset() instead of keeping the whole file. And the way we make sure that these methods are "supported" by the stream is here:
https://github.com/hibernate/hibernate-validator/blob/b14c620a3f7d93a322fa122b52e1b9e83c376c35/engine/src/main/java/org/hibernate/validator/internal/engine/AbstractConfigurationImpl.java#L287

IDK if that would be applicable here or not, but I thought I'd let you know about it, and you can then decide 😃.
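For comparison, here is a minimal sketch of the mark/reset pattern described above, using only JDK APIs. It wraps the stream in a `BufferedInputStream` when `markSupported()` is false, calls `mark()`, runs a first StAX pass, and then calls `reset()` for the next consumer. `MarkResetSketch` and its methods are hypothetical and do not reproduce the Validator code.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class MarkResetSketch {

    /** Guarantees mark()/reset() support by buffering streams that lack it. */
    public static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** First pass: finds the root element's local name, then rewinds the stream. */
    public static String peekRootElement(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Stream must support mark/reset");
        }
        // The read limit must cover everything the parser reads ahead; Integer.MAX_VALUE
        // lets a BufferedInputStream grow as large as the document if needed.
        in.mark(Integer.MAX_VALUE);
        String root = null;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ShieldedStream(in));
            try {
                while (root == null && reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        root = reader.getLocalName();
                    }
                }
            } finally {
                reader.close();
            }
            in.reset(); // rewind to where the mark was set, for the next consumer
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Unable to read root element", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return root;
    }

    /** Hides close() and mark/reset from the parser so the caller's mark survives. */
    private static final class ShieldedStream extends FilterInputStream {
        ShieldedStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally a no-op: the caller still needs the stream
        }

        @Override
        public boolean markSupported() {
            return false;
        }

        @Override
        public void mark(int readlimit) {
            // ignored so the parser cannot move the caller's mark
        }

        @Override
        public void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }
}
```

The read limit involves a trade-off. A StAX parser reads ahead in chunks, so the mark has to survive everything the first pass consumed. For a full validation pass, the buffer therefore ends up holding most or all of the document anyway.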

Member Author

Thanks for the pointers. I did consider the mark/reset approach, but I'm not 100% sure it would work in our case; I think we actually need 2 separate InputStreams.
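To show why two independent streams are convenient, here is a minimal sketch of a strict-validation pass followed by a separate binding pass, each on its own stream. It assumes the caller can supply fresh streams on demand, modelled here as a `Supplier<InputStream>`. Counting start elements stands in for the real JAXB unmarshalling, and `TwoStreamValidation` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public final class TwoStreamValidation {

    /** Compiles an XSD given as a string. */
    public static Schema schemaFrom(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid XSD", e);
        }
    }

    /**
     * Strictly validates the document on one stream, then "binds" it from a second,
     * independent stream. Counting start elements is a stand-in for unmarshalling.
     */
    public static int validateThenBind(Supplier<InputStream> streams, Schema schema) {
        // Pass 1: the validator consumes its own stream
        try (InputStream in = streams.get()) {
            schema.newValidator().validate(new StreamSource(in));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not valid against the schema: " + e.getMessage(), e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Pass 2: a fresh stream, unaffected by how far the validator read
        try (InputStream in = streams.get()) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int elements = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                }
            }
            reader.close();
            return elements;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Binding failed", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each pass owns its stream, neither pass needs to know how far the other read or whether it closed its input.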

@jrenaat force-pushed the HHH-18455_strictxmlcompliance branch from 551447f to 25cc853 on January 6, 2025 15:08
@jrenaat force-pushed the HHH-18455_strictxmlcompliance branch from 25cc853 to aa45191 on January 13, 2025 16:14
@jrenaat force-pushed the HHH-18455_strictxmlcompliance branch from aa45191 to 18e4484 on July 30, 2025 20:44
@hibernate-github-bot (bot) commented Jul 30, 2025

Thanks for your pull request!

This pull request appears to follow the contribution rules.

› This message was automatically generated.

@jrenaat force-pushed the HHH-18455_strictxmlcompliance branch from 18e4484 to 8c5bafa on August 6, 2025 17:47
@sebersole (Member) left a comment

Left some comments.

Let's chat about it some more and work through those.

@@ -36,15 +38,18 @@ public abstract class AbstractBinder<T> implements Binder<T> {

private final LocalXmlResourceResolver xmlResourceResolver;

protected InputStreamAccess streamAccess;
Member

Really not a fan of storing this as an instance variable here. As an illustration of why: its only use does not even check whether it is null before using it, even though only 1 of the 2 exposed methods actually sets it (and then only in 1 of the 2 impls).

I think we need to find a way to not store this as an instance variable, especially here.

Member Author

Pass it as a parameter to the doBind methods?

Member

I don't think that's great either. Basically there are 2 forms of bind here: one accepting a Source and one accepting an InputStream(Access).

Possibly having our XMLEventReader impls have a way to reset or rebuild themselves?
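Here is a sketch of one way the "reset or rebuild" idea could look. It is a reader built on the JDK's `javax.xml.stream.util.EventReaderDelegate` that keeps a recipe for opening its input and can start over. `RebuildableEventReader` and `rebuild()` are hypothetical, and the `Supplier<InputStream>` stands in for an InputStreamAccess.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.util.EventReaderDelegate;

/**
 * Hypothetical sketch: an XMLEventReader that remembers how to obtain its input, so it
 * can rebuild itself from the start instead of the binder keeping the stream access
 * in an instance variable.
 */
public final class RebuildableEventReader extends EventReaderDelegate {
    private final XMLInputFactory factory;
    private final Supplier<InputStream> streams;
    private InputStream currentStream;

    public RebuildableEventReader(XMLInputFactory factory, Supplier<InputStream> streams)
            throws XMLStreamException {
        this.factory = factory;
        this.streams = streams;
        rebuild();
    }

    /** Discards the current position and restarts on a freshly opened stream. */
    public void rebuild() throws XMLStreamException {
        release();
        currentStream = streams.get();
        setParent(factory.createXMLEventReader(currentStream));
    }

    @Override
    public void close() throws XMLStreamException {
        release();
    }

    private void release() throws XMLStreamException {
        XMLEventReader parent = getParent();
        if (parent != null) {
            parent.close(); // per the StAX javadoc, this does not close the underlying stream
            setParent(null);
        }
        if (currentStream != null) {
            try {
                currentStream.close();
            } catch (IOException e) {
                throw new XMLStreamException(e);
            } finally {
                currentStream = null;
            }
        }
    }
}
```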

@sebersole (Member) commented Aug 6, 2025

Maybe we say we don't care about this for the Source forms...

As far as I can see, there are 2 places where the Source form comes into play and I think neither is a real problem:

  1. org.hibernate.boot.jaxb.internal.XmlSources#fromDocument wraps the DOM document as a Source. We no longer use this XmlSources#fromDocument method (as we no longer support DOM), and since it's internal we could deprecate or even remove it.
  2. org.hibernate.jpa.boot.spi.PersistenceXmlParser#loadUrlWithJaxb, which loads configuration and chooses to wrap an InputStream as a StreamSource to pass to the binder. But (a) we could instead wrap it as an InputStreamAccess, and (b) we don't really care about strict validation for configuration files.
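On point (a), a URL-backed source does not need an in-memory copy at all, because the URL can be reopened for each access. Here is a minimal sketch. `UrlStreamAccess` is a hypothetical class that mirrors the `accessInputStream()` shape used in this PR, and it is not Hibernate's API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

/**
 * Hypothetical sketch: stream access backed by a URL. Repeatable access comes from
 * reopening the URL, so nothing has to be buffered in memory.
 */
public final class UrlStreamAccess {
    private final URL url;

    public UrlStreamAccess(URL url) {
        this.url = url;
    }

    public String getStreamName() {
        return url.toExternalForm();
    }

    /** Opens a new, independent stream on every call; the caller must close it. */
    public InputStream accessInputStream() {
        try {
            return url.openStream();
        } catch (IOException e) {
            throw new UncheckedIOException("Unable to open " + url, e);
        }
    }
}
```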

So maybe a better holistic strategy here is to:

  1. Deprecate Binder#bind(Source, Origin)
  2. Remove XmlSources#fromDocument (it's internal and we do not use it)
  3. Change PersistenceXmlParser#loadUrlWithJaxb to pass the InputStream(Access)
  4. Change AbstractBinder#createReader to return an XMLEventReader which provides access to the underlying InputStreamAccess. In the (now deprecated) cases where a Source was used, that reader would just throw an exception.

wdyt?
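Here is a sketch of what step 4 might look like, assuming the reader is built on `javax.xml.stream.util.EventReaderDelegate`. The reader remembers the stream access it came from, and the deprecated Source-based path yields a reader whose accessor throws. `AccessAwareEventReader` and the nested `StreamAccess` interface are hypothetical stand-ins, not Hibernate types.

```java
import java.io.InputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.util.EventReaderDelegate;
import javax.xml.transform.Source;

/**
 * Hypothetical sketch of step 4: the reader returned by createReader carries the stream
 * access it was built from, so the binder does not need an instance variable for it.
 */
public final class AccessAwareEventReader extends EventReaderDelegate {

    /** Hypothetical stand-in for Hibernate's InputStreamAccess. */
    public interface StreamAccess {
        String getStreamName();

        InputStream accessInputStream();
    }

    private final StreamAccess streamAccess; // null on the deprecated Source-based path

    private AccessAwareEventReader(XMLEventReader delegate, StreamAccess streamAccess) {
        super(delegate);
        this.streamAccess = streamAccess;
    }

    // Note: this sketch does not close the stream it opens; a real implementation would need to.
    public static AccessAwareEventReader fromStreamAccess(XMLInputFactory factory, StreamAccess access)
            throws XMLStreamException {
        return new AccessAwareEventReader(factory.createXMLEventReader(access.accessInputStream()), access);
    }

    /** Deprecated path: still readable, but offers no repeatable access for strict validation. */
    public static AccessAwareEventReader fromSource(XMLInputFactory factory, Source source)
            throws XMLStreamException {
        return new AccessAwareEventReader(factory.createXMLEventReader(source), null);
    }

    public StreamAccess getStreamAccess() {
        if (streamAccess == null) {
            throw new UnsupportedOperationException(
                    "Strict validation requires stream access; Source-based binding does not provide it");
        }
        return streamAccess;
    }
}
```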

Member

One other thing... you changed the signature of Binder#bind(InputStream,Origin) to Binder#bind(InputStreamAccess,Origin). Technically Binder is an SPI and we really ought to not be changing that.

However, we could deprecate the old form in favor of a new one with that signature, the old form just wrapping the passed InputStream as an InputStreamAccess and calling the new form.
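Here is a sketch of that deprecation pattern on a simplified, hypothetical `BinderSketch` interface. The nested `StreamAccess` and `Origin` types are placeholders, not Hibernate's. A caller-supplied InputStream can only be read once, so the deprecated adapter has to buffer it to honor repeatable access. The record requires Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical, simplified stand-in for the Binder SPI: implementors provide the
 * stream-access form, and the old InputStream form survives as a deprecated default.
 */
public interface BinderSketch<T> {

    /** Hypothetical stand-in for InputStreamAccess. */
    interface StreamAccess {
        String getStreamName();

        InputStream accessInputStream();
    }

    /** Hypothetical placeholder for Hibernate's Origin. */
    record Origin(String name) { }

    T bind(StreamAccess streamAccess, Origin origin);

    /**
     * @deprecated use {@link #bind(StreamAccess, Origin)}. The passed stream can only be
     * read once, so it is buffered in memory to provide repeatable access.
     */
    @Deprecated
    default T bind(InputStream stream, Origin origin) {
        final byte[] content = readFully(stream);
        return bind(new StreamAccess() {
            @Override
            public String getStreamName() {
                return origin.name();
            }

            @Override
            public InputStream accessInputStream() {
                return new ByteArrayInputStream(content);
            }
        }, origin);
    }

    /** Reads and closes the stream. */
    private static byte[] readFully(InputStream stream) {
        try (InputStream in = stream) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```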

@jrenaat force-pushed the HHH-18455_strictxmlcompliance branch from 8c5bafa to 3626420 on August 6, 2025 18:57
jrenaat added 3 commits August 7, 2025 18:03
…to bind(InputStreamAccess, Origin) to allow repeatable access to the InputStream, needed for strict Jpa XML validation

HHH-18455 - Implement option to run strict JPA compliance validation

Signed-off-by: Jan Schatteman <[email protected]>
Introduce JaxbBindingSource interface to avoid needing an InputStreamAccess instance variable inside AbstractBinder
Remove unused XmlSources.fromDocument
Remove unused JaxpSourceXmlSource

Signed-off-by: Jan Schatteman <[email protected]>
@jrenaat force-pushed the HHH-18455_strictxmlcompliance branch from 3626420 to 084dfbc on August 7, 2025 16:03
public XMLEventReader getEventReader() {
try {
// create a standard StAX reader
final XMLEventReader staxReader = staxFactory().createXMLEventReader( streamAccess.accessInputStream() );

Check failure (Code scanning / CodeQL): Resolving XML external entity in user-controlled data. Severity: Critical.

XML parsing depends on a user-provided value without guarding against external entity expansion.
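The CodeQL alert above concerns `staxFactory().createXMLEventReader(...)` being fed user-controlled XML. A commonly recommended mitigation, for example in the OWASP XXE Prevention Cheat Sheet, is to switch off external entities and, where possible, DTD support on the `XMLInputFactory`. The sketch below shows that configuration on a plain JDK factory. `HardenedStax` is a hypothetical helper, and how it would map onto `staxFactory()` in `AbstractBinder` is an assumption. It is not clear whether disabling DTDs entirely is compatible with DOCTYPE-based hbm.xml mappings resolved through `LocalXmlResourceResolver`. If it is not, disabling only external entities is the less intrusive option.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class HardenedStax {

    /** Creates a StAX factory that will not expand external entities. */
    public static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Never resolve external parsed entities (the XXE vector CodeQL reports)
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Stricter option: ignore DTDs completely (compatibility with legacy DOCTYPEs unverified)
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        return factory;
    }

    /** Concatenates all character data, used here only to observe what got expanded. */
    public static String textContent(XMLInputFactory factory, InputStream in) throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }
}
```

With this configuration, a document that references an external entity is either rejected with an `XMLStreamException` or parsed without the entity being expanded. Which of the two happens may depend on the StAX implementation in use.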
Labels
None yet
Projects
None yet

3 participants