Skip to content

Conversation

WillLillis
Copy link
Member

Node::utf16_text uses byte offsets to index into a slice of u16s. These indices need to be adjusted.

I added handling in case these adjusted indices happen to fall on a utf16 surrogate. I'm not sure how to construct such a pathological test case that has a node's text boundary split on a surrogate pair, but I think it's better to be safe than sorry in this case.

Alternatively, we could just use String::from_utf16, however this would return owned data (also making the requisite heap allocation), as well as break parity with Node::utf8_text's return type.

@ObserverOfTime
Copy link
Member

ObserverOfTime commented Jul 16, 2025

We (I?) normally use the bindings scope for the binding templates. I suggest using rust instead.
We should probably document these at some point…

@WillLillis WillLillis added the ci:backport release-0.25 Backport label label Jul 26, 2025
@WillLillis WillLillis merged commit d3c2fed into tree-sitter:master Aug 2, 2025
19 checks passed
@WillLillis WillLillis deleted the utf16_range branch August 2, 2025 20:03
@tree-sitter-ci-bot
Copy link

Successfully created backport PR for release-0.25:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

utf16_text index out of range after parse_utf16_le in Rust

4 participants