utf16_text index out of range after parse_utf16_le in Rust #4616

@anqur

Problem

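Node::utf16_text slices the &[u16] source with byte offsets, but each UTF-16 code unit is two bytes, so the byte offsets are twice the slice indices. The correct implementation might be: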

&source[self.start_byte() / 2..self.end_byte() / 2]

But the current implementation (the Go binding does the same) is:

&source[self.start_byte()..self.end_byte()]

Or maybe I misunderstood this API?
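
To make the proposed fix concrete, here is a minimal standalone sketch of the halved indexing. `utf16_slice` is a hypothetical helper written for illustration, not a function in the tree-sitter crate; it assumes the offsets passed in are byte offsets like those returned by `start_byte()` and `end_byte()`.

```rust
/// Hypothetical helper (not part of the tree-sitter API): slices UTF-16
/// source text using byte offsets such as those from `start_byte()` and
/// `end_byte()`.
fn utf16_slice(source: &[u16], start_byte: usize, end_byte: usize) -> &[u16] {
    // Each UTF-16 code unit occupies two bytes, so halve the byte offsets
    // to obtain indices into the `u16` slice.
    &source[start_byte / 2..end_byte / 2]
}
```

With the text from the test below, `utf16_slice(&text, 0, 4)` would select both code units, and `String::from_utf16` on the result should decode back to "你好".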

Steps to reproduce

Create a new grammar and a test crate:

mkdir utf16le
cd utf16le
tree-sitter init
tree-sitter generate
cargo new --lib hello

Add dependencies to crate hello:

tree-sitter = "0.25.8"
tree-sitter-utf16le = { path = "../" }

Write the Rust test:

#[cfg(test)]
mod tests {
    use tree_sitter::Parser;
    use tree_sitter_utf16le::LANGUAGE;

    #[test]
    fn it_works() {
        let mut parser = Parser::new();
        parser.set_language(&LANGUAGE.into()).unwrap();
        let text = "你好".encode_utf16().collect::<Box<_>>();
        let tree = parser.parse_utf16_le(&text, None).unwrap();
        let root = tree.root_node().utf16_text(&text);
//                                  ^~~~~~~~~~~~~~~~^ fails here
        let hello = String::from_utf16(root).unwrap();
        assert_eq!(hello, "你好");
    }
}

Run the test:

cd hello
cargo test

Result:

---- tests::it_works stdout ----

thread 'tests::it_works' panicked at C:\Users\XXX\.cargo\registry\src\index.crates.io-XXX\tree-sitter-0.25.8\binding_rust\lib.rs:2065:16:
range end index 4 out of range for slice of length 2
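
The numbers in the panic message appear to follow directly from the encoding. Assuming the root node spans the whole input, its end byte is 4, while the `&[u16]` slice holds only 2 code units. The sketch below checks that arithmetic without tree-sitter; `utf16_sizes` is a name introduced here for illustration only.

```rust
/// Illustrative helper: returns (UTF-16 code units, bytes) for a string
/// once it is encoded as UTF-16.
fn utf16_sizes(s: &str) -> (usize, usize) {
    let units = s.encode_utf16().count();
    // Every code unit is a `u16`, i.e. two bytes.
    (units, units * std::mem::size_of::<u16>())
}
```

For "你好" this gives 2 code units and 4 bytes, so indexing the slice with `0..4` goes out of range while `0..2` covers the whole text.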

Expected behavior

Get the "你好" text via String::from_utf16.

Tree-sitter version (tree-sitter --version)

tree-sitter 0.25.4

Operating system/version

Windows 10 22H2
