Tree-sitter 1.0 Checklist (#930)

In the not-too-distant future, I'd like to bump Tree-sitter's version to 1.0, indicating a greater degree of stability and completeness. After that, I'd like to regenerate all of the parsers in the [tree-sitter](https://github.com/tree-sitter) GitHub org and bump them to 1.0 as well. Before doing this, I think several important problems with the framework should be fixed.

## Tasks

* [x] **Unicode character properties** - Support ECMAScript Unicode property escapes (such as `\p{L}`) in regexes.
  * [x] Implement basic support for this construct (https://github.com/tree-sitter/tree-sitter/pull/906)
  * [x] Regenerate all parsers to use unicode property escapes, fix any bugs that surface

* [x] **Partial Precedence Orderings** - The integer precedence system makes some grammars shockingly difficult to maintain.
  * [x] Enhance the precedence system to allow precedences to be expressed in a pairwise *partial ordering* instead of requiring a total ordering based on integers. (https://github.com/tree-sitter/tree-sitter/pull/939)
  * [x] Update `tree-sitter-javascript` and `tree-sitter-typescript` to use this more flexible precedence scheme. Right now, the integer precedence system is making it very difficult to continue development of `tree-sitter-typescript` in particular, because of the mix of different conflicts between types and expressions.
  * *Dynamic* precedence should probably stay integer-only, for simplicity

* [x] **Grammars with many fields, aliases** - By historical accident, generated parsers use too small an integer type (`uint8_t`) for storing nodes' field and alias information. Parsers with large numbers of fields can cause integer overflows (https://github.com/tree-sitter/tree-sitter/issues/511)
  * [x] Start representing nodes' `production_id` as a `uint16_t` (https://github.com/tree-sitter/tree-sitter/pull/943)
  * [x] **Strategy** - Decide whether we're going to bother maintaining backward compatibility with old generated parsers. If so, the library code will need to become a bit more complicated in order to consume both binary formats.
  * [x] **Grammars** - Regenerate all the parsers with the new representation.

* [x] Fix issues with the `get_column` external scanner API (https://github.com/tree-sitter/tree-sitter/pull/978)

* [x] **CLI Ergonomics**
  * [x] Generate Rust bindings for parsers, and structure the Node.js bindings more consistently with the Rust ones (https://github.com/tree-sitter/tree-sitter/pull/948)
  * [x] In the `parse` command, auto-detect UTF-16 files and decode them accordingly. This will help Windows users, who currently trip over the `echo` command suggested in the docs. (https://github.com/tree-sitter/tree-sitter/pull/2368)
  * [x] Support grammars defined as [ECMAScript modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules) instead of CommonJS modules.
  * [x] **Reduce Coupling to Node** - Introduce some Tree-sitter specific `GRAMMAR_PATH` setting where the CLI will search for grammar modules, instead of relying on `node_modules` and `npm`.

* [ ] **Mergeable Git Repos** - Make it easier to collaborate on grammars by removing generated files from version control.
  * [x] **CLI commands** - Add new `pack` and `publish` subcommands to the Tree-sitter CLI, for uploading tarballs and compiled `.wasm` files to the GitHub releases API (https://github.com/tree-sitter/tree-sitter/issues/730#issuecomment-736018228)
  * [ ] **Cleanup** - Remove generated files from all the grammar repos in the tree-sitter org

* [ ] **Documentation**
  * [x] Document the ability to match against supertypes in queries with the `expression/identifier` syntax.
  * [ ] Add more thorough explanations of LR conflicts, precedence, and dynamic conflict-resolution with GLR.
  * [ ] Make it clear how to use Tree-sitter for basic syntax highlighting *without* the `tree-sitter-highlight` Rust crate (using tree queries directly).
  * [x] Document the `tags.scm` queries used for code navigation on GitHub. #660
  * [x] Create a CHANGELOG file and start maintaining it. #527
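
The **Partial Precedence Orderings** task can be illustrated with a small sketch. This is not Tree-sitter's implementation: it assumes each ordering is a chain of precedence names listed from highest to lowest, that the relation is closed transitively, and that `PrecedenceOrder`, `compare`, and all precedence names are hypothetical. Unlike integers, two names that never appear in a common chain stay unordered, and contradictory chains are rejected as a cycle.

```python
class PrecedenceOrder:
    """Hypothetical sketch of a partial precedence ordering built from named chains."""

    def __init__(self, chains):
        # Each chain lists precedence names from highest to lowest.
        self.higher = set()  # (a, b) means "a binds tighter than b"
        names = set()
        for chain in chains:
            names.update(chain)
            for a, b in zip(chain, chain[1:]):
                self.higher.add((a, b))

        # Warshall's algorithm: if a > k and k > b, record a > b as well.
        ordered = sorted(names)
        for k in ordered:
            for a in ordered:
                for b in ordered:
                    if (a, k) in self.higher and (k, b) in self.higher:
                        self.higher.add((a, b))

        # A name that ends up above itself means the chains contradict each other.
        cycles = [n for n in ordered if (n, n) in self.higher]
        if cycles:
            raise ValueError(f"conflicting precedence chains involving {cycles}")

    def compare(self, a, b):
        """Return 1 if a binds tighter, -1 if b binds tighter, 0 if unordered."""
        if (a, b) in self.higher:
            return 1
        if (b, a) in self.higher:
            return -1
        return 0


# Example chains; all names are illustrative, not taken from a real grammar.
order = PrecedenceOrder([
    ["member", "call", "unary", "binary_times", "binary_plus"],
    ["type_intersection", "type_union"],
])
print(order.compare("call", "binary_plus"))    # 1
print(order.compare("binary_plus", "member"))  # -1
print(order.compare("call", "type_union"))     # 0 (never related)
```

Returning `0` for unrelated names is the main difference from a total order: in this sketch, a caller can treat it as "no precedence applies" instead of relying on whichever integers the two rules happen to have been assigned.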

## Stretch Goals

I'm recording these here even though they are a bit less urgent.

* [ ] **Incremental Parsing Perf** - Enhance the external scanner API to allow for looser state comparisons, avoiding the catastrophic node-reuse failures seen in the HTML parser (https://github.com/tree-sitter/tree-sitter-html/issues/23)
  * [ ] Figure out if the new scanner function can be made *optional* (with the parser generator inspecting `scanner.c` to decide whether to link against a `_compare` function).
  * [ ] Update `tree-sitter-html` to use this API, improving its incremental performance
  
* [x] **Native Library, WASM parsers** - Add a compile-time option to link the C library against a [standard WASM engine](https://github.com/WebAssembly/wasm-c-api) (V8, Wasmtime, or Wasmer). When this feature is enabled, allow the native library to load WASM parsers, marshaling the parse table into native memory and using WASM execution only for the *lexing* phase. This will make it more practical to distribute parsers as pre-compiled `.wasm` files instead of as C code. The performance cost should be small, because all of the expensive parsing operations will still run natively. #1864
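
The optional comparison hook in the **Incremental Parsing Perf** goal can be sketched as below. Everything here is hypothetical: the state layout (a stack of open tag names plus a scratch counter) and the names `serialize`, `deserialize`, `tag_stack_compare`, and `states_equal` are invented for illustration and are not part of Tree-sitter's external scanner API. The idea is that a scanner-supplied comparison can ignore fields that do not affect future lexing, while a scanner that provides no such function falls back to comparing serialized bytes exactly.

```python
import struct

# Hypothetical scanner state: a stack of open tag names plus a scratch counter
# that, by assumption, never influences future lexing decisions.


def serialize(tag_stack, scratch):
    """Pack the state as: scratch (u16), depth (u16), then length-prefixed tag names."""
    out = bytearray(struct.pack("<HH", scratch, len(tag_stack)))
    for tag in tag_stack:
        name = tag.encode("utf-8")
        out += struct.pack("<B", len(name)) + name
    return bytes(out)


def deserialize(buf):
    """Inverse of serialize: return (tag_stack, scratch)."""
    scratch, depth = struct.unpack_from("<HH", buf, 0)
    offset, tags = 4, []
    for _ in range(depth):
        (size,) = struct.unpack_from("<B", buf, offset)
        offset += 1
        tags.append(buf[offset:offset + size].decode("utf-8"))
        offset += size
    return tags, scratch


def tag_stack_compare(a, b):
    """Looser comparison: two states match if their open-tag stacks match."""
    return deserialize(a)[0] == deserialize(b)[0]


def states_equal(a, b, compare=None):
    """Use the scanner's comparison when one is provided, else exact byte equality."""
    if compare is None:
        return a == b
    return compare(a, b)


old = serialize(["html", "body", "div"], scratch=3)
new = serialize(["html", "body", "div"], scratch=7)
print(states_equal(old, new))                     # False: the bytes differ
print(states_equal(old, new, tag_stack_compare))  # True: same open tags
```

Making `compare` optional mirrors the open question about whether the new scanner function can be optional: in this sketch, scanners that define no comparison simply keep exact matching.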
