@byroot byroot commented Aug 30, 2025

Since BUILTIN_TYPE and RCLASS_SINGLETON_P are both stored in RBasic.flags, we can combine these two checks in a single bitmask.

This relies on T_ICLASS and T_CLASS not overlapping, and assumes klass is always one of these two types.
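To make the combined check concrete, here is a minimal sketch in plain C. It does not use the real Ruby headers: every `SKETCH_` name and both function names are hypothetical, the type-tag values are illustrative, and the position of the singleton bit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only: the low bits of `flags` hold the type tag, and
 * SKETCH_FL_SINGLETON is an assumed flag bit above the type mask. */
#define SKETCH_T_CLASS      UINT64_C(0x02)
#define SKETCH_T_ICLASS     UINT64_C(0x1c)
#define SKETCH_T_MASK       UINT64_C(0x1f)
#define SKETCH_FL_SINGLETON (UINT64_C(1) << 13)

/* The single-mask test is only valid if the two type tags share no bits. */
static_assert((SKETCH_T_CLASS & SKETCH_T_ICLASS) == 0,
              "T_CLASS must not overlap T_ICLASS");

/* Before: compare the type tag, then test the singleton bit separately. */
static inline bool
sketch_fake_class_two_checks(uint64_t flags)
{
    return (flags & SKETCH_T_MASK) == SKETCH_T_ICLASS
        || (flags & SKETCH_FL_SINGLETON) != 0;
}

/* After: one AND against a combined mask.
 * Assumes the type tag is either T_CLASS or T_ICLASS. */
static inline bool
sketch_fake_class_one_mask(uint64_t flags)
{
    return (flags & (SKETCH_T_ICLASS | SKETCH_FL_SINGLETON)) != 0;
}
```

Under those assumptions both functions agree for every valid input, but the second one needs a single mask-and-test instead of a compare plus a bit test.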

Just combining the masks brings a small but consistent 1.08x speedup on the simple case benchmark.

compare-ruby: ruby 3.5.0dev (2025-08-30T01:45:42Z obj-class 01a57bd6cd) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-30T09:56:24Z obj-class 2685f8dbb4) +YJIT +PRISM [arm64-darwin24]

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|obj        |     444.410|   478.895|
|           |           -|     1.08x|
|extended   |     135.139|   140.206|
|           |           -|     1.04x|
|singleton  |     165.155|   155.832|
|           |       1.06x|         -|
|immediate  |     380.103|   432.090|
|           |           -|     1.14x|

With the RB_UNLIKELY compiler hint the gain is much more significant, although the singleton and extended cases are slowed down.
However, we can assume the simple case is far more common than the other two.

compare-ruby: ruby 3.5.0dev (2025-08-30T01:45:42Z obj-class 01a57bd6cd) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-30T09:51:01Z obj-class 12d01a1b02) +YJIT +PRISM [arm64-darwin24]

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|obj        |     444.951|   556.191|
|           |           -|     1.25x|
|extended   |     136.836|   113.871|
|           |       1.20x|         -|
|singleton  |     166.335|   167.747|
|           |           -|     1.01x|
|immediate  |     379.642|   509.515|
|           |           -|     1.34x|
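As a rough illustration of the RB_UNLIKELY hint discussed above, the sketch below marks the "fake class" branch as the cold path. It is standalone C rather than the actual patch: `SKETCH_UNLIKELY`, `struct sketch_class`, and both function names are hypothetical, the flag values are illustrative, and the exact placement of the hint in the real code may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
# define SKETCH_UNLIKELY(x) (x)
#endif

/* Illustrative flag values; the singleton bit position is an assumption. */
#define SKETCH_T_CLASS      UINT64_C(0x02)
#define SKETCH_T_ICLASS     UINT64_C(0x1c)
#define SKETCH_FL_SINGLETON (UINT64_C(1) << 13)

/* Hypothetical stand-in for a class object: flags plus a superclass link. */
struct sketch_class {
    uint64_t flags;
    struct sketch_class *super;
};

static inline bool
sketch_fake_class_p(const struct sketch_class *klass)
{
    return (klass->flags & (SKETCH_T_ICLASS | SKETCH_FL_SINGLETON)) != 0;
}

/* Skip singleton and include classes until a real class is found.
 * The loop body is hinted as the rare path, so the common case
 * (an ordinary class) falls straight through to the return. */
static struct sketch_class *
sketch_class_real(struct sketch_class *klass)
{
    while (SKETCH_UNLIKELY(klass != NULL && sketch_fake_class_p(klass))) {
        klass = klass->super;
    }
    return klass;
}
```

The hint changes only code layout, not behavior, which is consistent with the simple case speeding up while the less common paths pay for it.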

And since the logic is very similar, I was able to reuse fake_class_p in rb_class_real simply by ensuring T_MODULE never has FL_SINGLETON set.

Then, when we're calling from Ruby's Kernel#class method, we know for sure RBasic.klass can't possibly be 0, so if we add a specialized function that skips the null check, we can speed the method some more:

compare-ruby: ruby 3.5.0dev (2025-08-30T01:45:42Z obj-class 01a57bd6cd) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-30T10:21:10Z obj-class b67c16c477) +YJIT +PRISM [arm64-darwin24]
# Iteration per second (i/s)

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|obj        |     445.217|   642.446|
|           |           -|     1.44x|
|extended   |     136.826|   117.974|
|           |       1.16x|         -|
|singleton  |     166.269|   166.695|
|           |           -|     1.00x|
|immediate  |     380.243|   515.775|
|           |           -|     1.36x|
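The specialized entry point described above can be sketched as follows. This is an assumption-laden illustration, not the merged code: `struct sketch_class` and all function names are hypothetical, the flag values are illustrative, and a NULL pointer stands in for `klass == 0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; the singleton bit position is an assumption. */
#define SKETCH_T_CLASS      UINT64_C(0x02)
#define SKETCH_T_ICLASS     UINT64_C(0x1c)
#define SKETCH_FL_SINGLETON (UINT64_C(1) << 13)

/* Hypothetical stand-in for a class object: flags plus a superclass link. */
struct sketch_class {
    uint64_t flags;
    struct sketch_class *super;
};

static inline bool
sketch_fake_class_p(const struct sketch_class *klass)
{
    return (klass->flags & (SKETCH_T_ICLASS | SKETCH_FL_SINGLETON)) != 0;
}

/* General version: tolerates a NULL class, so every iteration
 * pays for the extra comparison. */
static struct sketch_class *
sketch_class_real(struct sketch_class *klass)
{
    while (klass != NULL && sketch_fake_class_p(klass)) {
        klass = klass->super;
    }
    return klass;
}

/* Specialized version: the caller guarantees klass is non-NULL and that
 * every fake class has a real class somewhere above it, so the NULL
 * comparison is dropped from the loop. */
static struct sketch_class *
sketch_class_real_nonnull(struct sketch_class *klass)
{
    assert(klass != NULL);
    while (sketch_fake_class_p(klass)) {
        klass = klass->super;
    }
    return klass;
}
```

Both variants return the same class whenever the precondition holds; the second simply trusts the caller instead of re-checking it.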

In a follow-up, it could be worth exploring whether T_ICLASS could always have FL_SINGLETON set, as that would reduce fake_class_p to a single bit check. I'm not sure it would make much of a difference, though.

Another thing worth exploring would be YJIT codegen.

byroot added 4 commits August 30, 2025 12:09
Since `BUILTIN_TYPE` and `RCLASS_SINGLETON_P` are both stored in
`RBasic.flags`, we can combine these two checks in a single bitmask.

This relies on `T_ICLASS` and `T_CLASS` not overlapping, and assumes
`klass` is always one of these two types.

Just combining the masks brings a small but consistent 1.08x speedup on the simple case benchmark.

```
compare-ruby: ruby 3.5.0dev (2025-08-30T01:45:42Z obj-class 01a57bd) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-30T09:56:24Z obj-class 2685f8dbb4) +YJIT +PRISM [arm64-darwin24]

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|obj        |     444.410|   478.895|
|           |           -|     1.08x|
|extended   |     135.139|   140.206|
|           |           -|     1.04x|
|singleton  |     165.155|   155.832|
|           |       1.06x|         -|
|immediate  |     380.103|   432.090|
|           |           -|     1.14x|
```

With the RB_UNLIKELY compiler hint the gain is much more significant, although
the singleton and extended cases are slowed down.
However, we can assume the simple case is far more common than the other two.

```
compare-ruby: ruby 3.5.0dev (2025-08-30T01:45:42Z obj-class 01a57bd) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-30T09:51:01Z obj-class 12d01a1b02) +YJIT +PRISM [arm64-darwin24]

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|obj        |     444.951|   556.191|
|           |           -|     1.25x|
|extended   |     136.836|   113.871|
|           |       1.20x|         -|
|singleton  |     166.335|   167.747|
|           |           -|     1.01x|
|immediate  |     379.642|   509.515|
|           |           -|     1.34x|
```

This requires ensuring T_MODULE never has FL_SINGLETON set,
so RMODULE_IS_REFINEMENT had to be moved.

`Kernel#class` can't possibly be called on a hidden object,
hence we don't need to check for `klass == 0`.

```
compare-ruby: ruby 3.5.0dev (2025-08-30T01:45:42Z obj-class 01a57bd) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-30T10:21:10Z obj-class b67c16c) +YJIT +PRISM [arm64-darwin24]

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|obj        |     445.217|   642.446|
|           |           -|     1.44x|
|extended   |     136.826|   117.974|
|           |       1.16x|         -|
|singleton  |     166.269|   166.695|
|           |           -|     1.00x|
|immediate  |     380.243|   515.775|
|           |           -|     1.36x|
```
@byroot byroot merged commit d89e734 into ruby:master Aug 30, 2025
85 checks passed
@byroot byroot deleted the obj-class branch August 30, 2025 12:14