Skip to content

rbByteEncode fails to no-op when encodings are the same #8686

@headius

Description

@headius

While investigating fixes for #8682 I discovered that rbByteEncode does not properly handle the case of source and destination encoding being the same.

As described in #8682 (comment):

IOOutputStream calls RubyIO.write and passes only the incoming bytes, offsets, and encoding:

@Override
public void write(final byte[] b,final int off, final int len) throws IOException {
ThreadContext context = runtime.getCurrentContext();
RubyIO realIO = this.realIO;
if (realIO != null) {
realIO.write(context, b, off, len, encoding);
} else {
IRubyObject io = this.io;
writeAdapter.call(context, io, io, RubyString.newStringLight(runtime, new ByteList(b, off, len, encoding, false)));
}
}

This path was added as an optimization in 236f7ba to avoid constructing string objects just to immediately unwrap them for IO writes.

Unfortunately it hits a bug eventually in EncodingUtils.rbByteEncode where two identical encodings will not no-op, but instead will trigger an encoding error because it rejects encoding from and to the same encoding.

The short-circuit checks here:

if (encoding.isAsciiCompatible() && to.isAsciiCompatible()) {
if (cr == StringSupport.CR_7BIT) {
return null;
}
} else if (encodingEqual(sname, dname)) {
return null;
}

...were never updated when the sister logic for RubyString was updated in 621369c. The correct logic, with additional refactoring, looks like this:

if (senc != null && senc == denc) {
return strTranscodeScrub(context, forceEncoding, str, ecflags, ecopts, result, explicitlyInvalidReplace, denc, senc);
} else if (is7BitCompat(str, denc, senc)) {
return result.apply(context, str, denc, str);
} else if (encodingEqual(sname, dname)) {
if (forceEncoding.isNil()) denc = null;
return result.apply(context, str, denc, str);
}

Ignoring the problems with IOOutputStream only being able to specify a single encoding for all String writes, we need to fix this issue in rbByteEncode and ensure same-encoding calls no-op the same way as rbStrEncode.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions