Skip to content

Add tcl9.0 compatibility #15

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Add tcl9.0 compatibility #15

wants to merge 9 commits into from

Conversation

mark987
Copy link
Contributor

@mark987 mark987 commented Jul 17, 2025

Modify git-gui to work correctly under Tcl9.0 allowing this to be used in addition to Tcl 8.6. This depends upon #14.

@j6t
Copy link
Owner

j6t commented Jul 18, 2025

Nice!

Please sign off 872b4ac.

@mark987
Copy link
Contributor Author

mark987 commented Jul 19, 2025

Rebased on the updated #14, added signoff as noted.

@j6t
Copy link
Owner

j6t commented Jul 20, 2025

Thanks. I just noticed that this series doesn't updated a number of -encoding and -translation cases in lib/choose_repository.tcl for reasons. Let's wait until after I have merged #14 and #16, and then rebase this series.

@mark987
Copy link
Contributor Author

mark987 commented Jul 20, 2025

Yes - I tried to keep these branches independent, and there are no merge conflicts, but obviously lib/choose_repository.tcl will not work correctly on Tcl 9.0 UNLESS do_clone is updated per #16. But, I think that patch is ok again as updated just now. Let me know if/when you wish this rebased.

@mark987 mark987 mentioned this pull request Jul 21, 2025
@j6t
Copy link
Owner

j6t commented Jul 22, 2025

This can now be rebased on the master branch that I have just pushed out. (Did you receive a notification of that event?)

mark987 added 4 commits July 22, 2025 12:32
Per 6eb420e ("git-gui: Always disable the Tcl EOF character when
reading", 2007-07-17), git-gui should disable Tcl's EOF character
detection on all files when on Windows: the default is disabled on all
other platforms (and with Tcl 9.0, is disabled on Windows too).  This
EOF character is for compatibility with files / applications written for
file systems that know only the disc sectors allocated, and not the
number of bytes used.  This has nothing to do with git.

But, git-gui does not set -eofchar {} on all channels.  To avoid any
further leakage, let's just add this to the Windows specific override of
open.  This override is needed only as long as Tcl 8.x is in use (Tcl
9.0 makes -eofchar {} default on all platforms).

Signed-off-by: Mark Levedahl <[email protected]>
git-gui has many cases where -translation binary and -encoding binary
are configured on the same channel. But, -translation binary defines a
binary channel, which sets up -encoding iso8859-1 as part of its work.
Tcl 8.x defines -encoding binary as an alias of -encoding iso8859-1, and
this alias is deleted in Tcl 9.0.  Let's delete the redundant encoding
definition now.

Signed-off-by: Mark Levedahl <[email protected]>
git-gui currently configures some channels as '-encoding binary' when
the channel is not really binary (e.g, the channel is consumed as lines
of text).  In 8.6, '-encoding binary' is an alias for '-encoding
iso8859), but TIP 699 removes this alias for Tcl 9.0. Let's switch to
'-encoding iso8859-1' to be compatible across Tcl versions.

Signed-off-by: Mark Levedahl <[email protected]>
git-gui has many instances of '-translation binary' and '-encoding
$SOMETHING' on the same channel.  As eofchar is always null given a
prior commit, the net effect of having '-translation binary' in such
configuration is only to change how text line endings are handled.

For cases where the channel is opened to be consumed via gets, the eol
translation is irrelevant because Tcl's gets is documented to recognize
any of \n, \r, and \r\n as a line ending.  So, keep only the '-encoding
$SOMETHING' configuration in these cases, making the configuration more
clear.

Signed-off-by: Mark Levedahl <[email protected]>
@mark987
Copy link
Contributor Author

mark987 commented Jul 22, 2025

This was just a rebase - no changes to code or commit messages.

Yes, I got notification of update to origin/master, thank you!. Progress :^)

@j6t
Copy link
Owner

j6t commented Jul 29, 2025

Commit cfa7125 causes an unacceptable regression for me. I have files checked into the repository with CRLF ending. With this change, the diff looks like this for me:

@@ -981,7 +985,7 @@ IDR_MAINFRAME MENU
     POPUP "&File"
     BEGIN
         MENUITEM "&New\tCtrl+N",                ID_FILE_NEW
-        MENUITEM "New &SOM...",                 ID_FILE_NEW_SOM
+        MENUITEM "New &SOM...\tCtrl+M",         ID_FILE_NEW_SOM

         MENUITEM "&Open...\tCtrl+O",            ID_FILE_OPEN
         MENUITEM SEPARATOR
         POPUP "&Export"

Apparently, the CR causes an additional line break. The command is git diff-files --textconv -p --color -U3 -- filename. The octal dump of the command output (just the + and - lines) looks like this:

0004500 033   [   3   1   m   -                                   M   E
0004520   N   U   I   T   E   M       "   N   e   w       &   S   O   M
0004540   .   .   .   "   ,
0004560                           I   D   _   F   I   L   E   _   N   E
0004600   W   _   S   O   M 033   [   m  \r  \n 033   [   3   2   m   +
0004620 033   [   m 033   [   3   2   m
0004640   M   E   N   U   I   T   E   M       "   N   e   w       &   S
0004660   O   M   .   .   .   \   t   C   t   r   l   +   M   "   ,
0004700                                   I   D   _   F   I   L   E   _
0004720   N   E   W   _   S   O   M  \r 033   [   m  \n

As you can see, the \r\n combination on the + lines is interrupted by a color code escape sequence. I haven't dug into why this happens only on the +, but not on the - lines, but my suspicion is that the spot before \n on + lines is provided for the color code that indicates an error in trailing whitespace. This option is not enabled in this diff.

A solution is direly needed.

@mark987
Copy link
Contributor Author

mark987 commented Jul 29, 2025

Pure conjecture here, I won't be in a place where I can try / test things for a couple of days...
I presume git believes the file in question is text, so your repository has \n line endings stored, and git is translating \n to \r\n on output.

The valid line separators are \r, \r\n, and \n, with \r retired since Apple switched to BSD unix. To my knowledge, \r \n is not a valid line ending: if git diff is introducing this on output, there is a bug that should be fixed, regardless of a workaround in gi-gui.

I do not understand the mechanism where \r \n does not translate into two lines, I'll need to stare at code for a while when I can. It is possible that Tcl's crlf translation code ignores ANSI escapes, and my patch disables that translation so we get two line endings because other git-gui code now gets invoked.

Does disabling the change to lib/diff.tcl in cfa7125 fix the problem you see?

@j6t
Copy link
Owner

j6t commented Jul 29, 2025

Pure conjecture here, I won't be in a place where I can try / test things for a couple of days... I presume git believes the file in question is text, so your repository has \n line endings stored, and git is translating \n to \r\n on output.

This is definitely not the case. The files have \r\n both in the worktree and the repository. Git passes these bytes through unchanged. I have neither core.autocrlf nor any eol attributes set. Git doesn't apply EOL translation to the files: the \r is arbitrary whitespace that happens to occur right before the line break \n.

The valid line separators are \r, \r\n, and \n, with \r retired since Apple switched to BSD unix. To my knowledge, \r \n is not a valid line ending: if git diff is introducing this on output, there is a bug that should be fixed, regardless of a workaround in gi-gui.

Git has no bug here. It passes the bytes through unchanged.

I do not understand the mechanism where \r \n does not translate into two lines, I'll need to stare at code for a while when I can. It is possible that Tcl's crlf translation code ignores ANSI escapes, and my patch disables that translation so we get two line endings because other git-gui code now gets invoked.

If the Tcl translation would ignore the ANSI escapes, then it would see a \r\n sequence and would translate it to \n. But it does not. It sees a lone \r (before the escape sequence) and translates it (I assume) to \n and then the lone \n that needs no translation. So I get \n<escape sequence>\n, hence, an empty line.

Does disabling the change to lib/diff.tcl in cfa7125 fix the problem you see?

I'll test this in a moment.

@j6t
Copy link
Owner

j6t commented Jul 29, 2025

Does disabling the change to lib/diff.tcl in cfa7125 fix the problem you see?

I'll test this in a moment.

Reverting just the change in lib/diff.tcl does fix the problem and restores to original behavior.

@mark987
Copy link
Contributor Author

mark987 commented Jul 30, 2025 via email

@j6t
Copy link
Owner

j6t commented Jul 30, 2025

I maintain that git-diff splitting the eol marker is a bug.

It isn't, really. If Git isn't told to do anything special with the file contents, but still regard it as text, then the \r is just random trailing whitespace and not part of an EOL marker. The EOL marker is solely \n.

The color escape is (likely) a reset. It is present in the added line, because at this point, right before the \n we usually have the escape sequence needed to mark up erroneous trailing whitespace. Since trailing whitespace are never an error on removed lines, there is no corresponding escape sequence in the removed lines. In this particular case, the \r is trailing whitespace, but not marked up as error. Git diff still inserts an escape sequence, which would be the reset after a trailing whitespace error. The reset is still present, because it doesn't hurt (usually!) even if no color shift took place earlier.

BTW, the lines look like this in git diff:

-        MENUITEM "New &SOM...",                 ID_FILE_NEW_SOM
+        MENUITEM "New &SOM...\tCtrl+M",         ID_FILE_NEW_SOM^M

In the removed line, less sees a \r\n combination and uses it as the EOL marker. In the added line, we have a \r without a subsequent \n due to the escape sequence, which it it shows it as ^M.

@j6t
Copy link
Owner

j6t commented Jul 30, 2025

I just discovered another case that shows that is important not to mangle the output of git diff in any way: It is required to keep Stage Hunk and Stage Line working. If the line terminators are mangled, these commands do not work anymore, because the patch that Git GUI constructs does not match the data that git apply sees in the repository and files.

It also means, that the data written into the patch must not undergo any translation, either. But this seems to be the case already, because with the change in lib/diff.tcl reverted, these commands work again.

@mark987
Copy link
Contributor Author

mark987 commented Jul 30, 2025 via email

mark987 added 5 commits July 31, 2025 13:50
git-gui configures '-translation lf' on a number of channels. The
default configuration is 'auto', which on input changes any occurrence
of \n, \r, or \r\n to \n, and on output changes any such EOL sequence to
a platform dependent value (\n on Unix, \r\n on Windows). Such
translation can be necessary, but much of what is configured now is
redundant.

In particular, many of the channels configured this way are then
consumed by gets, which already recognizes any of \n, \r, or \r\n as
terminators.  Configuring a channel to first change these line endings,
then give the result to gets, is redundant.

The valid uses of -translation lf are for output where we do not want
\r\n on Windows, and for consuming entire files without going through
gets, assuring that \n will be used internally. Let's remove all the
others that only serve to confuse.

lib/diff.tcl must have -translation lf because \r\n might be stored in
the repository (e.g., on Windows, with no crlf translation enabled), and
git will treat \n as the line ending, while the preceding \r is just
whitespace, and these may be split by ANSI color coding. git-gui's
read_diff handles this correctly as-is.

Signed-off-by: Mark Levedahl <[email protected]>
Tcl 9 imposes strict requirements on namespaces for variables, while Tcl
8 does not. lib/themed.tcl does not use the fully qualified name for the
"color" namespace, with result that variables are not found with Tcl
9.0. Fix this.

Signed-off-by: Mark Levedahl <[email protected]>
git-gui invokes many git commands expecting output in utf-8 encoding,
but git accepts extended ascii (code page unknown) as utf-8 without
validating, so cannot guarantee valid utf-8 on output.  In particular,
using any extended ascii code page has long been acceptable on git given
that everyone on a project is aware of and uses that same code page to
view all data. utf-8 accepts only 7-bit ascii characters in single
bytes, and any characters outside of that base set require at least two
bytes for representation in unicode.

Tcl is a string based language, and transcodes all input data to an
internal unicode format, and to whatever format is requested on output:
"pure" binary is recoded byte by byte using iso8859-1.  Tcl8.x silently
recodes invalid utf-8 as binary data, so extended ascii characters
maintain their binary value on output but may not display correctly.

Tcl 8.7 added three profiles to control this behaviour: strict (raises
exceptions), replace (replaces each invalid byte with ?), and the
default tcl8 maintaining the old behavior.  Tcl 9 changes the default
profile to strict, meaning any invalid utf-8 raises an exception that
git-gui does not handle.

An example of this in the git repository is commit 7eb93c8965 ("[PATCH]
Simplify git script", 2005-09-07). This includes extended ascii
characters in the author name and commit message.

The tcl8 profile used so far has acceptable behavior given git-gui's
acceptance: this allows git-gui to accept extended ascii though it may
display incorrectly.  Let's continue that behavior by overriding open to
use the tcl8 profile on Tcl9 and later: Tcl 8.6 does not understand
fconfigure -profile, and Tcl 8.7 maintains the tcl8 profile.

Signed-off-by: Mark Levedahl <[email protected]>
git-gui in the prior commit learned to apply -profile tcl8 when reading
files, avoiding errors on non-binary data streams whose encoding is not
utf-8. But, git-gui also consumes binary data streams (generally blobs
from commits) as the output of commands, and internally decodes this to
support various displays.

With Tcl9, errors occur in this decoding for the same reasons described
in the previous commit: basically, the underlying data may contain
extended ascii characters violating the assumption of utf-8 encoding.

This problem has a similar fix to the prior issue: we must use the tlc8
profile when converting this data to the internal unicode format. Do so,
again only on Tcl9 as Tcl8.6 does not recognize -profile, and only Tcl
9.0 makes strict the default.

Signed-off-by: Mark Levedahl <[email protected]>
TclTk 9.0 is now shipping, and git-gui is now patched to support use of
this newer version. Adjust required versions to allow Tcl/Tk >= 8.6,
including 9.x.

Signed-off-by: Mark Levedahl <[email protected]>
@mark987
Copy link
Contributor Author

mark987 commented Jul 31, 2025

lib/diff.tcl again has -translation lf for the channel reading output of git-diff. I added a comment in the file stating why, and that will link future readers back to this commit.

No other changes in the branch - I looked carefully at all the loops consuming output of channels modified by this, tested all with a repo containing dos line endings with changes to such lines, everything worked for me.

@j6t
Copy link
Owner

j6t commented Jul 31, 2025

Thank you very much for the update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants