ENH: Improved __str__ for polynomials #15666
|
Out of curiosity, how is line breaking handled? |
__str__ for polynomials w/ unicode
|
As of right now, line breaking is not explicitly handled. That's certainly something that I would have to follow up on. Also, after reading around some of the open issues/PRs surrounding |
|
This will also need a release note. |
numpy/polynomial/_polybase.py
Make the keys strings, and then you can use str.translate below
That's a great idea - better to use built-in functionality than reinvent the wheel. Addressed in 380c9f0d3
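For illustration, the `str.translate` suggestion can be sketched as follows. The table and function names here are hypothetical and are not taken from the PR; the point is only that string keys (via `str.maketrans`) let the built-in do the digit-by-digit substitution.

```python
# Hypothetical mapping tables; names are illustrative, not the PR's.
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def to_superscript(n: int) -> str:
    """Render a non-negative integer using Unicode superscript digits."""
    return str(n).translate(SUPERSCRIPTS)


def to_subscript(n: int) -> str:
    """Render a non-negative integer using Unicode subscript digits."""
    return str(n).translate(SUBSCRIPTS)


print("x" + to_superscript(12))  # x followed by superscript 1 and 2
print("T" + to_subscript(3))     # T followed by subscript 3
```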
numpy/polynomial/_polybase.py
If you're not going to include the character directly, can you use the \N escapes which are more readable?
Good point - including the characters directly is probably the most readable, so I've made that change in 380c9f0d3
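As a small illustration of the readability trade-off discussed here, the same character can be written directly, with a `\N{...}` name escape, or with a numeric escape. The variable names are only for the example.

```python
# Three spellings of the same one-character string.
direct = "²"                     # most readable, requires a UTF-8 source file
by_name = "\N{SUPERSCRIPT TWO}"  # self-describing escape
by_code = "\u00b2"               # opaque without a lookup table

assert direct == by_name == by_code
print(len(direct), direct)
```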
|
Thanks for the initial feedback @eric-wieser, those are definitely good ideas. Re: linebreaks - it would be fairly straightforward to manually enforce an 80-char limit, which would give nice, aligned output (e.g. no breaks in the middle of a single polynomial term). However - introducing this manually here means that the printing wouldn't respect |
numpy/polynomial/_polybase.py
It's a bit weird to assume that a lack of basis_name implies a specific subclass.
It might be more consistent to define a _str_term to match the _repr_latex_term, and put the power series special case in the subclass override of that method
Good idea, I'll take a look.
I followed your suggestion in 795ceb6. I defined _str_term slightly differently than _repr_latex_term: since __str__ doesn't (yet) support the variable mapping between domain <-> window, I left off the need_parens argument.
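The design being discussed (a per-term hook on the base class, with the power-series special case living in a subclass override) might look roughly like the sketch below. The classes are hypothetical stand-ins, not the actual `ABCPolyBase` code, and sign handling is deliberately ignored.

```python
class BasePoly:
    """Hypothetical stand-in for a polynomial base class."""

    basis_name = None  # subclasses set this, e.g. "T" for a Chebyshev basis

    @classmethod
    def _str_term(cls, i: int, arg_str: str) -> str:
        # Default: basis function with the degree as an index, e.g. T_2(x).
        if cls.basis_name is None:
            raise NotImplementedError(
                "subclass must set basis_name or override _str_term"
            )
        return f"{cls.basis_name}_{i}({arg_str})"

    def __init__(self, coef):
        self.coef = list(coef)

    def __str__(self):
        # Constant term first, then one string per higher-degree term.
        terms = [f"{self.coef[0]}"]
        for i, c in enumerate(self.coef[1:], start=1):
            terms.append(f"{c} {self._str_term(i, 'x')}")
        return " + ".join(terms)


class PowerSeries(BasePoly):
    # The special case lives in the subclass instead of being inferred
    # from a missing basis_name in the base class.
    @classmethod
    def _str_term(cls, i: int, arg_str: str) -> str:
        return f"{arg_str}**{i}"


class Cheb(BasePoly):
    basis_name = "T"


print(PowerSeries([1, 2, 3]))  # 1 + 2 x**1 + 3 x**2
print(Cheb([1, 2, 3]))         # 1 + 2 T_1(x) + 3 T_2(x)
```

With this layout the base class never needs to know which subclass it is formatting; it only calls the hook.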
|
To give a better sense of the effects of this PR, I'm including a few examples below to show the difference between the current string representation of polynomials, and how they look after this PR: Currently: With this PR: |
|
This is an improvement, as the representation is more "mathematical" than "programmatic". I wonder how this will affect usability for vision-impaired users. Maybe it is my aging eyes, but the unicode sub/superscript values in some fonts seem to be very small. The wikipedia page for these symbols explains that they were designed for fractions, and quotes the recommendation "When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts...". When I copy-paste the rendered string into my terminal (where I use Monospace Regular), the rendered string seems to be more readable than on this web page (using Deja Vu Sans). Scanning across the monospace fonts shows variation between them. It seems there is an opportunity for someone to tweak the existing fonts to render these symbols a little larger. I notice that IPython uses latex (I think, how do you choose between latex and html?) when outputting The original issue gh-8893 mentions using |
This is a great point and something that I had not considered. I went the unicode route due to its ubiquity in an attempt to come up with a solution that would be terminal-independent. It's doubly unfortunate that Although the font sizes are larger, in my opinion the readability suffers from the spacing. Any opinions/suggestions re: this issue would be very helpful.
Yes, LaTeX math is already used for the IPython front-ends that support it: namely Jupyter notebook and the QtConsole. As far as I know, most terminal emulators outside the frontends that are from the Jupyter project do not support these features, so the
This is a really nice suggestion: I like the idea of defaulting to Python syntax for exponents when formatting with ascii. I think it would also be reasonable to use As mentioned above, maybe this mechanism can also help address the usability concerns for vision-impaired users. |
|
I'm a little concerned that these characters are not going to work properly in a terminal on windows. I'm finding that I'm unable to even test if your code works, because copy-pasting your code into the windows terminal host strips all of the unicode characters. I suspect that it won't be able to display them either. |
|
Digging further, the paste failing appears to be an ipython bug. The large superscripts / subscripts are rendered as normal digits for me in the default windows console font. |
|
Thanks for the feedback. Platform-dependent encoding differences would definitely throw a wrench in the works. This is also a case where I'm not sure the tests will catch encoding differences. |
|
Out of curiosity, do you see paste problems in IPython with the |
|
Unfortunately depending on which of my open terminals I use and possibly the phase of the moon, I sometimes get That's probably a bug in either IPython, prompt-toolkit, or the terminal itself - but it's still possibly something we don't want to lead our users right into. |
|
Ah, it's a font issue
|
|
Part of the reason I wished to leave such things to the display :) @eric-wieser Are there any standard display calls on any of the windows terminals that you know of? I know Jupyter has such things. I am not a fan of the ascii form. IIRC, back when I looked at this, SymPy was recommended for having a good solution; it might be worth looking at how they do things and maybe stealing it. @asmeurer Thoughts? |
|
SymPy doesn't use unicode characters for superscript exponents. I think the font support for them isn't great, as you discovered. SymPy can also have arbitrary exponents, not just polynomials, which doesn't apply here. We do use As far as implementation, the SymPy printing system is far more generic than what would be needed here (it has to be able to print arbitrary symbolic expressions). Although you might steal the Unicode support detection code https://github.com/sympy/sympy/blob/c02515a8cfaa28c47300084b09b0d57798b867dd/sympy/printing/pretty/pretty_symbology.py#L50 (I don't know how it works, but I haven't heard any complaints about it). |
|
Thanks for all the additional info! In terms of character support, I had been most worried about platform-dependent encoding issues, though that was addressed in PEP 528. I hadn't considered font issues however. As mentioned, it seems like that will be the main sticking point, as the fonts used in various shells/terminals can't be counted on to support the range of unicode necessary for the full set of numerical super/subscripts.
Thanks @asmeurer, this is a nice thing to be aware of. It looks like the checks in the linked function are limited to a small subset of specific unicode characters and don't probe the range for super/subscripts, but the concept could certainly be ported over. The font consideration really seems to be the limiting factor for a generic approach that will be supported in all terminals. To me, the only viable options seem to be to stick to ascii either printing with Python syntax as mentioned above (i.e. It's a shame as, though it's not perfect, I found the unicode much more visually appealing. |
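To make the encoding-versus-font distinction concrete, here is a minimal sketch of an encoding probe. It is not SymPy's detection code (which is not reproduced in this thread); the function name is hypothetical, and the check only tells you whether the stream's declared encoding can represent the characters, not whether the terminal font has legible glyphs for them.

```python
import io
import sys


def stream_can_encode(chars: str, stream=None) -> bool:
    """Return True if the stream's declared encoding can encode `chars`.

    This says nothing about whether the terminal font can display them.
    """
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        chars.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


# In-memory streams standing in for terminals with different encodings.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
print(stream_can_encode("x²", ascii_stream))  # False
print(stream_can_encode("x²", utf8_stream))   # True
```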
__str__ for polynomials w/ unicode → __str__ for polynomials
|
Summary
Using unicode to represent super/subscripts for polynomial printing in the terminal is not a good approach due to font support issues and readability concerns. Two alternate approaches to Polynomial printing in the (non-Jupyter) terminal are presented below.
After the most recent round of revisions, I went back and reformulated things away from the unicode approach. I've implemented two options: "Python-style" formatting and "poly1d-style" printing (which, according to the above discussion appears to be consistent with
"Python-style" printing uses standard Python syntax for exponents and underscores to represent subscripts. Using similar examples to those above, the printing would look something like:
"poly1d-style" printing uses only ascii characters, and denotes superscripts and subscripts on separate lines:
To me, the "Python-style" printing has the advantage that it's more vertically compact, though it may be harder to interpret, especially for polynomials with many terms. The "poly1d-style" is nearer the traditional mathematical representation, but is not very compact vertically. One drawback of the "poly1d-style" is that it does not work well with non-monospace fonts (for example - take one of the printed outputs from the literal block above and paste it into an empty GitHub comment window - note that the line containing the terms/coefficients and the line containing the powers will become misaligned).
Please let me know if you have thoughts/opinions about the relative merit of these two approaches. I like the "Python-style" best as it's the simplest and most portable, even though it's not very aesthetically pleasing. Another option is to leave the printing as-is and close the book on this particular attempt to update it. If you have thoughts, please chime in!
N.B. - Since there are two different options, I haven't modified the test suite, which is why it's currently failing. I will update it once a decision has been made. |
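The two ascii layouts described in this comment can be approximated with the sketch below. Both functions are hypothetical illustrations written for this example; they are not the PR's implementation and do not reproduce its exact output (signs and float formatting are ignored).

```python
def python_style(coef, symbol="x"):
    """One line, with exponents written in Python syntax (x**2)."""
    terms = [f"{coef[0]}"]
    for power, c in enumerate(coef[1:], start=1):
        terms.append(f"{c} {symbol}**{power}")
    return " + ".join(terms)


def two_line_style(coef, symbol="x"):
    """Two lines: exponents on the upper line, terms on the lower line."""
    upper, lower = "", ""
    for power, c in enumerate(coef):
        if power:
            lower += " + "
        lower += f"{c}"
        if power:
            lower += f" {symbol}"
            # Pad the upper line so the exponent sits just right of the symbol.
            upper = upper.ljust(len(lower)) + str(power)
            # Keep the lower line in step with the upper line.
            lower = lower.ljust(len(upper))
    return upper.rstrip() + "\n" + lower.rstrip()


print(python_style([1, 2, 3]))
print(two_line_style([1, 2, 3]))
```

The second layout depends on column alignment between the two lines, which is why it degrades in proportional fonts.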
|
My preference, for what it's worth, is the Python-like syntax, with spaces before the symbol: (Given that we cannot rely on unicode in fonts.) |
|
The superscript numbers would probably be fine, so long as you detect if the terminal supports Unicode. The detection itself is really about if the terminal supports Unicode encoding (i.e., you don't have |
|
@rossbar, here's an example where the linewidth logic doesn't quite work as expected: The linewidth is set to 41, but that last line is 60 characters. |
WarrenWeckesser left a comment
Looks good. I think the history can be squashed to just one or two commits. @rossbar, can you rebase with a squash? That way you can edit the commit message(s) to include just the essential information, without the churn. (No need to preserve my commit authorship for the "suggested edits".)
|
Will do @WarrenWeckesser , thank you (and others) for the careful review! |
|
I've squashed everything down into acdaff7 and written a new commit message to give an overview of the changes (and preserving co-authorship, assuming I've done that correctly). |
numpy/polynomial/_polybase.py
Should this be thread-local?
As long as not even the print options are thread-local, I'm not sure it's worth worrying about?
Changes the printing style of instances of the convenience classes in the polynomial package to a more "human-readable" format.
__str__ has been modified and __format__ added to ABCPolyBase, modifying the string representation of polynomial instances, e.g. when printed. __repr__ and the _repr_latex_ method (which is used in the Jupyter environment) are unchanged.
Two print formats have been added: 'unicode' and 'ascii'. 'unicode' is the default mode on *nix systems, and uses unicode values for numeric subscripts and superscripts in the polynomial expression. The 'ascii' format is the default on Windows (due to font considerations) and uses Python-style syntax to represent powers, e.g. x**2. The default printing style can be controlled at the package-level with the set_default_printstyle function.
The ABCPolyBase.__str__ has also been made to respect the linewidth printoption. Other parameters from the printoptions dictionary are not used.
Co-Authored-By: Warren Weckesser <warren.weckesser@gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
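One way to respect a line width without splitting a term is sketched below. This is an assumption-laden illustration, not the code in the commit: `wrap_terms` is a hypothetical helper, and in NumPy the width would presumably come from `np.get_printoptions()["linewidth"]` (default 75), which is passed in here as a plain argument so the example stays dependency-free.

```python
def wrap_terms(terms, linewidth=75):
    """Join terms with spaces, starting a new line before a term that
    would push the current line past `linewidth`.

    A single term longer than `linewidth` is still emitted whole.
    """
    lines = []
    current = ""
    for term in terms:
        candidate = term if not current else f"{current} {term}"
        if current and len(candidate) > linewidth:
            lines.append(current)
            current = term
        else:
            current = candidate
    if current:
        lines.append(current)
    return "\n".join(lines)


print(wrap_terms(["1.0", "+ 2.0 x**1", "+ 3.0 x**2"], linewidth=16))
```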
|
Okay, 9c83b13 incorporates @eric-wieser 's suggestion and adds a test to verify that complex coefficients are printed correctly. I've re-squashed everything into 9c83b13 and (hopefully) maintained attribution correctly. I admit I don't entirely understand the implications in the thread-local discussion, so I'll leave the conversation unresolved in case someone wants to expand on it. |
|
Do we have to do more about As to the threads, it is about things such as changing the setting in a subthread modifying the global state. If this was a context manager (or used like one), then surprising things can happen unless you make it thread-local. I think we should not worry about it. Although, it might mean that we need to turn this into a property or a global getter function at some point. |
This is a good point. In principle, the coefficient arrays can be
>>> p = np.polynomial.Polynomial(np.array([1, 'f', 2.0], dtype=object))
>>> p
Polynomial([1, 'f', 2], dtype=object, domain=[-1, 1], window=[-1, 1])
>>> print(p)
TypeError: '>=' not supported between instances of 'str' and 'int'
Since this is currently allowed, then |
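For readers unfamiliar with the thread-local concern, the sketch below shows what a thread-local default could look like. It is purely hypothetical: the thread concludes this is not worth worrying about, and these getter/setter names are invented for the example.

```python
import threading

_state = threading.local()   # each thread sees its own attributes
_GLOBAL_DEFAULT = "ascii"    # hypothetical process-wide fallback


def get_printstyle() -> str:
    """Return this thread's style, falling back to the global default."""
    return getattr(_state, "style", _GLOBAL_DEFAULT)


def set_printstyle(style: str) -> None:
    """Set the style for the calling thread only."""
    if style not in ("ascii", "unicode"):
        raise ValueError(f"unsupported style: {style!r}")
    _state.style = style


def _worker(results):
    set_printstyle("unicode")
    results["worker"] = get_printstyle()


results = {}
t = threading.Thread(target=_worker, args=(results,))
t.start()
t.join()
print(results["worker"], get_printstyle())  # the worker's change stays local
```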
|
Edit - the stricken text is now outdated.
I've also updated the tests to verify that this fallback works, as well as test |
Add a fallback for TypeErrors that are raised when attempting to compare arbitrary elements (e.g. strings or Python complex) to 0 in _generate_str.
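The fallback described in this commit message can be illustrated with a small sketch. The helper name and return shape are assumptions made for the example and are not taken from `_generate_str`; the idea is simply to catch the `TypeError` raised when a coefficient cannot be ordered against 0.

```python
def signed_term(coef):
    """Return (sign, magnitude_str) for a coefficient.

    Coefficients that cannot be compared to 0 (e.g. str or complex)
    are printed as-is with a '+' sign.
    """
    try:
        if coef >= 0:
            return "+", str(coef)
        return "-", str(-coef)
    except TypeError:
        # Unorderable type: fall back to the plain string form.
        return "+", str(coef)


for c in (2.0, -2.5, 1 + 2j, "f"):
    print(signed_term(c))
```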
|
I've reworked how I've also added tests for the Python complex case. |
|
@WarrenWeckesser I am happy with the local try/except solution and IIRC the rest was pretty far along. But I never reviewed this in detail. Do you want to do the honors? |
Update routines.polynomials.classes doc in the refguide to reflect changes to polynomial printing. Add additional information to the document about the various ways that the string representation of polynomial expressions can be controlled via formatting.
|
Thanks @rossbar |
The convenience classes derived from ABCPolyBase had a nickname attribute that was only used internally in the previous implementation of __str__. After the overhaul of __str__ in numpy#15666, this attr is no longer used.
* MAINT: Remove nickname from polynomial classes. The convenience classes derived from ABCPolyBase had a nickname attribute that was only used internally in the previous implementation of __str__. After the overhaul of __str__ in #15666, this attr is no longer used.
* DOC: Add release note. Add release note to notify users of removal of the abstract property, and highlight users that may be affected by the change.
* DOC: fixed rST in release note
Adds a symbol attribute to the polynomials from the np.polynomial package to allow the user to control/modify the symbol used to represent the independent variable for a polynomial expression. This attribute corresponds to the variable attribute of the poly1d class from the old np.lib.polynomial module. Marked as draft for now as it depends on #15666 - all _str* and _repr* methods of ABCPolyBase and derived classes would need to be modified (and tested) to support this change. Co-authored-by: Warren Weckesser <warren.weckesser@gmail.com>

This PR follows up on some ideas originally introduced in #8893.
Since numpy v1.19 drops support for python2, one of the original ideas of #8893 can be revisited - using unicode sub/superscript values to prettify the printing of polynomials in np.polynomial.
This PR implements this feature by modifying the __str__ method of the ABCPolyBase as well as introducing a few character mappings/helper methods to the ABCPolyBase class.
Please let me know if this is a desired feature at this time.
Note that I modified the test suite to reflect the changes while trying to maintain some similarity to the original structure of the tests. I erred on the side of being explicit, though some reorganization of the tests could likely improve things. Also, I'm not sure if having unicode characters directly in the test suite will cause problems on some platforms vis-a-vis file encoding.