Safari Version 3.0.4 (5523.10) and WebKit r28007 display the non-printing, invisible character "ZERO WIDTH NON-JOINER U+200C". This happened after installing leopard (V. 10.5.1). When I copied the word that contain ZWNJ from Safari and pasted it in TextEdit I noticed that while the Arabic /Persian word is displayed using Geeza Pro (as it should), the ZWNJ is borrowed from font Arial. See the attached picture. This is not normal and very irritating, ZWNJ is invisible non-printing character and Safari should not display it.
Created attachment 17509 [details] non-printing ZWNJ in Safari and Web kit
<rdar://problem/5611927>
Looking back at bug 13136 and subsequent patches, I think ZWJ and ZWNJ can simply be added to the set returning true from treatAsZeroWidthSpace(), given how that function is used.
Created attachment 50917 [details] Patch adding new layout tests for ZWJ and ZWNJ. Added layout tests for the handling of ZWJ and ZWNJ. These test that ZWJ and ZWNJ glyphs are not visible in simple (Latin) text, and that the control characters are taken into context for glyph-shaping in Persian.
Did the bug get fixed in some indirect way? Why is it ok to just land tests without code changes?
The ZWNJ character no longer shows in Safari v4.0.4 (5532.21.10) and WebKit r56096, nor does it show up when text containing ZWNJ is copied from Safari to WebKit, so the bug as originally described seems to have been resolved. However, ZWJ is now effectively being treated as ZWNJ (they are both invisible and cause the glyphs beside them not to be connected to each other). This is beyond the scope of the bug as originally described, though it's probably caused by whatever "fixed" it. I'm working on a patch for this right now.
Was it a change in WebKit, or just a difference in configuration? I'm concerned that this character could be coming from a different version of Arial, such as one installed by MS Office.
I've updated the patch to include code to suppress the display of ZWJ and ZWNJ glyphs.
Created attachment 51108 [details] Updated patch
Attachment 51108 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/platform/graphics/FontFastPath.cpp:242: One space before end of line comments [whitespace/comments] [5] WebCore/platform/graphics/FontFastPath.cpp:247: One space before end of line comments [whitespace/comments] [5] Total errors found: 2 in 14 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 51108 [details] Updated patch > + * platform/graphics/FontFastPath.cpp: > + (WebCore::Font::canUseGlyphCache): added Arabic presentation forms Why? And how does it relate to the present bug? > + // FIXME: A better place for this might be in GlyphPage::glyphDataForCharacter or > + // Font::glyphDataForCharacter. This should be done in GlyphPageTreeNode::initializePage() where LRM, RLM, LRE, RLE, PDF and other control characters are already handled.
Created attachment 51156 [details] Updated patch with code moved to GlyphPageTreeNode::initializePage() I've moved the code to GlyphPageTreeNode::initializePage(). The patch now makes the treatment of ZWNJ and ZWJ consistent with other Unicode format control characters such as LRM and RLM. How would a user access the glyphs for ZWNJ and ZWJ in situations where they should actually be displayed, such as for editors or IMEs? I'm not sure what the correct display behaviour is for format control characters when they are typed into edit fields. The patch suppresses their display, which is consistent with how the other control characters currently behave. The reason I added a check for Arabic presentation forms is because they should be handled by the complex font path. It relates to the bug in that ZWNJ and ZWJ are not handled correctly for Arabic script when the presentation forms are used. Some web pages are encoded using the presentation forms (even though that goes against Unicode's rules), and without the check they don't shape (apply ZWJ and ZWNJ) properly. If this bug is restricted to just the *visibility* of the ZWNJ and ZWJ *glyphs*, as opposed to the correct handling and display of ZWNJ and ZWJ more generally, I can create a new bug report for the other situations.
(In reply to comment #12) > The reason I added a check for Arabic presentation forms is because they should > be handled by the complex font path. You keep saying this but I do not undestand why that is so. Presentational forms are not subject to contextual shaping or required ligatures—they represent the contextual forms and ligatures themselves. Perhaps an example of where your change makes a different in behavior will help to understand why you think this change is needed.
The general principle is that a code change needs to be accompanied by a test case showing incorrect behavior before and correct behavior afterward. The check for Arabic presentation forms is no exception. A test case showing what's wrong and how the change fixes the problem is required for that specific change.
Comment on attachment 51156 [details] Updated patch with code moved to GlyphPageTreeNode::initializePage() This patch consist mostly of whitespace and coding style changes. It’s okay to make such changes in a function you’re already changing as part of the bugfix, but otherwise please keep large-scale changes out of a bugfix patch. > +2010-03-19 David Yonge-Mallo <davinci@chromium.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Fix the (non)display of glyphs for ZWJ and ZWNJ in simple font code path. > + Added logic to use complex font code path for Arabic presentation forms. > + > + https://bugs.webkit.org/show_bug.cgi?id=16131 > + > + Tests: fast/text/format-control.html > + fast/text/zero-width-characters.html > + fast/text/international/zero-width-joiner-and-non-joiner.html > + > + * platform/graphics/Font.h: > + (WebCore::Font::treatAsZeroWidthSpace): treat ZWJ and ZWNJ as ZWSP. > + * platform/graphics/FontFastPath.cpp: > + (WebCore::Font::canUseGlyphCache): added Arabic presentation forms > + * platform/graphics/WidthIterator.cpp: > + (WebCore::WidthIterator::advance): don't display glyphs for ZWJ and ZWNJ. This change log entry is outdated. > - return (((c & ~0xFF) == 0 && gRoundingHackCharacterTable[c])); > + return (( !(c & ~0xFF) && gRoundingHackCharacterTable[c])); This is an unrelated style change, so like I said, it should not be included in this patch, but I can’t help pointing out that it leaves the redundant parentheses in place and adds a space before the ! that shouldn’t be there. > + buffer[zeroWidthNonJoiner - start] = zeroWidthSpace; > + buffer[zeroWidthJoiner - start] = zeroWidthSpace; This is good. > UErrorCode uStatus = U_ZERO_ERROR; > int32_t resultLength = unorm_normalize(m_run.data(currentCharacter), 2, > UNORM_NFC, UNORM_UNICODE_3_2, &normalizedCharacters[0], 2, &uStatus); > - if (resultLength == 1 && uStatus == 0) > + if (resultLength == 1 && !uStatus) It’s true that WebKit style is to use ! instead of == 0, but in this case, I think == U_ZERO_ERROR is even more correct.
Created attachment 51179 [details] Removed style changes and presentation form check from patch I've removed the stylistic changes from the patch. I only made them so the style checker would stop outputting errors. You're right that presentation forms should not be subject to shaping if web sites follow the rules, but some (I don't know how many) don't. This is particularly true for sites in Persian and Urdu, some of which use the presentation forms to force the display of certain glyphs which look better than the standard Arabic glyphs for those languages (especially ye and he-jimi). I can include a test case, but it would contain what is essentially incorrect Unicode. Rather than handle bad Unicode for backwards compatibility reasons, it's better to have correct language detection and have WebKit choose the right "pretty" glyphs. So, I've dropped the representation form check from the patch. This goes beyond the scope of the bug as originally described, but ZWNJ and ZWJ are not handled correctly in the complex text font path. On Mac, the ZWJ acts like ZWNJ. On Linux, the joining behaviour is correct, but the glyphs are displayed. I'll file a separate bug for those.
> I can include a test case, but it would contain what is essentially > incorrect Unicode. Rather than handle bad Unicode for backwards compatibility > reasons, it's better to have correct language detection and have WebKit choose > the right "pretty" glyphs. I think it's worth filing a new bug about this issue. First, language-based font selection may still be far away, so adding a workaround could still make sense. Second, even with language-based font selection, we'd need regression tests for the problems it fixes.
Comment on attachment 51179 [details] Removed style changes and presentation form check from patch Rejecting patch 51179 from commit-queue. Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1 Running build-dumprendertree Compiling Java tests make: Nothing to be done for `default'. Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests Testing 12539 test cases. fast/text/international/zero-width-joiner-and-non-joiner.html -> failed Exiting early after 1 failures. 8723 tests run. 149.64s total testing time 8722 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 5 test cases (<1%) had stderr output Full output: http://webkit-commit-queue.appspot.com/results/1223032
Comment on attachment 51179 [details] Removed style changes and presentation form check from patch Looks like this fail the new test and will need an update.
(In reply to comment #19) > (From update of attachment 51179 [details]) > Looks like this fail the new test and will need an update. I presume the simple font test passes and the complex test is failing. What makes more sense, to remove the complex font test entirely or to remove just the ZWJ part of it? I think the ZWJ/ZWNJ tests should be done together. The glyphs are no longer displaying so the bug is indeed fixed, but the test fails because joining is not working, which is a separate issue.
Created attachment 51638 [details] Removed ZWJ test case for complex font path I've removed the test case for the complex font path entirely. I also filed a new bug (bug 36596) for ZWJ not joining glyphs properly, and attached the removed test case there for reference.
Comment on attachment 51638 [details] Removed ZWJ test case for complex font path > +2010-03-25 David Yonge-Mallo <davinci@chromium.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Added tests for the handling of ZWJ and ZWNJ characters in simple font > + path. These characters have zero width but may change the widths of text > + around them. Furthermore, their glyphs should not be displayed in > + simple static text. > + > + https://bugs.webkit.org/show_bug.cgi?id=16131 > + > + * fast/text/format-control.html: Added. > + * fast/text/zero-width-characters.html: modified to test ZWSP, ZWJ, and ZWNJ. > + * platform/mac/fast/text/format-control-expected.checksum: Added. > + * platform/mac/fast/text/format-control-expected.png: Added. > + * platform/mac/fast/text/format-control-expected.txt: Added. NIT: Since I'm going to recommend some minor changes, could you clean up the ChangeLog a little. The bug title "ZWNJ - Display non-printing, invisible character" is missing, and the bug link should go above the comments. Otherwise, this is a great comment and a good ChangeLog! The normal order (produced by prepare-ChangeLog) is: Reviewed By ... Bug Title https://bugs.webkit.org/show_bug.cgi?id=... Comments... * files and comments... > +<html><head> > +<meta charset="utf-8"> > +</head> > +<body> > +This tests the ZWJ and ZWNJ format control characters on basic Latin text. > + > +<p>fi fl ff ffi ffl</p> > +<p>fâi fâl fâf fâfâi fâfâl</p> > +<p>fâi fâl fâf fâfâi fâfâl</p> > +<textarea rows="5" cols="80">fi fl ff ffi ffl > +fâi fâl fâf fâfâi fâfâl > +fâi fâl fâf fâfâi fâfâl</textarea> > +</body> > +</html> This test is currently passing on Mac platforms. It would be needlessly difficult to make it fail by default on all possible platforms, but I think making it fail by default on mac platforms as well would help it to be caught on build bots. On Mac you can use a font like "Times New Roman". I would recommend duplicating the <p> and <textarea> down below but forcing that font: <div style="font-family: Times New Roman"> <p>fi fl ff ffi ffl</p> <p>fi fl ff ffi ffl</p> <p>fi fl ff ffi ffl</p> <textarea rows="5" cols="80">fi fl ff ffi ffl fi fl ff ffi ffl fi fl ff ffi ffl</textarea> </div> This would require updated test results. > +2010-03-25 David Yonge-Mallo <davinci@chromium.org> > + > + Reviewed by NOBODY (OOPS!). > + > + Fix the (non)display of glyphs for ZWJ and ZWNJ in simple font code path. > + > + https://bugs.webkit.org/show_bug.cgi?id=16131 > + > + Tests: fast/text/format-control.html > + fast/text/zero-width-characters.html > + > + * platform/graphics/Font.h: > + (WebCore::Font::operator!=): > + (WebCore::Font::treatAsZeroWidthSpace): treat ZWNJ and ZWJ as ZWSP. > + * platform/graphics/GlyphPageTreeNode.cpp: > + (WebCore::GlyphPageTreeNode::initializePage): added ZWNJ and ZWJ. > + * platform/text/CharacterNames.h: added ZWNJ and ZWJ. > + NIT: The same ChangeLog comments that I had above. The patch itself looks good to me! Just update the test so a regression could surely be caught!
Hello David. I'm checking in to if you're still actively working on this bug.
David has decided to pursue some other interests. Hopefully James can finish this off since it looks really close to being done.
I can provide an updated patch. Credit will still go to David though. I'll have one up soon.
Created attachment 53999 [details] [PATCH] Updated David's Patches For My Comments I addressed my comments about the tests and ChangeLogs. Since the meat was unchanged I left David's name on the ChangeLogs. I'm running through all layout tests now.
Results looked good to me!
Attachment 53999 [details] was posted by a committer and has review+, assigning to Joseph Pecoraro for commit.
Created attachment 54015 [details] [PATCH] Updated Test with Greater Pixel Test Difference Dan Bernstein pointed out to me that the diff in the pixel test might not be enough. This was my first time with pixel tests and I didn't know there was a tolerance! Indeed the last patch's test doesn't have more than 0.1% difference. I've got an updated test has 0.32%. This test mostly has larger fonts, and it removed the textarea from the tests, since I could not determine what that was testing before. Sorry about the extra email, and I'll be more careful with pixel tests next time.
Committed r58042 M WebCore/ChangeLog M WebCore/platform/text/CharacterNames.h M WebCore/platform/graphics/GlyphPageTreeNode.cpp M WebCore/platform/graphics/Font.h A LayoutTests/platform/mac/fast/text/format-control-expected.png A LayoutTests/platform/mac/fast/text/format-control-expected.txt A LayoutTests/platform/mac/fast/text/format-control-expected.checksum A LayoutTests/fast/text/format-control.html M LayoutTests/fast/text/zero-width-characters.html M LayoutTests/ChangeLog r58042 = ac6c98e60e5eaa8f0cd91ac73f17d8d727b545fc (trunk) http://trac.webkit.org/changeset/58042 Watching the bots.
(In reply to comment #29) > This test mostly has larger fonts, and it > removed the textarea from the tests, since I could not determine what that was > testing before. Hi, Joseph. Thanks for continuing this for me. I don't have access to the Mac on which I had originally generated the output files any more. The textarea was testing whether the ZWJ/ZWNJ characters were being displayed in an edit area or not. The control character glyphs should clearly not be displayed in a "display" environment, but there is some disagreement as to whether control characters ought to be displayed in an "edit" environment, and also how to indicate to the browser to turn on or force their display. I e-mailed several people about this, but this was never resolved. I think it makes sense for glyphs to be displayed in "edit mode" (since otherwise how would one know if one has typed, e.g., an invisible zero-width character?), but I was not able to find any standards or documentation that defines when this should happen. If such a standard does not exist, one ought to be made.
(In reply to comment #31) > The control character glyphs should clearly not be displayed in a > "display" environment, but there is some disagreement as to whether control > characters ought to be displayed in an "edit" environment, and also how to > indicate to the browser to turn on or force their display. I e-mailed several > people about this, but this was never resolved. I think it makes sense for > glyphs to be displayed in "edit mode" (since otherwise how would one know if > one has typed, e.g., an invisible zero-width character?), but I was not able to > find any standards or documentation that defines when this should happen. If > such a standard does not exist, one ought to be made. I see. Thanks for the explanation! I also don't really know what makes the most sense for these characters in an "edit mode" context.
While it seems fine to make these joining characters detectable in “edit mode”, I don’t think the right way to do it is to make them visible characters. The fact that characters are used to achieve the effect seems like an implementation detail to me, akin to displaying "<b>" instead of just showing bold text.