Bug 16131 - ZWNJ - Display non-printing, invisible character
Summary: ZWNJ - Display non-printing, invisible character
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.5
: P2 Major
Assignee: Joseph Pecoraro
URL: http://irmug.org/portal/modules.php?o...
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2007-11-25 10:51 PST by Ahmad
Modified: 2010-04-22 17:38 PDT (History)
11 users (show)

See Also:


Attachments
non-printing ZWNJ in Safari and Web kit (30.37 KB, image/jpeg)
2007-11-25 10:53 PST, Ahmad
no flags Details
Patch adding new layout tests for ZWJ and ZWNJ. (40.50 KB, patch)
2010-03-17 10:10 PDT, David Yonge-Mallo
no flags Details | Formatted Diff | Diff
Updated patch (44.30 KB, patch)
2010-03-18 16:06 PDT, David Yonge-Mallo
mitz: review-
Details | Formatted Diff | Diff
Updated patch with code moved to GlyphPageTreeNode::initializePage() (56.25 KB, patch)
2010-03-19 08:29 PDT, David Yonge-Mallo
mitz: review-
Details | Formatted Diff | Diff
Removed style changes and presentation form check from patch (47.44 KB, patch)
2010-03-19 12:18 PDT, David Yonge-Mallo
eric: review-
commit-queue: commit-queue-
Details | Formatted Diff | Diff
Removed ZWJ test case for complex font path (23.49 KB, patch)
2010-03-25 07:18 PDT, David Yonge-Mallo
joepeck: review-
Details | Formatted Diff | Diff
[PATCH] Updated David's Patches For My Comments (27.95 KB, patch)
2010-04-21 16:08 PDT, Joseph Pecoraro
darin: review+
Details | Formatted Diff | Diff
[PATCH] Updated Test with Greater Pixel Test Difference (36.12 KB, patch)
2010-04-21 19:33 PDT, Joseph Pecoraro
mitz: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ahmad 2007-11-25 10:51:31 PST
Safari Version 3.0.4 (5523.10) and WebKit r28007 display the non-printing, invisible character "ZERO WIDTH NON-JOINER U+200C". This happened after installing leopard (V. 10.5.1).
When I copied the word that contain ZWNJ from Safari and pasted it in TextEdit I noticed that while the Arabic /Persian word is displayed using Geeza Pro (as it should), the ZWNJ is borrowed from font Arial.  See the attached picture. This is not normal and very irritating, ZWNJ is invisible non-printing character and Safari should not display it.
Comment 1 Ahmad 2007-11-25 10:53:08 PST
Created attachment 17509 [details]
non-printing ZWNJ in Safari and Web kit
Comment 2 mitz 2007-11-25 10:56:55 PST
<rdar://problem/5611927>
Comment 3 mitz 2007-11-25 18:37:07 PST
Looking back at bug 13136 and subsequent patches, I think ZWJ and ZWNJ can simply be added to the set returning true from treatAsZeroWidthSpace(), given how that function is used.
Comment 4 David Yonge-Mallo 2010-03-17 10:10:08 PDT
Created attachment 50917 [details]
Patch adding new layout tests for ZWJ and ZWNJ.

Added layout tests for the handling of ZWJ and ZWNJ.  These test that ZWJ and ZWNJ glyphs are not visible in simple (Latin) text, and that the control characters are taken into context for glyph-shaping in Persian.
Comment 5 Alexey Proskuryakov 2010-03-17 10:24:51 PDT
Did the bug get fixed in some indirect way? Why is it ok to just land tests without code changes?
Comment 6 David Yonge-Mallo 2010-03-17 11:20:54 PDT
The ZWNJ character no longer shows in Safari v4.0.4 (5532.21.10) and WebKit r56096, nor does it show up when text containing ZWNJ is copied from Safari to WebKit, so the bug as originally described seems to have been resolved.

However, ZWJ is now effectively being treated as ZWNJ (they are both invisible and cause the glyphs beside them not to be connected to each other).  This is beyond the scope of the bug as originally described, though it's probably caused by whatever "fixed" it.  I'm working on a patch for this right now.
Comment 7 Alexey Proskuryakov 2010-03-17 11:26:04 PDT
Was it a change in WebKit, or just a difference in configuration? I'm concerned that this character could be coming from a different version of Arial, such as one installed by MS Office.
Comment 8 David Yonge-Mallo 2010-03-18 16:03:14 PDT
I've updated the patch to include code to suppress the display of ZWJ and ZWNJ glyphs.
Comment 9 David Yonge-Mallo 2010-03-18 16:06:13 PDT
Created attachment 51108 [details]
Updated patch
Comment 10 WebKit Review Bot 2010-03-18 16:11:04 PDT
Attachment 51108 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/platform/graphics/FontFastPath.cpp:242:  One space before end of line comments  [whitespace/comments] [5]
WebCore/platform/graphics/FontFastPath.cpp:247:  One space before end of line comments  [whitespace/comments] [5]
Total errors found: 2 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 11 mitz 2010-03-18 16:12:54 PDT
Comment on attachment 51108 [details]
Updated patch

> +        * platform/graphics/FontFastPath.cpp:
> +        (WebCore::Font::canUseGlyphCache): added Arabic presentation forms

Why? And how does it relate to the present bug?

> +            // FIXME: A better place for this might be in GlyphPage::glyphDataForCharacter or
> +            // Font::glyphDataForCharacter.

This should be done in GlyphPageTreeNode::initializePage() where LRM, RLM, LRE, RLE, PDF and other control characters are already handled.
Comment 12 David Yonge-Mallo 2010-03-19 08:29:56 PDT
Created attachment 51156 [details]
Updated patch with code moved to GlyphPageTreeNode::initializePage()

I've moved the code to GlyphPageTreeNode::initializePage().  The patch now makes the treatment of ZWNJ and ZWJ consistent with other Unicode format control characters such as LRM and RLM.

How would a user access the glyphs for ZWNJ and ZWJ in situations where they should actually be displayed, such as for editors or IMEs?  I'm not sure what the correct display behaviour is for format control characters when they are typed into edit fields.  The patch suppresses their display, which is consistent with how the other control characters currently behave.

The reason I added a check for Arabic presentation forms is because they should be handled by the complex font path.  It relates to the bug in that ZWNJ and ZWJ are not handled correctly for Arabic script when the presentation forms are used.  Some web pages are encoded using the presentation forms (even though that goes against Unicode's rules), and without the check they don't shape (apply ZWJ and ZWNJ) properly.  

If this bug is restricted to just the *visibility* of the ZWNJ and ZWJ *glyphs*, as opposed to the correct handling and display of ZWNJ and ZWJ more generally, I can create a new bug report for the other situations.
Comment 13 mitz 2010-03-19 09:29:06 PDT
(In reply to comment #12)
> The reason I added a check for Arabic presentation forms is because they should
> be handled by the complex font path.

You keep saying this but I do not undestand why that is so. Presentational forms are not subject to contextual shaping or required ligatures—they represent the contextual forms and ligatures themselves. Perhaps an example of where your change makes a different in behavior will help to understand why you think this change is needed.
Comment 14 Darin Adler 2010-03-19 09:35:52 PDT
The general principle is that a code change needs to be accompanied by a test case showing incorrect behavior before and correct behavior afterward. The check for Arabic presentation forms is no exception. A test case showing what's wrong and how the change fixes the problem is required for that specific change.
Comment 15 mitz 2010-03-19 09:38:17 PDT
Comment on attachment 51156 [details]
Updated patch with code moved to GlyphPageTreeNode::initializePage()

This patch consist mostly of whitespace and coding style changes. It’s okay to make such changes in a function you’re already changing as part of the bugfix, but otherwise please keep large-scale changes out of a bugfix patch.

> +2010-03-19  David Yonge-Mallo  <davinci@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Fix the (non)display of glyphs for ZWJ and ZWNJ in simple font code path.
> +        Added logic to use complex font code path for Arabic presentation forms.
> +
> +        https://bugs.webkit.org/show_bug.cgi?id=16131
> +
> +        Tests: fast/text/format-control.html
> +               fast/text/zero-width-characters.html
> +               fast/text/international/zero-width-joiner-and-non-joiner.html
> +
> +        * platform/graphics/Font.h:
> +        (WebCore::Font::treatAsZeroWidthSpace): treat ZWJ and ZWNJ as ZWSP.
> +        * platform/graphics/FontFastPath.cpp:
> +        (WebCore::Font::canUseGlyphCache): added Arabic presentation forms
> +        * platform/graphics/WidthIterator.cpp:
> +        (WebCore::WidthIterator::advance): don't display glyphs for ZWJ and ZWNJ.

This change log entry is outdated.


> -        return (((c & ~0xFF) == 0 && gRoundingHackCharacterTable[c]));
> +        return (( !(c & ~0xFF) && gRoundingHackCharacterTable[c]));

This is an unrelated style change, so like I said, it should not be included in this patch, but I can’t help pointing out that it leaves the redundant parentheses in place and adds a space before the ! that shouldn’t be there.


> +                    buffer[zeroWidthNonJoiner - start] = zeroWidthSpace;
> +                    buffer[zeroWidthJoiner - start] = zeroWidthSpace;

This is good.

>              UErrorCode uStatus = U_ZERO_ERROR;  
>              int32_t resultLength = unorm_normalize(m_run.data(currentCharacter), 2,
>                  UNORM_NFC, UNORM_UNICODE_3_2, &normalizedCharacters[0], 2, &uStatus);
> -            if (resultLength == 1 && uStatus == 0)
> +            if (resultLength == 1 && !uStatus)

It’s true that WebKit style is to use ! instead of == 0, but in this case, I think == U_ZERO_ERROR is even more correct.
Comment 16 David Yonge-Mallo 2010-03-19 12:18:13 PDT
Created attachment 51179 [details]
Removed style changes and presentation form check from patch

I've removed the stylistic changes from the patch.  I only made them so the style checker would stop outputting errors.

You're right that presentation forms should not be subject to shaping if web sites follow the rules, but some (I don't know how many) don't.  This is particularly true for sites in Persian and Urdu, some of which use the presentation forms to force the display of certain glyphs which look better than the standard Arabic glyphs for those languages (especially ye and he-jimi).  I can include a test case, but it would contain what is essentially incorrect Unicode.  Rather than handle bad Unicode for backwards compatibility reasons, it's better to have correct language detection and have WebKit choose the right "pretty" glyphs.  So, I've dropped the representation form check from the patch.

This goes beyond the scope of the bug as originally described, but ZWNJ and ZWJ are not handled correctly in the complex text font path.  On Mac, the ZWJ acts like ZWNJ.  On Linux, the joining behaviour is correct, but the glyphs are displayed.  I'll file a separate bug for those.
Comment 17 Alexey Proskuryakov 2010-03-19 13:08:43 PDT
> I can include a test case, but it would contain what is essentially
> incorrect Unicode.  Rather than handle bad Unicode for backwards compatibility
> reasons, it's better to have correct language detection and have WebKit choose
> the right "pretty" glyphs.

I think it's worth filing a new bug about this issue. First, language-based font selection may still be far away, so adding a workaround could still make sense. Second, even with language-based font selection, we'd need regression tests for the problems it fixes.
Comment 18 WebKit Commit Bot 2010-03-24 22:14:59 PDT
Comment on attachment 51179 [details]
Removed style changes and presentation form check from patch

Rejecting patch 51179 from commit-queue.

Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1
Running build-dumprendertree
Compiling Java tests
make: Nothing to be done for `default'.
Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
Testing 12539 test cases.
fast/text/international/zero-width-joiner-and-non-joiner.html -> failed

Exiting early after 1 failures. 8723 tests run.
149.64s total testing time

8722 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
5 test cases (<1%) had stderr output

Full output: http://webkit-commit-queue.appspot.com/results/1223032
Comment 19 Eric Seidel (no email) 2010-03-25 01:03:15 PDT
Comment on attachment 51179 [details]
Removed style changes and presentation form check from patch

Looks like this fail the new test and will need an update.
Comment 20 David Yonge-Mallo 2010-03-25 06:33:01 PDT
(In reply to comment #19)
> (From update of attachment 51179 [details])
> Looks like this fail the new test and will need an update.

I presume the simple font test passes and the complex test is failing.

What makes more sense, to remove the complex font test entirely or to remove just the ZWJ part of it? I think the ZWJ/ZWNJ tests should be done together. The glyphs are no longer displaying so the bug is indeed fixed, but the test fails because joining is not working, which is a separate issue.
Comment 21 David Yonge-Mallo 2010-03-25 07:18:00 PDT
Created attachment 51638 [details]
Removed ZWJ test case for complex font path

I've removed the test case for the complex font path entirely.  I also filed a new bug (bug 36596) for ZWJ not joining glyphs properly, and attached the removed test case there for reference.
Comment 22 Joseph Pecoraro 2010-04-19 09:33:51 PDT
Comment on attachment 51638 [details]
Removed ZWJ test case for complex font path

> +2010-03-25  David Yonge-Mallo  <davinci@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Added tests for the handling of ZWJ and ZWNJ characters in simple font
> +        path.  These characters have zero width but may change the widths of text 
> +        around them.  Furthermore, their glyphs should not be displayed in
> +        simple static text.
> +
> +        https://bugs.webkit.org/show_bug.cgi?id=16131
> +
> +        * fast/text/format-control.html: Added.
> +        * fast/text/zero-width-characters.html: modified to test ZWSP, ZWJ, and ZWNJ.
> +        * platform/mac/fast/text/format-control-expected.checksum: Added.
> +        * platform/mac/fast/text/format-control-expected.png: Added.
> +        * platform/mac/fast/text/format-control-expected.txt: Added.

NIT: Since I'm going to recommend some minor changes, could you clean up the ChangeLog a little. The bug title "ZWNJ - Display non-printing, invisible character" is missing, and the bug link should go above the comments. Otherwise, this is a great comment and a good ChangeLog!

The normal order (produced by prepare-ChangeLog) is:

  Reviewed By ...

  Bug Title
  https://bugs.webkit.org/show_bug.cgi?id=...

  Comments...

  * files and comments...


> +<html><head>
> +<meta charset="utf-8"> 
> +</head>
> +<body>
> +This tests the ZWJ and ZWNJ format control characters on basic Latin text.
> +
> +<p>fi fl ff ffi ffl</p> 
> +<p>fâi fâl fâf fâfâi fâfâl</p> 
> +<p>fâi fâl fâf fâfâi fâfâl</p> 
> +<textarea rows="5" cols="80">fi fl ff ffi ffl
> +fâi fâl fâf fâfâi fâfâl
> +fâi fâl fâf fâfâi fâfâl</textarea>
> +</body>
> +</html>

This test is currently passing on Mac platforms. It would be needlessly difficult to make it fail by default on all possible platforms, but I think making it fail by default on mac platforms as well would help it to be caught on build bots. On Mac you can use a font like "Times New Roman". I would recommend duplicating the <p> and <textarea> down below but forcing that font:

  <div style="font-family: Times New Roman">
  <p>fi fl ff ffi ffl</p> 
  <p>f‌i f‌l f‌f f‌f‌i f‌f‌l</p> 
  <p>f‍i f‍l f‍f f‍f‍i f‍f‍l</p> 
  <textarea rows="5" cols="80">fi fl ff ffi ffl
  f‌i f‌l f‌f f‌f‌i f‌f‌l
  f‍i f‍l f‍f f‍f‍i f‍f‍l</textarea>
  </div>

This would require updated test results.




> +2010-03-25  David Yonge-Mallo  <davinci@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Fix the (non)display of glyphs for ZWJ and ZWNJ in simple font code path.
> +
> +        https://bugs.webkit.org/show_bug.cgi?id=16131
> +
> +        Tests: fast/text/format-control.html
> +               fast/text/zero-width-characters.html
> +
> +        * platform/graphics/Font.h:
> +        (WebCore::Font::operator!=):
> +        (WebCore::Font::treatAsZeroWidthSpace): treat ZWNJ and ZWJ as ZWSP.
> +        * platform/graphics/GlyphPageTreeNode.cpp:
> +        (WebCore::GlyphPageTreeNode::initializePage): added ZWNJ and ZWJ.
> +        * platform/text/CharacterNames.h: added ZWNJ and ZWJ.
> +

NIT: The same ChangeLog comments that I had above.


The patch itself looks good to me! Just update the test so a regression could surely be caught!
Comment 23 Joseph Pecoraro 2010-04-21 15:02:28 PDT
Hello David. I'm checking in to if you're still actively working on this bug.
Comment 24 Robert Kroeger 2010-04-21 15:06:54 PDT
David has decided to pursue some other interests. Hopefully James can finish this off since it looks really close to being done.
Comment 25 Joseph Pecoraro 2010-04-21 15:16:42 PDT
I can provide an updated patch. Credit will still go to David though. I'll have one up soon.
Comment 26 Joseph Pecoraro 2010-04-21 16:08:35 PDT
Created attachment 53999 [details]
[PATCH] Updated David's Patches For My Comments

I addressed my comments about the tests and ChangeLogs. Since the meat was unchanged I left David's name on the ChangeLogs. I'm running through all layout tests now.
Comment 27 Joseph Pecoraro 2010-04-21 16:20:23 PDT
Results looked good to me!
Comment 28 Eric Seidel (no email) 2010-04-21 18:15:20 PDT
Attachment 53999 [details] was posted by a committer and has review+, assigning to Joseph Pecoraro for commit.
Comment 29 Joseph Pecoraro 2010-04-21 19:33:15 PDT
Created attachment 54015 [details]
[PATCH] Updated Test with Greater Pixel Test Difference

Dan Bernstein pointed out to me that the diff in the pixel test might not be enough. This was my first time with pixel tests and I didn't know there was a tolerance! Indeed the last patch's test doesn't have more than 0.1% difference. I've got an updated test has 0.32%. This test mostly has larger fonts, and it removed the textarea from the tests, since I could not determine what that was testing before.

Sorry about the extra email, and I'll be more careful with pixel tests next time.
Comment 30 Joseph Pecoraro 2010-04-21 19:49:37 PDT
Committed r58042
	M	WebCore/ChangeLog
	M	WebCore/platform/text/CharacterNames.h
	M	WebCore/platform/graphics/GlyphPageTreeNode.cpp
	M	WebCore/platform/graphics/Font.h
	A	LayoutTests/platform/mac/fast/text/format-control-expected.png
	A	LayoutTests/platform/mac/fast/text/format-control-expected.txt
	A	LayoutTests/platform/mac/fast/text/format-control-expected.checksum
	A	LayoutTests/fast/text/format-control.html
	M	LayoutTests/fast/text/zero-width-characters.html
	M	LayoutTests/ChangeLog
r58042 = ac6c98e60e5eaa8f0cd91ac73f17d8d727b545fc (trunk)
http://trac.webkit.org/changeset/58042

Watching the bots.
Comment 31 David Yonge-Mallo 2010-04-22 06:08:23 PDT
(In reply to comment #29)
> This test mostly has larger fonts, and it
> removed the textarea from the tests, since I could not determine what that was
> testing before.

Hi, Joseph.  Thanks for continuing this for me.  I don't have access to the Mac on which I had originally generated the output files any more.  The textarea was testing whether the ZWJ/ZWNJ characters were being displayed in an edit area or not.  The control character glyphs should clearly not be displayed in a "display" environment, but there is some disagreement as to whether control characters ought to be displayed in an "edit" environment, and also how to indicate to the browser to turn on or force their display.  I e-mailed several people about this, but this was never resolved.  I think it makes sense for glyphs to be displayed in "edit mode" (since otherwise how would one know if one has typed, e.g., an invisible zero-width character?), but I was not able to find any standards or documentation that defines when this should happen.  If such a standard does not exist, one ought to be made.
Comment 32 Joseph Pecoraro 2010-04-22 08:16:07 PDT
(In reply to comment #31)
> The control character glyphs should clearly not be displayed in a
> "display" environment, but there is some disagreement as to whether control
> characters ought to be displayed in an "edit" environment, and also how to
> indicate to the browser to turn on or force their display.  I e-mailed several
> people about this, but this was never resolved.  I think it makes sense for
> glyphs to be displayed in "edit mode" (since otherwise how would one know if
> one has typed, e.g., an invisible zero-width character?), but I was not able to
> find any standards or documentation that defines when this should happen.  If
> such a standard does not exist, one ought to be made.

I see. Thanks for the explanation!

I also don't really know what makes the most sense for these characters in an "edit mode" context.
Comment 33 Darin Adler 2010-04-22 17:38:22 PDT
While it seems fine to make these joining characters detectable in “edit mode”, I don’t think the right way to do it is to make them visible characters. The fact that characters are used to achieve the effect seems like an implementation detail to me, akin to displaying "<b>" instead of just showing bold text.