Bug 39863

Summary: innerText does not include a letter with :first-letter applied
Product: WebKit Reporter: Yoshiki Hayashi <yhayashi>
Component: DOMAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: aestes, ap, commit-queue, darin, hamaji, mitz, ojan, tkent, webkit.review.bot, yhayashi
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Bug Depends on:    
Bug Blocks: 53272    
Attachments:
Description Flags
Testcase
none
Proposed Patch
none
Proposed Patch
none
Proposed Patch
none
Proposed Patch
none
Proposed Patch
none
Proposed Patch
none
Proposed Patch
none
Proposed Patch
none
Proposed Patch with style fix
none
Proposed Patch without unnecessary return. Sorry for spam.
none
Proposed Patch with a few bug fixes none

Description Yoshiki Hayashi 2010-05-27 23:52:55 PDT
Created attachment 57297 [details]
Testcase

If :first-letter rules is applied to a block level element, innerText does not include the first letter, as can be seen by the attached test case.
Comment 1 Yoshiki Hayashi 2010-05-28 00:06:41 PDT
Created attachment 57300 [details]
Proposed Patch
Comment 2 Yoshiki Hayashi 2010-05-28 04:19:32 PDT
Sorry, forgot to include one test affected by this change.
Comment 3 Yoshiki Hayashi 2010-05-28 04:21:06 PDT
Created attachment 57316 [details]
Proposed Patch
Comment 4 WebKit Review Bot 2010-05-28 04:22:50 PDT
Attachment 57316 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/editing/TextIterator.cpp:379:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Darin Adler 2010-05-28 08:43:38 PDT
Comment on attachment 57316 [details]
Proposed Patch

Looks good.

It's frustrating that this ends up copying and pasting so much code. I would prefer a code path that somehow reused more of what was there for things like handling collapsed space and such.

I'd like to see the new logic factored out into a separate function so handleTextNode doesn't get so much bigger. That might also make it clearer how to factor this so it shares more code with the normal text node handling.

> +        if (fragment->firstLetter() && fragment->firstLetter()->firstChild() && fragment->firstLetter()->firstChild()->isText() && toRenderText(fragment->firstLetter()->firstChild())->isTextFragment()) {
> +            RenderTextFragment* firstLetter = static_cast<RenderTextFragment*>(fragment->firstLetter()->firstChild());

Is there a way to write this more generally? It seems fragile to assume here the particular arrangement of renderers. We don't want to have to modify the text iterator any time we change the details of how RenderTextFragment works.

> +                emitText(m_node, firstLetter, 0, 1);

This assumes that the first letter will be a single character at the beginning of the DOM node. That's not necessarily true. The first letter could be a surrogate pair, for example. I think the text length needs to come from a computation rather than a hard-coded 1. And perhaps the offset too. For example there could be collapsed whitespace at the beginning of the text node. We need test cases that cover this, I think.

I could almost say review+, but I think the "0, 1" thing here is wrong so I'll say review- and ask you to cover that case correctly. I may be mistaken, and if so, please feel free to put this same patch up for review again after explaining what I got wrong.
Comment 6 Alexey Proskuryakov 2010-05-28 12:35:06 PDT
See also: bug 6185.
Comment 7 Yoshiki Hayashi 2010-05-31 04:30:14 PDT
Thank you for review!

(In reply to comment #5)

> It's frustrating that this ends up copying and pasting so much code. I would prefer a code path that somehow reused more of what was there for things like handling collapsed space and such.

I created a new method named handleTextNodeFirstLetter as suggested below.  I tried to reuse as much logic as possible but I couldn't get rid of collapsed space handling code from handleTextNodeFirstLetter because there are lots of early exit code in handleTextNode and in each of them, handleTextNodeFirstLetter needs to be called and needs to handle outputting a space character when needed.

I ended up littering around lots of handleTextNodeFirstLetter calling code in handleTextNode and handleTextBox but this should be more correct with respect to right to left language than my previous patch.

> I'd like to see the new logic factored out into a separate function so handleTextNode doesn't get so much bigger. That might also make it clearer how to factor this so it shares more code with the normal text node handling.

This is now done.

> > +        if (fragment->firstLetter() && fragment->firstLetter()->firstChild() && fragment->firstLetter()->firstChild()->isText() && toRenderText(fragment->firstLetter()->firstChild())->isTextFragment()) {
> > +            RenderTextFragment* firstLetter = static_cast<RenderTextFragment*>(fragment->firstLetter()->firstChild());
> 
> Is there a way to write this more generally? It seems fragile to assume here the particular arrangement of renderers. We don't want to have to modify the text iterator any time we change the details of how RenderTextFragment works.

In the new patch, I made the assumption that firstLetter is a container (either RenderBlock or RenderInline) that holds at least one RenderText in children.  I could change it to traverse the whole tree of firstLetter to output all RenderText in that tree but it looked overkill to me.  I can certainly change the code that way if you prefer it.

> 
> > +                emitText(m_node, firstLetter, 0, 1);
> 
> This assumes that the first letter will be a single character at the beginning of the DOM node. That's not necessarily true. The first letter could be a surrogate pair, for example. I think the text length needs to come from a computation rather than a hard-coded 1. And perhaps the offset too. For example there could be collapsed whitespace at the beginning of the text node. We need test cases that cover this, I think.
> 
> I could almost say review+, but I think the "0, 1" thing here is wrong so I'll say review- and ask you to cover that case correctly. I may be mistaken, and if so, please feel free to put this same patch up for review again after explaining what I got wrong.

You are absolutely right about that.  In addition to surrogate pairs, the first letter object could contain multiple letters like '---a' because punctuations are not considered when finding the character to apply :first-letter and thus it needs to hold all previous characters until first non-punctuation letter is found.

As to bug 6185, this patch doesn't fix the problem but I'll see if I can fix that as well.
Comment 8 Yoshiki Hayashi 2010-05-31 04:31:07 PDT
Created attachment 57451 [details]
Proposed Patch
Comment 9 WebKit Review Bot 2010-05-31 04:34:56 PDT
Attachment 57451 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/editing/TextIterator.cpp:379:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Yoshiki Hayashi 2010-05-31 04:36:43 PDT
Created attachment 57452 [details]
Proposed Patch
Comment 11 WebKit Review Bot 2010-05-31 04:37:42 PDT
Attachment 57452 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/editing/TextIterator.cpp:379:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Yoshiki Hayashi 2010-05-31 04:39:51 PDT
Sorry, for some reason I can't upload new patch via webkit-patch script and it keeps uploading the previous one.  I'll try manually adding it.
Comment 13 Yoshiki Hayashi 2010-05-31 04:43:06 PDT
Created attachment 57453 [details]
Proposed Patch
Comment 14 WebKit Review Bot 2010-05-31 04:46:45 PDT
Attachment 57453 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/editing/TextIterator.cpp:578:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
WebCore/editing/TextIterator.cpp:653:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
WebCore/editing/TextIterator.cpp:654:  An else if statement should be written as an if statement when the prior "if" concludes with a return, break, continue or goto statement.  [readability/control_flow] [4]
Total errors found: 3 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 mitz 2010-05-31 09:43:21 PDT
Comment on attachment 57453 [details]
Proposed Patch

Does this work with "white-space: pre" text?

WebCore/editing/TextIterator.cpp:593
 +                  while (runStart < runEnd && str[runStart] == ' ')
 +                      ++runStart;

What about other kinds of whitespace and newline? The logic in <http://trac.webkit.org/browser/trunk/WebCore/rendering/RenderBlock.cpp?rev=60408#L4914> is different. Could you instead use the InlineTextBox’s start and end, like handleTextBox() does?
Comment 16 Yoshiki Hayashi 2010-05-31 21:19:54 PDT
Thank you for review!

(In reply to comment #15)
> (From update of attachment 57453 [details])
> Does this work with "white-space: pre" text?

Good point.  No, it doesn't work.  I'm going to attach a new patch to fix it which just adds if (renderer->style()->collapseWhiteSpace()) in front of while loop.

> WebCore/editing/TextIterator.cpp:593
>  +                  while (runStart < runEnd && str[runStart] == ' ')
>  +                      ++runStart;
> 
> What about other kinds of whitespace and newline? The logic in <http://trac.webkit.org/browser/trunk/WebCore/rendering/RenderBlock.cpp?rev=60408#L4914> is different. Could you instead use the InlineTextBox’s start and end, like handleTextBox() does?

There are two cases when handling leading whitespaces.  One is whitespaces are collapsed and another is not collapsed.  When whitespace is not collapsed, all leading whitespaces should be preserved as you pointed out above.   So this whil e loop shouldn't be run.  When whitespaces should be collapsed, those should be already replaced to one ' ' so the above loop should correctly skip that.  The RenderBlock code you mentioned also skips punctuations when finding the first letter that could get :first-letter to be applied but here, we want to output them as text so we don't need special handling for those.  So unless I'm missing something (which is quite possible as I just did in the previous patch), I think the new code should work.

As to the latter point, if I understand the code correctly, RenderTextFragment doesn't have InlineTextBox and it holds text directly.  RenderTextFragment itself has start and end but those are offset into original text which will produce the text fragment (text_fragment = original_text.substring(start, end)) so those can't be used, either.  So looking at the RenderTextFragment.cpp, I don't know if there's a better way to do it.
Comment 17 Yoshiki Hayashi 2010-05-31 21:21:18 PDT
Created attachment 57515 [details]
Proposed Patch
Comment 18 WebKit Review Bot 2010-05-31 21:22:40 PDT
Attachment 57515 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/editing/TextIterator.cpp:578:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
WebCore/editing/TextIterator.cpp:654:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
WebCore/editing/TextIterator.cpp:655:  An else if statement should be written as an if statement when the prior "if" concludes with a return, break, continue or goto statement.  [readability/control_flow] [4]
Total errors found: 3 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 19 Yoshiki Hayashi 2010-05-31 21:30:56 PDT
Oops.  Fixing style issues...
Comment 20 Yoshiki Hayashi 2010-05-31 21:31:55 PDT
Created attachment 57516 [details]
Proposed Patch
Comment 21 mitz 2010-05-31 21:48:24 PDT
(In reply to comment #16)

> There are two cases when handling leading whitespaces.  One is whitespaces are collapsed and another is not collapsed.  When whitespace is not collapsed, all leading whitespaces should be preserved as you pointed out above.   So this whil e loop shouldn't be run.  When whitespaces should be collapsed, those should be already replaced to one ' ' so the above loop should correctly skip that.  The RenderBlock code you mentioned also skips punctuations when finding the first letter that could get :first-letter to be applied but here, we want to output them as text so we don't need special handling for those.  So unless I'm missing something (which is quite possible as I just did in the previous patch), I think the new code should work.

I think you’re missing the part where isSpaceOrNewline() differs from comparing to ' '.

> As to the latter point, if I understand the code correctly, RenderTextFragment doesn't have InlineTextBox and it holds text directly.  RenderTextFragment itself has start and end but those are offset into original text which will produce the text fragment (text_fragment = original_text.substring(start, end)) so those can't be used, either.  So looking at the RenderTextFragment.cpp, I don't know if there's a better way to do it.

All rendered text is rendered by InlineTextBox instances. In your patch, handleTextNodeFirstLetter() is called from handleTextBox(), where the text box has a RenderTextFragment() as its renderer. The offsets in the text box are into the RenderTextFragment’s text(), and I think you can use the RenderTextFragment’s start and end to map those into offsets in the node. Does that make sense?
Comment 22 Yoshiki Hayashi 2010-06-01 04:59:52 PDT
(In reply to comment #21)
> (In reply to comment #16)
> 
> > There are two cases when handling leading whitespaces.  One is whitespaces are collapsed and another is not collapsed.  When whitespace is not collapsed, all leading whitespaces should be preserved as you pointed out above.   So this whil e loop shouldn't be run.  When whitespaces should be collapsed, those should be already replaced to one ' ' so the above loop should correctly skip that.  The RenderBlock code you mentioned also skips punctuations when finding the first letter that could get :first-letter to be applied but here, we want to output them as text so we don't need special handling for those.  So unless I'm missing something (which is quite possible as I just did in the previous patch), I think the new code should work.
> 
> I think you’re missing the part where isSpaceOrNewline() differs from comparing to ' '.

Sorry for being dense but if you have html fragment that only works with isSpaceOrNewline, I'd like to add it to the testcase.  Meanwhile, I'll change it to use isSpaceOrNewline.

> > As to the latter point, if I understand the code correctly, RenderTextFragment doesn't have InlineTextBox and it holds text directly.  RenderTextFragment itself has start and end but those are offset into original text which will produce the text fragment (text_fragment = original_text.substring(start, end)) so those can't be used, either.  So looking at the RenderTextFragment.cpp, I don't know if there's a better way to do it.
> 
> All rendered text is rendered by InlineTextBox instances. In your patch, handleTextNodeFirstLetter() is called from handleTextBox(), where the text box has a RenderTextFragment() as its renderer. The offsets in the text box are into the RenderTextFragment’s text(), and I think you can use the RenderTextFragment’s start and end to map those into offsets in the node. Does that make sense?

Ah, yes, thank you for the explanation.  It means I can unify the logic more, so I'll work on rewriting my patch.  Unfortunately I was busy doing other stuff today but should be able to find some time tomorrow.
Comment 23 Yoshiki Hayashi 2010-06-03 02:32:48 PDT
It took more time than expected but revised patch is now ready and I'll attach it shortly.  It would be nice if you could take another look.
Comment 24 Yoshiki Hayashi 2010-06-03 02:35:33 PDT
Created attachment 57747 [details]
Proposed Patch
Comment 25 WebKit Review Bot 2010-06-03 02:38:17 PDT
Attachment 57747 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/editing/TextIterator.cpp:514:  One line control clauses should not use braces.  [whitespace/braces] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 26 Yoshiki Hayashi 2010-06-03 02:44:58 PDT
Created attachment 57748 [details]
Proposed Patch with style fix
Comment 27 Yoshiki Hayashi 2010-06-03 03:41:08 PDT
Created attachment 57750 [details]
Proposed Patch without unnecessary return.  Sorry for spam.
Comment 28 Yoshiki Hayashi 2010-06-04 01:57:35 PDT
Created attachment 57854 [details]
Proposed Patch with a few bug fixes
Comment 29 WebKit Review Bot 2010-06-04 02:00:26 PDT
Attachment 57854 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/editing/TextIterator.cpp:512:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 30 Ojan Vafai 2010-06-08 10:10:51 PDT
Comment on attachment 57854 [details]
Proposed Patch with a few bug fixes

Could not upload patch 57854 to rietveld. Rietveld is down or there's a bug in the upload bot.

Unexpected failure when landing patch!  Please file a bug against webkit-patch.
Failed to run "['/WebKit/WebKitTools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', 'post-attachment-to-rietveld', '--force-clean', '--non-interactive', '--parent-command=rietveld-upload-queue', 57854]" exit_code: 1
Last 500 characters of output:
/WebKitTools/Scripts/webkitpy/tool/commands/stepsequence.py", line 77, in run_and_handle_errors
    self._run(tool, options, state)
  File "/WebKit/WebKitTools/Scripts/webkitpy/tool/commands/stepsequence.py", line 71, in _run
    step(tool, options).run(state)
  File "/WebKit/WebKitTools/Scripts/webkitpy/tool/steps/postcodereview.py", line 69, in run
    self._tool.bugs.set_flag_on_attachment(patch.id(), 'in-rietveld', '+')
TypeError: set_flag_on_attachment() takes exactly 6 arguments (4 given)
Comment 31 Ojan Vafai 2010-06-08 10:11:59 PDT
(In reply to comment #30)
> (From update of attachment 57854 [details])
> Could not upload patch 57854 to rietveld. Rietveld is down or there's a bug in the upload bot.

This is a bug in the rietveld upload bot. Sorry for the noise.
Comment 32 Shinichiro Hamaji 2010-08-04 22:19:28 PDT
Almost there? Dan, could you take another look on this patch when you have time?
Comment 33 Kent Tamura 2010-08-09 08:21:26 PDT
Comment on attachment 57854 [details]
Proposed Patch with a few bug fixes

I'm not familiar with this area, but it seems Yoshiki-san addressed all of issues from darin and mitz in the latest patch.
I'll set r+ tomorrow if no one objects.


WebCore/editing/TextIterator.cpp:512
 +      if (!m_handledFirstLetter && renderer->isTextFragment() && m_offset == 0)
We should write " ... && !m_offset)".
Comment 34 WebKit Commit Bot 2010-08-10 00:30:39 PDT
Comment on attachment 57854 [details]
Proposed Patch with a few bug fixes

Rejecting patch 57854 from commit-queue.

Failed to run "['WebKitTools/Scripts/build-webkit', '--debug']" exit_code: 1
Last 500 characters of output:
 setenv YACC /Developer/usr/bin/yacc
    /bin/sh -c /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Script-5DF50887116F3077005202AB.sh
** BUILD FAILED **

The following build commands failed:
WebCore:
	Distributed-CompileC /Users/eseidel/Projects/CommitQueue/WebKitBuild/WebCore.build/Debug/WebCore.build/Objects-normal/i386/TextIterator.o /Users/eseidel/Projects/CommitQueue/WebCore/editing/TextIterator.cpp normal i386 c++ com.apple.compilers.gcc.4_2
(1 failure)


Full output: http://queues.webkit.org/results/3776008
Comment 35 Kent Tamura 2010-08-10 03:09:46 PDT
Fixed the build error on Mac and the style error, and landed manually as r65062.
Comment 36 Andy Estes 2011-01-27 17:27:57 PST
This patch seems to have caused a regression where WebKit hangs when visiting http://www.kauppalehti.fi.

I filed https://bugs.webkit.org/show_bug.cgi?id=53272 to track this.