For HTML file whose content is "<textarea>foobar", "foobar" isn't seen even with view-source mode. This problem happens with script, style, textarea, title, xmp, and iframe tags. Related chromium bug: http://code.google.com/p/chromium/issues/detail?id=10686
Created attachment 29948 [details] Handle texts in unfinished special tags. LayoutTests/ChangeLog | 17 +++++++++ .../fast/frames/resources/viewsource-frame-3.html | 1 + .../frames/viewsource-unfinished-tags-expected.txt | 1 + .../fast/frames/viewsource-unfinished-tags.html | 39 ++++++++++++++++++++ WebCore/ChangeLog | 16 ++++++++ WebCore/html/HTMLTokenizer.cpp | 39 ++++++++++++-------- 6 files changed, 97 insertions(+), 16 deletions(-)
Is this patch meant for review? Please mark it as such by clicking Edit link to the right of it if it is. Briefly looking over the code, I think that "specialTagParsed" is not a great name for the variable. What do these tags have in common? Answering that question should help giving a more self-documanting name to it.
Comment on attachment 29948 [details] Handle texts in unfinished special tags. Thanks for the review! I forgot to change the flag for review, sorry. Regarding the name "special", I agree that this name isn't good, but it's a bit difficult to find a good name... How about "non HTML text" since these tags have non HTML text node as their child node? If it sounds good to you, I'll modify the patch in the beginning of the next week. I think we need to change the function name "parseSpecial" to "parseNonHtmlText" as well. Thanks!
Created attachment 30009 [details] fixes for layouttest and rename "special tag" LayoutTests/ChangeLog | 24 +++++++++ ...tion-using-js-object-with-toString-expected.txt | 3 + ...dow-location-using-js-object-with-toString.html | 4 +- ...indow-shadow-location-using-string-expected.txt | 3 + .../window-shadow-location-using-string.html | 4 +- .../viewsource-frame-unfinished-script.html | 1 + .../viewsource-frame-unfinished-textarea.html | 1 + .../frames/viewsource-unfinished-tags-expected.txt | 3 + .../fast/frames/viewsource-unfinished-tags.html | 46 ++++++++++++++++++ WebCore/ChangeLog | 20 ++++++++ WebCore/html/HTMLTokenizer.cpp | 50 ++++++++++---------- WebCore/html/HTMLTokenizer.h | 4 +- 12 files changed, 132 insertions(+), 31 deletions(-)
Comment on attachment 30009 [details] fixes for layouttest and rename "special tag" Updates from the previous patch were: - Renamed functions which contain "Special" using "NonHTMLText. - Fixed layouttest by using inAnyNonHTMLText(). - Added </iframe> for two tests and fixed the expected output.
Ping for review?
Comment on attachment 29948 [details] Handle texts in unfinished special tags. In general the fix looks good, but I would suggest doing performance tests such as iBench HTML or other page load test benchmarks to confirm no regression, since the tokenizer is very performance-sensitive. Please repost with performance data.
Comment on attachment 30009 [details] fixes for layouttest and rename "special tag" Looks great! Like I said in the other patch, though, I'd like to see some perf testing of this. Please repost with perf testing results.
Comment on attachment 29948 [details] Handle texts in unfinished special tags. Sorry, the next patch contains this patch and I should have marked this as obsolete.
> In general the fix looks good, but I would suggest doing performance tests such > as iBench HTML or other page load test benchmarks to confirm no regression, > since the tokenizer is very performance-sensitive. Please repost with > performance data. I see. Thanks for the advice! I'll learn the tool and update this patch.
I ran iBench and there seem to be no performance regression. I ran HTML load speed test 3 times. Before this patch: HTML Load Speed All iterations 15.15 First iteration (downloaded) 2.07 Subsequent iteration (cached) 1.87 HTML Load Speed All iterations 14.63 First iteration (downloaded) 1.86 Subsequent iteration (cached) 1.82 HTML Load Speed All iterations 14.66 First iteration (downloaded) 1.9 Subsequent iteration (cached) 1.82 After this patch: HTML Load Speed All iterations 15.51 First iteration (downloaded) 2.6 Subsequent iteration (cached) 1.84 HTML Load Speed All iterations 14.61 First iteration (downloaded) 1.86 Subsequent iteration (cached) 1.82 HTML Load Speed All iterations 14.53 First iteration (downloaded) 1.87 Subsequent iteration (cached) 1.81 It seems that the "First iteration" of the first trial has bigger variance than other tests due to network speed fluctuation. Other results show this patch don't introduce performance regression.
See also: bug 8961 and its duplicates. Maybe this is a duplicate, too?
(In reply to comment #12) > See also: bug 8961 and its duplicates. Maybe this is a duplicate, too? I think this is not the duplicate of the bug 8961. Hyatt's patch didn't fix this bug and my patch didn't fix the bug 8961 as well. By the way I've just found that this is the duplicate of this bug: https://bugs.webkit.org/show_bug.cgi?id=24860
Created attachment 30856 [details] patch v3 LayoutTests/ChangeLog | 20 ++++++++ ...tion-using-js-object-with-toString-expected.txt | 3 + ...dow-location-using-js-object-with-toString.html | 4 +- ...indow-shadow-location-using-string-expected.txt | 3 + .../window-shadow-location-using-string.html | 4 +- .../viewsource-frame-unfinished-script.html | 1 + .../viewsource-frame-unfinished-textarea.html | 1 + .../frames/viewsource-unfinished-tags-expected.txt | 3 + .../fast/frames/viewsource-unfinished-tags.html | 46 ++++++++++++++++++ WebCore/ChangeLog | 18 +++++++ WebCore/html/HTMLTokenizer.cpp | 50 ++++++++++---------- WebCore/html/HTMLTokenizer.h | 4 +- 12 files changed, 126 insertions(+), 31 deletions(-)
As I've found Ian is saying that re-parsing is invalid, I updated the patch so that this patch only affects for view-source mode and there should be no security issues. https://bugs.webkit.org/show_bug.cgi?id=6314
Created attachment 30939 [details] patch v4 LayoutTests/ChangeLog | 20 ++++++++ ...tion-using-js-object-with-toString-expected.txt | 3 + ...dow-location-using-js-object-with-toString.html | 4 +- ...indow-shadow-location-using-string-expected.txt | 3 + .../window-shadow-location-using-string.html | 4 +- .../viewsource-frame-unfinished-script.html | 1 + .../viewsource-frame-unfinished-textarea.html | 1 + .../frames/viewsource-unfinished-tags-expected.txt | 3 + .../fast/frames/viewsource-unfinished-tags.html | 46 ++++++++++++++++++ WebCore/ChangeLog | 18 +++++++ WebCore/html/HTMLTokenizer.cpp | 50 ++++++++++---------- WebCore/html/HTMLTokenizer.h | 4 +- 12 files changed, 126 insertions(+), 31 deletions(-)
Comment on attachment 30939 [details] patch v4 Sorry, I fixed my email address in ChangeLog.
Created attachment 30990 [details] patch v5 LayoutTests/ChangeLog | 22 +++++++++ ...tion-using-js-object-with-toString-expected.txt | 3 + ...dow-location-using-js-object-with-toString.html | 4 +- ...indow-shadow-location-using-string-expected.txt | 3 + .../window-shadow-location-using-string.html | 4 +- .../viewsource-frame-unfinished-script.html | 1 + .../viewsource-frame-unfinished-textarea.html | 1 + .../frames/viewsource-unfinished-tags-expected.txt | 3 + .../fast/frames/viewsource-unfinished-tags.html | 46 ++++++++++++++++++ WebCore/ChangeLog | 20 ++++++++ WebCore/html/HTMLTokenizer.cpp | 50 ++++++++++---------- WebCore/html/HTMLTokenizer.h | 4 +- 12 files changed, 130 insertions(+), 31 deletions(-)
Comment on attachment 30990 [details] patch v5 I inserted some line breaks in ChangeLog as I was advised that we should keep lines in ChangeLog short.
Comment on attachment 30990 [details] patch v5 I'm not an expert in the HTML tokenizer, but this patch LGTM.
Landed in @r44608.