Summary: | Make TextCodecUTF8 handle 8-bit data without converting to UChars | | |
---|---|---|---
Product: | WebKit | Reporter: | Michael Saboff <msaboff>
Component: | DOM | Assignee: | Michael Saboff <msaboff>
Status: | RESOLVED FIXED
Severity: | Normal | CC: | ap, dglazkov, gustavo, philn, webkit.review.bot, xan.lopez
Priority: | P2
Version: | 528+ (Nightly build)
Hardware: | All
OS: | All
Bug Depends on: | 90319
Bug Blocks: | 90321
Description
Michael Saboff
2012-06-29 15:50:37 PDT
> Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.
This sounds surprising to me. What kind of data do we have to support this?
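The idea behind the summary and the quoted description can be sketched in C++: decode into an 8-bit (Latin-1) buffer while every code point fits in one byte, and widen to UTF-16 only when a code point above U+00FF appears. This is a minimal sketch, not the code committed in r123011. `DecodedText`, `decodeOne`, `appendUTF16`, and `decodeUTF8` are hypothetical names, `uint8_t` and `char16_t` stand in for WebKit's `LChar` and `UChar`, and malformed input is simply mapped to U+FFFD without following the WHATWG replacement rules exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type; uint8_t and char16_t stand in for WebKit's
// LChar and UChar. Only one of the two buffers is populated.
struct DecodedText {
    bool is8Bit = true;
    std::vector<uint8_t> chars8; // Latin-1 code units
    std::u16string chars16;      // UTF-16 code units
};

// Decodes one UTF-8 sequence at data[i] and advances i past it.
// Malformed or truncated input yields U+FFFD (simplified error handling).
static char32_t decodeOne(const uint8_t* data, size_t length, size_t& i)
{
    uint8_t lead = data[i];
    if (lead < 0x80) {
        ++i;
        return lead;
    }
    size_t trail;
    char32_t cp;
    if (lead >= 0xC2 && lead <= 0xDF) { trail = 1; cp = lead & 0x1F; }
    else if (lead >= 0xE0 && lead <= 0xEF) { trail = 2; cp = lead & 0x0F; }
    else if (lead >= 0xF0 && lead <= 0xF4) { trail = 3; cp = lead & 0x07; }
    else { ++i; return 0xFFFD; }
    for (size_t k = 1; k <= trail; ++k) {
        if (i + k >= length || (data[i + k] & 0xC0) != 0x80) {
            i += k; // resume decoding at the offending byte
            return 0xFFFD;
        }
        cp = (cp << 6) | (data[i + k] & 0x3F);
    }
    i += trail + 1;
    static const char32_t minimum[] = { 0, 0x80, 0x800, 0x10000 };
    if (cp < minimum[trail] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0xFFFD; // overlong form, surrogate, or out of range
    return cp;
}

static void appendUTF16(std::u16string& out, char32_t cp)
{
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));
        return;
    }
    cp -= 0x10000; // encode as a surrogate pair
    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
}

// Stays 8-bit while every code point is <= U+00FF; widens at most once.
DecodedText decodeUTF8(const uint8_t* data, size_t length)
{
    DecodedText result;
    size_t i = 0;
    while (i < length) {
        char32_t cp = decodeOne(data, length, i);
        if (result.is8Bit && cp <= 0xFF) {
            result.chars8.push_back(static_cast<uint8_t>(cp));
            continue;
        }
        if (result.is8Bit) {
            // First code point above U+00FF: copy the 8-bit prefix to UTF-16.
            for (uint8_t c : result.chars8)
                result.chars16.push_back(c);
            result.chars8.clear();
            result.is8Bit = false;
        }
        appendUTF16(result.chars16, cp);
    }
    return result;
}
```

Note that Latin-1 characters such as "é" (C3 A9) still produce 8-bit output, so a stream does not have to be pure ASCII to stay 8-bit; it only has to avoid code points above U+00FF.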
Created attachment 151834
Patch for Review

This patch depends on the changes for https://bugs.webkit.org/show_bug.cgi?id=90319. I will post a separate patch that contains both 90319 and this patch for the bots to check.

Created attachment 151835
Combined patches for 90319 and 90320 for EWS
Attachment 151835 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13200556
Attachment 151835 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13180648
Attachment 151834 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13180649
Attachment 151835 did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13198600
Attachment 151835 did not pass win-ews (win): Output: http://queues.webkit.org/results/13180655
Attachment 151835 did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13209553
Attachment 151834 did not pass win-ews (win): Output: http://queues.webkit.org/results/13198606
Attachment 151834 did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13198608
Attachment 151834 did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13208534
Attachment 151834 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13204550
Attachment 151834 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13180676
Attachment 151834 did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/13203574

(In reply to comment #1)
> > Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.
>
> This sounds surprising to me. What kind of data do we have to support this?

I was surprised as well. I suspect that the 8-bit UTF-8 sites are actually all ASCII.

I built an instrumented version of both the Latin-1 and UTF-8 codecs that reported whether each stream decoded to 8-bit or 16-bit data. I then visited many common web sites, including European ones (the London Times, Le Monde, Der Spiegel, and Spanish and Italian newspapers, among others) and Asian newspapers from both China and Japan. I even went to www.haaretz.co.il, the Hebrew-language newspaper site from Israel. I was surprised that the large majority of text streams were 8-bit.

It appears that the "16-bit" sites use many 8-bit files along with some 16-bit files. My guess is that they are using common JS libraries. I also think it is likely that CSS files are predominantly 8-bit.

It also seems that most US-based sites use UTF-8 instead of Latin-1. Craigslist appears to be exclusively Latin-1, as is www.spiegel.de, whereas most other common sites are either a mix of UTF-8 and Latin-1 or exclusively UTF-8.

I did not capture and analyze the counts of 8-bit versus 16-bit streams; these observations come from watching the 8-bit vs. 16-bit reporting while visiting many sites.

Created attachment 152006
Updated Patch with build fixes
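The reply's suspicion that 8-bit UTF-8 streams are mostly ASCII can be made concrete with a byte-level scan, in the spirit of the instrumented codecs described above. This is a hypothetical sketch, not the instrumentation that was actually used: `Width` and `classifyUTF8` are invented names, and the scan assumes a complete buffer rather than a stream delivered in chunks. It relies on the fact that the only non-ASCII characters that fit in an 8-bit (Latin-1) string are U+0080..U+00FF, which UTF-8 encodes as a C2 or C3 lead byte followed by one continuation byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of the narrowest storage a buffer needs.
enum class Width { ASCII, Latin1, Needs16Bit };

// Only U+0080..U+00FF (encoded as C2 80..C3 BF) keep non-ASCII text 8-bit.
// Any other non-ASCII sequence, or malformed input (which a decoder would
// turn into U+FFFD), needs 16-bit storage.
Width classifyUTF8(const uint8_t* data, size_t length)
{
    Width width = Width::ASCII;
    for (size_t i = 0; i < length; ++i) {
        uint8_t b = data[i];
        if (b < 0x80)
            continue;
        if ((b == 0xC2 || b == 0xC3) && i + 1 < length && (data[i + 1] & 0xC0) == 0x80) {
            width = Width::Latin1;
            ++i; // skip the continuation byte
            continue;
        }
        return Width::Needs16Bit;
    }
    return width;
}
```

Counting `ASCII` versus `Latin1` results per resource would distinguish "all ASCII" streams from ones that merely stay within Latin-1.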
Created attachment 152007
Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152006 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13206793
Attachment 152006 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13208782
Attachment 152006 did not pass win-ews (win): Output: http://queues.webkit.org/results/13200834
Attachment 152006 did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13205772
Attachment 152007 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13204788
Attachment 152007 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13201771
Attachment 152007 did not pass win-ews (win): Output: http://queues.webkit.org/results/13202773
Attachment 152006 did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13198891

Created attachment 152025
Really fix the build failures
Created attachment 152028
Another Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152028 did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/te..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Total errors found: 1 in 8 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Attachment 152025 did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Total errors found: 1 in 3 files
Attachment 152025 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13207738
Attachment 152025 did not pass win-ews (win): Output: http://queues.webkit.org/results/13202792
Attachment 152025 did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13202800
Attachment 152025 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13221345
Attachment 152025 did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/13208830
Attachment 152028 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13180953

New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl

Created attachment 152057
Archive of layout-test-results from gce-cr-linux-08
The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-08
Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>
Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Attachment 152025 did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13221363
Attachment 152025 did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13182942
Attachment 152028 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13208864

New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl

Created attachment 152078
Archive of layout-test-results from gce-cr-linux-05
The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-05
Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>
Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Created attachment 152121
Patch with Linux test fix and style fix
Fixed the handling of partial sequences right at the transition from 8-bit to 16-bit decoding.
Fixed the style issue with two spaces after punctuation in a comment.
Will post another double patch for EWS.
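One common way to handle multi-byte sequences that are split across network chunks is to hold back the incomplete tail of each chunk and prepend it to the next, so that the 8-bit/16-bit decision is only ever made on a complete code point. The sketch below shows that technique under stated assumptions; it is not necessarily how r123011 fixes the transition bug, and `incompleteTailLength`, `ChunkJoiner`, and `take` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of trailing bytes that form the start of a multi-byte UTF-8
// sequence still waiting for continuation bytes; 0 otherwise.
static size_t incompleteTailLength(const uint8_t* data, size_t length)
{
    for (size_t k = 1; k <= 3 && k <= length; ++k) {
        uint8_t b = data[length - k];
        if ((b & 0xC0) == 0x80)
            continue; // continuation byte: keep looking for the lead byte
        size_t needed = (b >= 0xF0 && b <= 0xF4) ? 4
            : (b >= 0xE0 && b <= 0xEF) ? 3
            : (b >= 0xC2 && b <= 0xDF) ? 2
            : 1;
        return needed > k ? k : 0;
    }
    return 0;
}

// Hypothetical helper that carries a split sequence over to the next chunk.
class ChunkJoiner {
public:
    // Returns the bytes that are ready to decode; pass flush = true on the
    // last chunk so any leftover bytes reach the decoder (and become U+FFFD).
    std::vector<uint8_t> take(const uint8_t* data, size_t length, bool flush)
    {
        std::vector<uint8_t> bytes(m_pending);
        bytes.insert(bytes.end(), data, data + length);
        size_t tail = flush ? 0 : incompleteTailLength(bytes.data(), bytes.size());
        m_pending.assign(bytes.end() - static_cast<std::ptrdiff_t>(tail), bytes.end());
        bytes.resize(bytes.size() - tail);
        return bytes;
    }

private:
    std::vector<uint8_t> m_pending; // bytes of an incomplete trailing sequence
};
```

With a euro sign (E2 82 AC) split after its lead byte, the first chunk yields only "a" and can still be decoded as 8-bit; the switch to 16-bit happens when the next chunk completes the sequence.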
Created attachment 152122
Combined patches for 90319 and 90320 for EWS with test and style fixes
Attachment 152121 did not pass win-ews (win): Output: http://queues.webkit.org/results/13203935
Attachment 152121 did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13206930
Attachment 152121 did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13221465
Attachment 152122 did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13202910
Attachment 152121 did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13221467
Attachment 152122 did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13205901
Attachment 152122 did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13199949
Attachment 152121 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13232048
Attachment 152122 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13204957
Attachment 152121 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13199957

Created attachment 152293
Patch with leftover fprintf removed
Created attachment 152295
Combined patches for 90319 and 90320 for EWS with leftover fprintf removed
Attachment 152293 did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13236197
Attachment 152293 did not pass win-ews (win): Output: http://queues.webkit.org/results/13241162
Attachment 152293 did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13243145
Attachment 152293 did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13235212
Attachment 152293 did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13232317
Attachment 152293 did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13232319

Committed r123011: <http://trac.webkit.org/changeset/123011>