RESOLVED FIXED 90320
Make TextCodecUTF8 handle 8 bit data without converting to UChar's
https://bugs.webkit.org/show_bug.cgi?id=90320
Summary Make TextCodecUTF8 handle 8 bit data without converting to UChar's
Michael Saboff
Reported 2012-06-29 15:50:37 PDT
Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings. The task is to modify TextCodecUTF8 and related code to return strings appropriate for the source data.
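As a rough illustration of the idea in the report (this is not code from the patch), the sketch below decodes UTF-8 into an 8-bit buffer for as long as every code point fits in U+0000..U+00FF, and switches to a 16-bit buffer the first time one does not. The names `DecodedText` and `decodeUtf8Adaptive` are hypothetical, and the sketch assumes well-formed, complete input and performs no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: exactly one of the two buffers is in use.
struct DecodedText {
    bool is8Bit = true;
    std::string chars8;     // one byte per character, code points U+0000..U+00FF
    std::u16string chars16; // UTF-16 code units
};

// Appends one code point to a UTF-16 buffer, using a surrogate pair if needed.
static void appendUtf16(std::u16string& out, char32_t cp)
{
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
        return;
    }
    cp -= 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
}

// Assumption: the input is well-formed, complete UTF-8. No validation is done.
DecodedText decodeUtf8Adaptive(const std::string& bytes)
{
    DecodedText result;
    size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Sequence length implied by the lead byte.
        size_t count = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        char32_t cp = count == 1 ? lead : lead & (0xFF >> (count + 1));
        for (size_t k = 1; k < count && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        i += count;

        if (result.is8Bit && cp > 0xFF) {
            // First character that needs 16 bits: widen what was decoded so far.
            for (unsigned char c : result.chars8)
                result.chars16.push_back(c);
            result.chars8.clear();
            result.is8Bit = false;
        }
        if (result.is8Bit)
            result.chars8.push_back(static_cast<char>(cp));
        else
            appendUtf16(result.chars16, cp);
    }
    return result;
}
```

With this sketch, a pure ASCII or Latin-1-range stream never allocates a 16-bit buffer, and a stream with wider characters pays for the widening copy only once.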
Attachments
Patch for Review (14.08 KB, patch)
2012-07-11 18:20 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Combined patched for 90319 and 90320 for EWS (21.49 KB, patch)
2012-07-11 18:21 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Updated Patch with build fixes (12.53 KB, patch)
2012-07-12 11:14 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Combined patches for 90319 and 90320 for EWS with build fix (20.46 KB, patch)
2012-07-12 11:16 PDT, Michael Saboff
buildbot: commit-queue-
Really fix the build failures (11.83 KB, patch)
2012-07-12 12:43 PDT, Michael Saboff
buildbot: commit-queue-
Another Combined patches for 90319 and 90320 for EWS with build fix (19.76 KB, patch)
2012-07-12 12:44 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Archive of layout-test-results from gce-cr-linux-08 (351.39 KB, application/zip)
2012-07-12 14:13 PDT, WebKit Review Bot
no flags
Archive of layout-test-results from gce-cr-linux-05 (457.08 KB, application/zip)
2012-07-12 15:22 PDT, WebKit Review Bot
no flags
Patch with Linux test fix and style fin (11.94 KB, patch)
2012-07-12 18:19 PDT, Michael Saboff
buildbot: commit-queue-
Combined patches for 90319 and 90320 for EWS with test and style fixes (19.87 KB, patch)
2012-07-12 18:20 PDT, Michael Saboff
gyuyoung.kim: commit-queue-
Patch with leftover fprintf removed (11.80 KB, patch)
2012-07-13 10:37 PDT, Michael Saboff
oliver: review+
buildbot: commit-queue-
Combined patches for 90319 and 90320 for EWS with leftover fprintf removed (19.73 KB, patch)
2012-07-13 10:38 PDT, Michael Saboff
no flags
Alexey Proskuryakov
Comment 1 2012-06-30 09:48:56 PDT
> Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.
This sounds surprising to me. What kind of data do we have to support this?
Michael Saboff
Comment 2 2012-07-11 18:20:35 PDT
Created attachment 151834 [details] Patch for Review
This patch depends on the changes for https://bugs.webkit.org/show_bug.cgi?id=90319. I will post a separate patch that contains both 90319 and this patch for the bots to check.
Michael Saboff
Comment 3 2012-07-11 18:21:17 PDT
Created attachment 151835 [details] Combined patched for 90319 and 90320 for EWS
WebKit Review Bot
Comment 4 2012-07-11 18:53:33 PDT
Comment on attachment 151835 [details] Combined patched for 90319 and 90320 for EWS Attachment 151835 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13200556
Build Bot
Comment 5 2012-07-11 19:14:43 PDT
Comment on attachment 151835 [details] Combined patched for 90319 and 90320 for EWS Attachment 151835 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13180648
WebKit Review Bot
Comment 6 2012-07-11 19:20:12 PDT
Comment on attachment 151834 [details] Patch for Review Attachment 151834 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13180649
Early Warning System Bot
Comment 7 2012-07-11 19:22:52 PDT
Comment on attachment 151835 [details] Combined patched for 90319 and 90320 for EWS Attachment 151835 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13198600
Build Bot
Comment 8 2012-07-11 19:33:39 PDT
Comment on attachment 151835 [details] Combined patched for 90319 and 90320 for EWS Attachment 151835 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13180655
Early Warning System Bot
Comment 9 2012-07-11 19:45:26 PDT
Comment on attachment 151835 [details] Combined patched for 90319 and 90320 for EWS Attachment 151835 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13209553
Build Bot
Comment 10 2012-07-11 19:56:03 PDT
Comment on attachment 151834 [details] Patch for Review Attachment 151834 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13198606
Early Warning System Bot
Comment 11 2012-07-11 20:02:04 PDT
Comment on attachment 151834 [details] Patch for Review Attachment 151834 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13198608
Gyuyoung Kim
Comment 12 2012-07-11 20:14:08 PDT
Comment on attachment 151834 [details] Patch for Review Attachment 151834 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13208534
Build Bot
Comment 13 2012-07-11 20:37:28 PDT
Comment on attachment 151834 [details] Patch for Review Attachment 151834 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13204550
Build Bot
Comment 14 2012-07-11 21:00:57 PDT
Comment on attachment 151834 [details] Patch for Review Attachment 151834 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13180676
Gustavo Noronha (kov)
Comment 15 2012-07-11 22:41:11 PDT
Comment on attachment 151834 [details] Patch for Review Attachment 151834 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/13203574
Michael Saboff
Comment 16 2012-07-12 10:29:21 PDT
(In reply to comment #1)
> > Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.
>
> This sounds surprising to me. What kind of data do we have to support this?

I was surprised as well. I suspect that the 8-bit UTF-8 sites are actually all ASCII.

I built an instrumented version of both the Latin-1 and UTF-8 codecs that reported whether each text stream contained 8-bit or 16-bit data. I then visited many common web sites, including European newspapers (London Times, Le Monde, Der Spiegel, and Spanish and Italian newspapers, among others) and Asian newspapers from both China and Japan. I even went to www.haaretz.co.il, the Hebrew-language newspaper site from Israel.

I was surprised that far and away the majority of text streams were 8-bit. It appears that the "16-bit" sites use many 8-bit files along with some 16-bit files. My guess is that they are using common JS libraries. I also think it is likely that CSS files are predominantly 8-bit.

It also seems that most US-based sites are using UTF-8 instead of Latin-1. Craigslist appears to be exclusively Latin-1, as is www.spiegel.de, whereas most other common sites are either a mix of UTF-8 and Latin-1 or exclusively UTF-8.

I did not capture and analyze counts of 8-bit versus 16-bit streams; these observations come from watching the 8 vs. 16 reporting while visiting many, many sites.
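The kind of check such instrumentation performs can be sketched as follows. This is only an illustration, not the instrumented codec from the comment: `TextWidth` and `classifyUtf8` are hypothetical names, and the function assumes well-formed UTF-8. It separates pure ASCII from text that merely fits in 8 bits, which is the distinction the "all ASCII" suspicion above turns on.

```cpp
#include <string>

// Hypothetical categories: the narrowest representation that can hold the text.
enum class TextWidth { Ascii, Latin1, SixteenBit };

// Classifies a UTF-8 byte stream. Assumption: the input is well-formed UTF-8.
TextWidth classifyUtf8(const std::string& bytes)
{
    TextWidth width = TextWidth::Ascii;
    for (unsigned char byte : bytes) {
        if (byte < 0x80)
            continue; // ASCII byte
        if ((byte & 0xC0) == 0x80)
            continue; // continuation byte; its lead byte was already classified
        if (byte == 0xC2 || byte == 0xC3) {
            // Two-byte sequences with these lead bytes encode U+0080..U+00FF.
            width = TextWidth::Latin1;
            continue;
        }
        // Any other lead byte encodes a code point above U+00FF.
        return TextWidth::SixteenBit;
    }
    return width;
}
```

Counting the three outcomes per resource, rather than watching the output scroll by, would give the numbers that comment #1 asks for.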
Michael Saboff
Comment 17 2012-07-12 11:14:30 PDT
Created attachment 152006 [details] Updated Patch with build fixes
Michael Saboff
Comment 18 2012-07-12 11:16:37 PDT
Created attachment 152007 [details] Combined patches for 90319 and 90320 for EWS with build fix
WebKit Review Bot
Comment 19 2012-07-12 11:20:05 PDT
Comment on attachment 152006 [details] Updated Patch with build fixes Attachment 152006 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13206793
Build Bot
Comment 20 2012-07-12 11:22:24 PDT
Comment on attachment 152006 [details] Updated Patch with build fixes Attachment 152006 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13208782
Build Bot
Comment 21 2012-07-12 11:40:33 PDT
Comment on attachment 152006 [details] Updated Patch with build fixes Attachment 152006 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13200834
Gyuyoung Kim
Comment 22 2012-07-12 11:45:16 PDT
Comment on attachment 152006 [details] Updated Patch with build fixes Attachment 152006 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13205772
Build Bot
Comment 23 2012-07-12 11:48:14 PDT
Comment on attachment 152007 [details] Combined patches for 90319 and 90320 for EWS with build fix Attachment 152007 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13204788
WebKit Review Bot
Comment 24 2012-07-12 11:50:12 PDT
Comment on attachment 152007 [details] Combined patches for 90319 and 90320 for EWS with build fix Attachment 152007 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13201771
Build Bot
Comment 25 2012-07-12 11:53:47 PDT
Comment on attachment 152007 [details] Combined patches for 90319 and 90320 for EWS with build fix Attachment 152007 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13202773
Early Warning System Bot
Comment 26 2012-07-12 12:34:12 PDT
Comment on attachment 152006 [details] Updated Patch with build fixes Attachment 152006 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13198891
Michael Saboff
Comment 27 2012-07-12 12:43:25 PDT
Created attachment 152025 [details] Really fix the build failures
Michael Saboff
Comment 28 2012-07-12 12:44:19 PDT
Created attachment 152028 [details] Another Combined patches for 90319 and 90320 for EWS with build fix
WebKit Review Bot
Comment 29 2012-07-12 12:47:23 PDT
Attachment 152028 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/te..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Total errors found: 1 in 8 files
If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 30 2012-07-12 12:48:35 PDT
Attachment 152025 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Total errors found: 1 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 31 2012-07-12 12:54:30 PDT
Comment on attachment 152025 [details] Really fix the build failures Attachment 152025 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13207738
Build Bot
Comment 32 2012-07-12 13:02:08 PDT
Comment on attachment 152025 [details] Really fix the build failures Attachment 152025 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13202792
Gyuyoung Kim
Comment 33 2012-07-12 13:21:42 PDT
Comment on attachment 152025 [details] Really fix the build failures Attachment 152025 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13202800
WebKit Review Bot
Comment 34 2012-07-12 13:26:12 PDT
Comment on attachment 152025 [details] Really fix the build failures Attachment 152025 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13221345
Gustavo Noronha (kov)
Comment 35 2012-07-12 13:55:35 PDT
Comment on attachment 152025 [details] Really fix the build failures Attachment 152025 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/13208830
WebKit Review Bot
Comment 36 2012-07-12 14:13:02 PDT
Comment on attachment 152028 [details] Another Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152028 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13180953
New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl
WebKit Review Bot
Comment 37 2012-07-12 14:13:07 PDT
Created attachment 152057 [details] Archive of layout-test-results from gce-cr-linux-08 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-08 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Early Warning System Bot
Comment 38 2012-07-12 14:22:28 PDT
Comment on attachment 152025 [details] Really fix the build failures Attachment 152025 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13221363
Early Warning System Bot
Comment 39 2012-07-12 14:35:51 PDT
Comment on attachment 152025 [details] Really fix the build failures Attachment 152025 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13182942
WebKit Review Bot
Comment 40 2012-07-12 15:22:21 PDT
Comment on attachment 152028 [details] Another Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152028 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13208864
New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl
WebKit Review Bot
Comment 41 2012-07-12 15:22:26 PDT
Created attachment 152078 [details] Archive of layout-test-results from gce-cr-linux-05 The attached test failures were seen while running run-webkit-tests on the chromium-ews. Bot: gce-cr-linux-05 Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'> Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Michael Saboff
Comment 42 2012-07-12 18:19:09 PDT
Created attachment 152121 [details] Patch with Linux test fix and style fin
Fixed the handling of partial sequences right at the transition from 8-bit to 16-bit decoding. Fixed the two-spaces-in-a-comment style issue. Will post another combined patch for EWS.
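For readers unfamiliar with the partial-sequence problem: when data arrives in chunks (as in the failing http/tests/incremental/slow-utf8-html.pl test), a chunk can end in the middle of a multi-byte UTF-8 sequence, and those trailing bytes have to be carried over to the next chunk regardless of whether the decoder is currently producing 8-bit or 16-bit output. The sketch below shows one way to detect such a tail. It is not the code from the patch; `incompleteSuffixLength` is a hypothetical helper.

```cpp
#include <cstddef>

// Returns how many bytes at the end of [data, data + length) belong to a UTF-8
// sequence whose remaining bytes have not arrived yet. Returns 0 if the chunk
// ends on a sequence boundary (or on bytes that cannot start a valid sequence).
size_t incompleteSuffixLength(const unsigned char* data, size_t length)
{
    // A sequence is at most 4 bytes long, so an incomplete one has at most
    // 3 bytes: look back no further than that for its lead byte.
    size_t maxLookBack = length < 3 ? length : 3;
    for (size_t back = 1; back <= maxLookBack; ++back) {
        unsigned char byte = data[length - back];
        if (byte < 0x80)
            return 0; // ASCII byte: nothing is pending
        if ((byte & 0xC0) == 0x80)
            continue; // continuation byte: keep looking for the lead byte
        // Lead byte: work out how long its sequence is supposed to be.
        size_t expected = (byte & 0xE0) == 0xC0 ? 2
            : (byte & 0xF0) == 0xE0 ? 3
            : (byte & 0xF8) == 0xF0 ? 4
            : 1; // invalid lead byte; treated as not pending
        return back < expected ? back : 0;
    }
    return 0;
}
```

A streaming decoder would decode everything before the returned suffix, keep the suffix in a small buffer, and prepend it to the next chunk before decoding continues.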
Michael Saboff
Comment 43 2012-07-12 18:20:02 PDT
Created attachment 152122 [details] Combined patches for 90319 and 90320 for EWS with test and style fixes
Build Bot
Comment 44 2012-07-12 18:45:32 PDT
Comment on attachment 152121 [details] Patch with Linux test fix and style fin Attachment 152121 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13203935
Gyuyoung Kim
Comment 45 2012-07-12 19:14:21 PDT
Comment on attachment 152121 [details] Patch with Linux test fix and style fin Attachment 152121 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13206930
Early Warning System Bot
Comment 46 2012-07-12 19:24:23 PDT
Comment on attachment 152121 [details] Patch with Linux test fix and style fin Attachment 152121 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13221465
Gyuyoung Kim
Comment 47 2012-07-12 19:30:28 PDT
Comment on attachment 152122 [details] Combined patches for 90319 and 90320 for EWS with test and style fixes Attachment 152122 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13202910
Early Warning System Bot
Comment 48 2012-07-12 19:34:35 PDT
Comment on attachment 152121 [details] Patch with Linux test fix and style fin Attachment 152121 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13221467
Early Warning System Bot
Comment 49 2012-07-12 19:46:50 PDT
Comment on attachment 152122 [details] Combined patches for 90319 and 90320 for EWS with test and style fixes Attachment 152122 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13205901
Early Warning System Bot
Comment 50 2012-07-12 19:57:21 PDT
Comment on attachment 152122 [details] Combined patches for 90319 and 90320 for EWS with test and style fixes Attachment 152122 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13199949
WebKit Review Bot
Comment 51 2012-07-12 19:57:29 PDT
Comment on attachment 152121 [details] Patch with Linux test fix and style fin Attachment 152121 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13232048
WebKit Review Bot
Comment 52 2012-07-12 20:32:10 PDT
Comment on attachment 152122 [details] Combined patches for 90319 and 90320 for EWS with test and style fixes Attachment 152122 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13204957
Build Bot
Comment 53 2012-07-12 20:42:33 PDT
Comment on attachment 152121 [details] Patch with Linux test fix and style fin Attachment 152121 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13199957
Michael Saboff
Comment 54 2012-07-13 10:37:26 PDT
Created attachment 152293 [details] Patch with leftover fprintf removed
Michael Saboff
Comment 55 2012-07-13 10:38:11 PDT
Created attachment 152295 [details] Combined patches for 90319 and 90320 for EWS with leftover fprintf removed
Build Bot
Comment 56 2012-07-13 10:53:24 PDT
Comment on attachment 152293 [details] Patch with leftover fprintf removed Attachment 152293 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13236197
Build Bot
Comment 57 2012-07-13 10:56:25 PDT
Comment on attachment 152293 [details] Patch with leftover fprintf removed Attachment 152293 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13241162
WebKit Review Bot
Comment 58 2012-07-13 11:04:00 PDT
Comment on attachment 152293 [details] Patch with leftover fprintf removed Attachment 152293 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13243145
Early Warning System Bot
Comment 59 2012-07-13 11:23:06 PDT
Comment on attachment 152293 [details] Patch with leftover fprintf removed Attachment 152293 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/13235212
Gyuyoung Kim
Comment 60 2012-07-13 11:31:17 PDT
Comment on attachment 152293 [details] Patch with leftover fprintf removed Attachment 152293 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/13232317
Early Warning System Bot
Comment 61 2012-07-13 11:41:22 PDT
Comment on attachment 152293 [details] Patch with leftover fprintf removed Attachment 152293 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/13232319
Michael Saboff
Comment 62 2012-07-18 13:28:53 PDT