Bug 90320 - Make TextCodecUTF8 handle 8 bit data without converting to UChar's
Summary: Make TextCodecUTF8 handle 8 bit data without converting to UChar's
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on: 90319
Blocks: 90321
Reported: 2012-06-29 15:50 PDT by Michael Saboff
Modified: 2012-07-18 13:28 PDT
CC List: 6 users

See Also:


Attachments
Patch for Review (14.08 KB, patch)
2012-07-11 18:20 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Combined patched for 90319 and 90320 for EWS (21.49 KB, patch)
2012-07-11 18:21 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Updated Patch with build fixes (12.53 KB, patch)
2012-07-12 11:14 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Combined patches for 90319 and 90320 for EWS with build fix (20.46 KB, patch)
2012-07-12 11:16 PDT, Michael Saboff
buildbot: commit-queue-
Really fix the build failures (11.83 KB, patch)
2012-07-12 12:43 PDT, Michael Saboff
buildbot: commit-queue-
Another Combined patches for 90319 and 90320 for EWS with build fix (19.76 KB, patch)
2012-07-12 12:44 PDT, Michael Saboff
webkit.review.bot: commit-queue-
Archive of layout-test-results from gce-cr-linux-08 (351.39 KB, application/zip)
2012-07-12 14:13 PDT, WebKit Review Bot
no flags
Archive of layout-test-results from gce-cr-linux-05 (457.08 KB, application/zip)
2012-07-12 15:22 PDT, WebKit Review Bot
no flags
Patch with Linux test fix and style fin (11.94 KB, patch)
2012-07-12 18:19 PDT, Michael Saboff
buildbot: commit-queue-
Combined patches for 90319 and 90320 for EWS with test and style fixes (19.87 KB, patch)
2012-07-12 18:20 PDT, Michael Saboff
gyuyoung.kim: commit-queue-
Patch with leftover fprintf removed (11.80 KB, patch)
2012-07-13 10:37 PDT, Michael Saboff
oliver: review+
buildbot: commit-queue-
Combined patches for 90319 and 90320 for EWS with leftover fprintf removed (19.73 KB, patch)
2012-07-13 10:38 PDT, Michael Saboff
no flags

Description Michael Saboff 2012-06-29 15:50:37 PDT
Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.  The task is to modify TextCodecUTF8 and related code to return strings appropriate for the source data.
Comment 1 Alexey Proskuryakov 2012-06-30 09:48:56 PDT
> Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.

This sounds surprising to me. What kind of data do we have to support this?
Comment 2 Michael Saboff 2012-07-11 18:20:35 PDT
Created attachment 151834 [details]
Patch for Review

This patch depends on the changes for https://bugs.webkit.org/show_bug.cgi?id=90319.  I will post a separate patch that contains both 90319 and this patch for the bots to check.
Comment 3 Michael Saboff 2012-07-11 18:21:17 PDT
Created attachment 151835 [details]
Combined patched for 90319 and 90320 for EWS
Comment 4 WebKit Review Bot 2012-07-11 18:53:33 PDT
Comment on attachment 151835 [details]
Combined patched for 90319 and 90320 for EWS

Attachment 151835 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13200556
Comment 5 Build Bot 2012-07-11 19:14:43 PDT
Comment on attachment 151835 [details]
Combined patched for 90319 and 90320 for EWS

Attachment 151835 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13180648
Comment 6 WebKit Review Bot 2012-07-11 19:20:12 PDT
Comment on attachment 151834 [details]
Patch for Review

Attachment 151834 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13180649
Comment 7 Early Warning System Bot 2012-07-11 19:22:52 PDT
Comment on attachment 151835 [details]
Combined patched for 90319 and 90320 for EWS

Attachment 151835 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13198600
Comment 8 Build Bot 2012-07-11 19:33:39 PDT
Comment on attachment 151835 [details]
Combined patched for 90319 and 90320 for EWS

Attachment 151835 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13180655
Comment 9 Early Warning System Bot 2012-07-11 19:45:26 PDT
Comment on attachment 151835 [details]
Combined patched for 90319 and 90320 for EWS

Attachment 151835 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13209553
Comment 10 Build Bot 2012-07-11 19:56:03 PDT
Comment on attachment 151834 [details]
Patch for Review

Attachment 151834 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13198606
Comment 11 Early Warning System Bot 2012-07-11 20:02:04 PDT
Comment on attachment 151834 [details]
Patch for Review

Attachment 151834 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13198608
Comment 12 Gyuyoung Kim 2012-07-11 20:14:08 PDT
Comment on attachment 151834 [details]
Patch for Review

Attachment 151834 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13208534
Comment 13 Build Bot 2012-07-11 20:37:28 PDT
Comment on attachment 151834 [details]
Patch for Review

Attachment 151834 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13204550
Comment 14 Build Bot 2012-07-11 21:00:57 PDT
Comment on attachment 151834 [details]
Patch for Review

Attachment 151834 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13180676
Comment 15 Gustavo Noronha (kov) 2012-07-11 22:41:11 PDT
Comment on attachment 151834 [details]
Patch for Review

Attachment 151834 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/13203574
Comment 16 Michael Saboff 2012-07-12 10:29:21 PDT
(In reply to comment #1)
> > Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.
> 
> This sounds surprising to me. What kind of data do we have to support this?

I was surprised as well.  I suspect that the 8 bit UTF-8 sites are actually all ASCII.

I built an instrumented version of both the Latin-1 and UTF-8 codecs that output whether they found 8 or 16 bit data.  I then went to many common web sites including European (London Times, Le Monde, Der Spiegel, Spanish and Italian newspapers among others) and Asian newspapers for both China and Japan.  I even went to www.haaretz.co.il, the Hebrew language newspaper site from Israel.  I was surprised that far and away the majority of text streams were 8-bit.  It appears that the "16 bit" sites will use many 8 bit files along with some 16 bit files.  My guess is that they are using common JS libraries.  Also, I think it is likely that CSS files are predominantly 8 bit.

It also seems that most US based sites are using UTF-8 instead of Latin-1.  Craigslist appears to be exclusively Latin-1 as is www.spiegel.de, whereas most other common sites are either a mix of UTF-8 and Latin-1 or UTF-8 exclusively.

I did not capture and analyze the number of 8 versus 16 bit streams; rather, this impression comes from watching the 8 vs 16 reporting while visiting many, many sites.
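
To make the 8 versus 16 bit distinction concrete, here is a minimal sketch of the kind of check such instrumentation could perform. It is not the instrumented codec described above; the function name fitsIn8BitString is hypothetical. It relies only on the fact that UTF-8 encodes U+0080 through U+00FF with the lead bytes 0xC2 and 0xC3, so any other non-ASCII lead byte means the stream needs 16 bit characters. Malformed input is counted as 16 bit here, on the assumption that it would be replaced by U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of WebKit: returns true when every character
// in the UTF-8 input decodes to a code point <= U+00FF, so the decoded text
// could be stored one byte per character.
bool fitsIn8BitString(const std::string& utf8)
{
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if (byte < 0x80)
            continue; // ASCII always fits.
        // Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
        if (byte != 0xC2 && byte != 0xC3)
            return false;
        // The lead byte must be followed by exactly one continuation byte.
        if (i + 1 >= utf8.size())
            return false;
        unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
        if ((next & 0xC0) != 0x80)
            return false;
        ++i; // Skip the continuation byte.
    }
    return true;
}
```

Under this check a pure ASCII stylesheet or script counts as 8 bit, as does text containing only Latin-1 characters such as "é", while a single Hebrew, CJK or euro sign character makes the stream 16 bit.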
Comment 17 Michael Saboff 2012-07-12 11:14:30 PDT
Created attachment 152006 [details]
Updated Patch with build fixes
Comment 18 Michael Saboff 2012-07-12 11:16:37 PDT
Created attachment 152007 [details]
Combined patches for 90319 and 90320 for EWS with build fix
Comment 19 WebKit Review Bot 2012-07-12 11:20:05 PDT
Comment on attachment 152006 [details]
Updated Patch with build fixes

Attachment 152006 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13206793
Comment 20 Build Bot 2012-07-12 11:22:24 PDT
Comment on attachment 152006 [details]
Updated Patch with build fixes

Attachment 152006 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13208782
Comment 21 Build Bot 2012-07-12 11:40:33 PDT
Comment on attachment 152006 [details]
Updated Patch with build fixes

Attachment 152006 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13200834
Comment 22 Gyuyoung Kim 2012-07-12 11:45:16 PDT
Comment on attachment 152006 [details]
Updated Patch with build fixes

Attachment 152006 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13205772
Comment 23 Build Bot 2012-07-12 11:48:14 PDT
Comment on attachment 152007 [details]
Combined patches for 90319 and 90320 for EWS with build fix

Attachment 152007 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13204788
Comment 24 WebKit Review Bot 2012-07-12 11:50:12 PDT
Comment on attachment 152007 [details]
Combined patches for 90319 and 90320 for EWS with build fix

Attachment 152007 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13201771
Comment 25 Build Bot 2012-07-12 11:53:47 PDT
Comment on attachment 152007 [details]
Combined patches for 90319 and 90320 for EWS with build fix

Attachment 152007 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13202773
Comment 26 Early Warning System Bot 2012-07-12 12:34:12 PDT
Comment on attachment 152006 [details]
Updated Patch with build fixes

Attachment 152006 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13198891
Comment 27 Michael Saboff 2012-07-12 12:43:25 PDT
Created attachment 152025 [details]
Really fix the build failures
Comment 28 Michael Saboff 2012-07-12 12:44:19 PDT
Created attachment 152028 [details]
Another Combined patches for 90319 and 90320 for EWS with build fix
Comment 29 WebKit Review Bot 2012-07-12 12:47:23 PDT
Attachment 152028 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/te..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Total errors found: 1 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 30 WebKit Review Bot 2012-07-12 12:48:35 PDT
Attachment 152025 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 31 Build Bot 2012-07-12 12:54:30 PDT
Comment on attachment 152025 [details]
Really fix the build failures

Attachment 152025 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13207738
Comment 32 Build Bot 2012-07-12 13:02:08 PDT
Comment on attachment 152025 [details]
Really fix the build failures

Attachment 152025 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13202792
Comment 33 Gyuyoung Kim 2012-07-12 13:21:42 PDT
Comment on attachment 152025 [details]
Really fix the build failures

Attachment 152025 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13202800
Comment 34 WebKit Review Bot 2012-07-12 13:26:12 PDT
Comment on attachment 152025 [details]
Really fix the build failures

Attachment 152025 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13221345
Comment 35 Gustavo Noronha (kov) 2012-07-12 13:55:35 PDT
Comment on attachment 152025 [details]
Really fix the build failures

Attachment 152025 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/13208830
Comment 36 WebKit Review Bot 2012-07-12 14:13:02 PDT
Comment on attachment 152028 [details]
Another Combined patches for 90319 and 90320 for EWS with build fix

Attachment 152028 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13180953

New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl
Comment 37 WebKit Review Bot 2012-07-12 14:13:07 PDT
Created attachment 152057 [details]
Archive of layout-test-results from gce-cr-linux-08

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-08  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Comment 38 Early Warning System Bot 2012-07-12 14:22:28 PDT
Comment on attachment 152025 [details]
Really fix the build failures

Attachment 152025 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13221363
Comment 39 Early Warning System Bot 2012-07-12 14:35:51 PDT
Comment on attachment 152025 [details]
Really fix the build failures

Attachment 152025 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13182942
Comment 40 WebKit Review Bot 2012-07-12 15:22:21 PDT
Comment on attachment 152028 [details]
Another Combined patches for 90319 and 90320 for EWS with build fix

Attachment 152028 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13208864

New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl
Comment 41 WebKit Review Bot 2012-07-12 15:22:26 PDT
Created attachment 152078 [details]
Archive of layout-test-results from gce-cr-linux-05

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-05  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Comment 42 Michael Saboff 2012-07-12 18:19:09 PDT
Created attachment 152121 [details]
Patch with Linux test fix and style fin

Fixed the handling of partial sequences right at the transition from 8 to 16 bit decoding.

Fixed 2 spaces in comment style issue.

Will post another double patch for EWS.
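
For readers unfamiliar with the problem, the following is a rough sketch of the situation that fix addresses, not the code from the patch. The class Utf8StreamSketch and all of its members are hypothetical. It assumes a decoder that is fed input in chunks, emits 8 bit characters until the first code point above U+00FF, then widens its output to 16 bit. The delicate case is a multi-byte sequence split across two chunks when that sequence is also the first one needing 16 bits. The sketch skips overlong-form and surrogate checks for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical streaming decoder; every name here is illustrative, not WebKit's.
class Utf8StreamSketch {
public:
    // Decode one chunk. Bytes of a sequence cut off by the end of the chunk
    // are held back and prepended to the next chunk.
    void feed(const std::string& chunk)
    {
        std::string data = m_pending + chunk;
        m_pending.clear();
        std::size_t i = 0;
        while (i < data.size()) {
            unsigned char lead = static_cast<unsigned char>(data[i]);
            std::size_t length = sequenceLength(lead);
            if (length && i + length > data.size()) {
                m_pending = data.substr(i); // Incomplete: wait for more bytes.
                break;
            }
            char32_t codePoint = length == 1 ? lead : lead & (0xFF >> (length + 1));
            bool valid = length != 0;
            for (std::size_t k = 1; valid && k < length; ++k) {
                unsigned char next = static_cast<unsigned char>(data[i + k]);
                valid = (next & 0xC0) == 0x80;
                codePoint = (codePoint << 6) | (next & 0x3F);
            }
            if (!valid || codePoint > 0x10FFFF) {
                append(0xFFFD); // Replace the offending byte and resynchronize.
                ++i;
                continue;
            }
            append(codePoint);
            i += length;
        }
        assert(m_pending.size() < 4); // At most three bytes are ever held back.
    }

    // Call at end of input: a still-incomplete sequence becomes U+FFFD.
    void finish()
    {
        if (!m_pending.empty())
            append(0xFFFD);
        m_pending.clear();
    }

    bool is16Bit() const { return m_is16Bit; }
    const std::vector<std::uint8_t>& narrow() const { return m_narrow; }
    const std::u16string& wide() const { return m_wide; }

private:
    static std::size_t sequenceLength(unsigned char lead)
    {
        if (lead < 0x80) return 1;
        if (lead >= 0xC2 && lead <= 0xDF) return 2;
        if (lead >= 0xE0 && lead <= 0xEF) return 3;
        if (lead >= 0xF0 && lead <= 0xF4) return 4;
        return 0; // Stray continuation byte or invalid lead byte.
    }

    void append(char32_t codePoint)
    {
        if (!m_is16Bit && codePoint <= 0xFF) {
            m_narrow.push_back(static_cast<std::uint8_t>(codePoint));
            return;
        }
        if (!m_is16Bit) { // First character that needs 16 bits: widen the output so far.
            for (std::uint8_t c : m_narrow)
                m_wide.push_back(c);
            m_narrow.clear();
            m_is16Bit = true;
        }
        if (codePoint > 0xFFFF) { // Encode as a UTF-16 surrogate pair.
            codePoint -= 0x10000;
            m_wide.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
            m_wide.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
        } else
            m_wide.push_back(static_cast<char16_t>(codePoint));
    }

    std::string m_pending;
    std::vector<std::uint8_t> m_narrow;
    std::u16string m_wide;
    bool m_is16Bit = false;
};
```

In this sketch the held-back bytes live in m_pending independently of the output width, so a sequence that straddles a chunk boundary is decoded the same way whether or not it triggers the switch to 16 bit output.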
Comment 43 Michael Saboff 2012-07-12 18:20:02 PDT
Created attachment 152122 [details]
Combined patches for 90319 and 90320 for EWS with test and style fixes
Comment 44 Build Bot 2012-07-12 18:45:32 PDT
Comment on attachment 152121 [details]
Patch with Linux test fix and style fin

Attachment 152121 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13203935
Comment 45 Gyuyoung Kim 2012-07-12 19:14:21 PDT
Comment on attachment 152121 [details]
Patch with Linux test fix and style fin

Attachment 152121 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13206930
Comment 46 Early Warning System Bot 2012-07-12 19:24:23 PDT
Comment on attachment 152121 [details]
Patch with Linux test fix and style fin

Attachment 152121 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13221465
Comment 47 Gyuyoung Kim 2012-07-12 19:30:28 PDT
Comment on attachment 152122 [details]
Combined patches for 90319 and 90320 for EWS with test and style fixes

Attachment 152122 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13202910
Comment 48 Early Warning System Bot 2012-07-12 19:34:35 PDT
Comment on attachment 152121 [details]
Patch with Linux test fix and style fin

Attachment 152121 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13221467
Comment 49 Early Warning System Bot 2012-07-12 19:46:50 PDT
Comment on attachment 152122 [details]
Combined patches for 90319 and 90320 for EWS with test and style fixes

Attachment 152122 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13205901
Comment 50 Early Warning System Bot 2012-07-12 19:57:21 PDT
Comment on attachment 152122 [details]
Combined patches for 90319 and 90320 for EWS with test and style fixes

Attachment 152122 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13199949
Comment 51 WebKit Review Bot 2012-07-12 19:57:29 PDT
Comment on attachment 152121 [details]
Patch with Linux test fix and style fin

Attachment 152121 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13232048
Comment 52 WebKit Review Bot 2012-07-12 20:32:10 PDT
Comment on attachment 152122 [details]
Combined patches for 90319 and 90320 for EWS with test and style fixes

Attachment 152122 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13204957
Comment 53 Build Bot 2012-07-12 20:42:33 PDT
Comment on attachment 152121 [details]
Patch with Linux test fix and style fin

Attachment 152121 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13199957
Comment 54 Michael Saboff 2012-07-13 10:37:26 PDT
Created attachment 152293 [details]
Patch with leftover fprintf removed
Comment 55 Michael Saboff 2012-07-13 10:38:11 PDT
Created attachment 152295 [details]
Combined patches for 90319 and 90320 for EWS with leftover fprintf removed
Comment 56 Build Bot 2012-07-13 10:53:24 PDT
Comment on attachment 152293 [details]
Patch with leftover fprintf removed

Attachment 152293 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13236197
Comment 57 Build Bot 2012-07-13 10:56:25 PDT
Comment on attachment 152293 [details]
Patch with leftover fprintf removed

Attachment 152293 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13241162
Comment 58 WebKit Review Bot 2012-07-13 11:04:00 PDT
Comment on attachment 152293 [details]
Patch with leftover fprintf removed

Attachment 152293 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13243145
Comment 59 Early Warning System Bot 2012-07-13 11:23:06 PDT
Comment on attachment 152293 [details]
Patch with leftover fprintf removed

Attachment 152293 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13235212
Comment 60 Gyuyoung Kim 2012-07-13 11:31:17 PDT
Comment on attachment 152293 [details]
Patch with leftover fprintf removed

Attachment 152293 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13232317
Comment 61 Early Warning System Bot 2012-07-13 11:41:22 PDT
Comment on attachment 152293 [details]
Patch with leftover fprintf removed

Attachment 152293 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13232319
Comment 62 Michael Saboff 2012-07-18 13:28:53 PDT
Committed r123011: <http://trac.webkit.org/changeset/123011>