WebKit Bugzilla
RESOLVED FIXED
90320
Make TextCodecUTF8 handle 8 bit data without converting to UChar's
https://bugs.webkit.org/show_bug.cgi?id=90320
Michael Saboff
Reported
2012-06-29 15:50:37 PDT
Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings. The task is to modify TextCodecUTF8 and related code to return strings appropriate for the source data.
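The idea in the description can be illustrated with a minimal sketch. It is not the code from the patch: `DecodedText`, `decodeOne`, and `decodeUtf8` are hypothetical names, and the sketch assumes that "8-bit" means every decoded code point is at most U+00FF (so it fits a Latin-1 style buffer) and that malformed input becomes U+FFFD. Decoding starts in an 8-bit buffer and widens to 16-bit only when a code point no longer fits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type: holds either 8-bit or 16-bit (UTF-16) code units.
struct DecodedText {
    bool is8Bit = true;
    std::vector<uint8_t> chars8;
    std::vector<char16_t> chars16;
};

// Decodes one UTF-8 sequence starting at s[i] and advances i past the bytes
// it consumed. Malformed input yields U+FFFD (an assumption of this sketch).
static char32_t decodeOne(const std::string& s, size_t& i) {
    assert(i < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;
    int extra = 0;
    char32_t c = 0;
    char32_t minValue = 0;
    if ((lead & 0xE0) == 0xC0) { extra = 1; c = lead & 0x1F; minValue = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; c = lead & 0x0F; minValue = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; c = lead & 0x07; minValue = 0x10000; }
    else return 0xFFFD;
    for (int k = 0; k < extra; ++k) {
        if (i >= s.size()) return 0xFFFD;
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return 0xFFFD;
        c = (c << 6) | (b & 0x3F);
        ++i;
    }
    // Reject overlong forms, surrogates, and out-of-range values.
    if (c < minValue || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0xFFFD;
    return c;
}

DecodedText decodeUtf8(const std::string& bytes) {
    DecodedText out;
    size_t i = 0;
    // Phase 1: stay in the 8-bit buffer while every code point fits in one byte.
    while (i < bytes.size()) {
        const size_t start = i;
        const char32_t c = decodeOne(bytes, i);
        if (c > 0xFF) { i = start; out.is8Bit = false; break; }
        out.chars8.push_back(static_cast<uint8_t>(c));
    }
    if (out.is8Bit) return out;
    // Phase 2: widen what was decoded so far, then continue in 16-bit.
    out.chars16.assign(out.chars8.begin(), out.chars8.end());
    out.chars8.clear();
    while (i < bytes.size()) {
        const char32_t c = decodeOne(bytes, i);
        if (c > 0xFFFF) { // encode as a UTF-16 surrogate pair
            out.chars16.push_back(static_cast<char16_t>(0xD800 + ((c - 0x10000) >> 10)));
            out.chars16.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF)));
        } else {
            out.chars16.push_back(static_cast<char16_t>(c));
        }
    }
    return out;
}
```

With this shape, an all-ASCII or Latin-1-range resource never allocates a 16-bit buffer, and the cost of widening is paid once, at the first code point above U+00FF.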
Attachments
Patch for Review
(14.08 KB, patch)
2012-07-11 18:20 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Combined patched for 90319 and 90320 for EWS
(21.49 KB, patch)
2012-07-11 18:21 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Updated Patch with build fixes
(12.53 KB, patch)
2012-07-12 11:14 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Combined patches for 90319 and 90320 for EWS with build fix
(20.46 KB, patch)
2012-07-12 11:16 PDT
,
Michael Saboff
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Really fix the build failures
(11.83 KB, patch)
2012-07-12 12:43 PDT
,
Michael Saboff
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Another Combined patches for 90319 and 90320 for EWS with build fix
(19.76 KB, patch)
2012-07-12 12:44 PDT
,
Michael Saboff
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from gce-cr-linux-08
(351.39 KB, application/zip)
2012-07-12 14:13 PDT
,
WebKit Review Bot
no flags
Details
Archive of layout-test-results from gce-cr-linux-05
(457.08 KB, application/zip)
2012-07-12 15:22 PDT
,
WebKit Review Bot
no flags
Details
Patch with Linux test fix and style fin
(11.94 KB, patch)
2012-07-12 18:19 PDT
,
Michael Saboff
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Combined patches for 90319 and 90320 for EWS with test and style fixes
(19.87 KB, patch)
2012-07-12 18:20 PDT
,
Michael Saboff
gyuyoung.kim
: commit-queue-
Details
Formatted Diff
Diff
Patch with leftover fprintf removed
(11.80 KB, patch)
2012-07-13 10:37 PDT
,
Michael Saboff
oliver
: review+
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Combined patches for 90319 and 90320 for EWS with leftover fprintf removed
(19.73 KB, patch)
2012-07-13 10:38 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Alexey Proskuryakov
Comment 1
2012-06-30 09:48:56 PDT
> Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.
This sounds surprising to me. What kind of data do we have to support this?
Michael Saboff
Comment 2
2012-07-11 18:20:35 PDT
Created
attachment 151834
[details]
Patch for Review
This patch depends on the changes for https://bugs.webkit.org/show_bug.cgi?id=90319. I will post a separate patch that contains both 90319 and this patch for the bots to check.
Michael Saboff
Comment 3
2012-07-11 18:21:17 PDT
Created
attachment 151835
[details]
Combined patched for 90319 and 90320 for EWS
WebKit Review Bot
Comment 4
2012-07-11 18:53:33 PDT
Comment on
attachment 151835
[details]
Combined patched for 90319 and 90320 for EWS
Attachment 151835
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13200556
Build Bot
Comment 5
2012-07-11 19:14:43 PDT
Comment on
attachment 151835
[details]
Combined patched for 90319 and 90320 for EWS
Attachment 151835
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13180648
WebKit Review Bot
Comment 6
2012-07-11 19:20:12 PDT
Comment on
attachment 151834
[details]
Patch for Review
Attachment 151834
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13180649
Early Warning System Bot
Comment 7
2012-07-11 19:22:52 PDT
Comment on
attachment 151835
[details]
Combined patched for 90319 and 90320 for EWS
Attachment 151835
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/13198600
Build Bot
Comment 8
2012-07-11 19:33:39 PDT
Comment on
attachment 151835
[details]
Combined patched for 90319 and 90320 for EWS
Attachment 151835
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13180655
Early Warning System Bot
Comment 9
2012-07-11 19:45:26 PDT
Comment on
attachment 151835
[details]
Combined patched for 90319 and 90320 for EWS
Attachment 151835
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/13209553
Build Bot
Comment 10
2012-07-11 19:56:03 PDT
Comment on
attachment 151834
[details]
Patch for Review
Attachment 151834
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13198606
Early Warning System Bot
Comment 11
2012-07-11 20:02:04 PDT
Comment on
attachment 151834
[details]
Patch for Review
Attachment 151834
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/13198608
Gyuyoung Kim
Comment 12
2012-07-11 20:14:08 PDT
Comment on
attachment 151834
[details]
Patch for Review
Attachment 151834
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/13208534
Build Bot
Comment 13
2012-07-11 20:37:28 PDT
Comment on
attachment 151834
[details]
Patch for Review
Attachment 151834
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13204550
Build Bot
Comment 14
2012-07-11 21:00:57 PDT
Comment on
attachment 151834
[details]
Patch for Review
Attachment 151834
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13180676
Gustavo Noronha (kov)
Comment 15
2012-07-11 22:41:11 PDT
Comment on
attachment 151834
[details]
Patch for Review
Attachment 151834
[details]
did not pass gtk-ews (gtk): Output:
http://queues.webkit.org/results/13203574
Michael Saboff
Comment 16
2012-07-12 10:29:21 PDT
(In reply to comment #1)
> > Much of the UTF-8 tagged resources on the web can be processed as 8-bit data using 8-bit strings.
> This sounds surprising to me. What kind of data do we have to support this?
I was surprised as well. I suspect that the 8-bit UTF-8 sites are actually all ASCII. I built an instrumented version of both the Latin-1 and UTF-8 codecs that reported whether each stream contained 8-bit or 16-bit data. I then visited many common web sites, including European newspapers (the London Times, Le Monde, Der Spiegel, and Spanish and Italian papers, among others) and Asian newspapers from both China and Japan. I even went to www.haaretz.co.il, the Hebrew-language newspaper site from Israel. I was surprised that the large majority of text streams were 8-bit.
It appears that the "16-bit" sites use many 8-bit files along with some 16-bit files. My guess is that they are using common JS libraries. I also think it is likely that CSS files are predominantly 8-bit.
It also seems that most US-based sites are using UTF-8 instead of Latin-1. Craigslist appears to be exclusively Latin-1, as is www.spiegel.de, whereas most other common sites are either a mix of UTF-8 and Latin-1 or exclusively UTF-8.
I did not capture and count the number of 8-bit versus 16-bit streams; these observations come from watching the 8 vs. 16 reporting while visiting many, many sites.
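The instrumented codecs themselves are not shown in this bug. As a rough sketch of how such a measurement could be reproduced, the hypothetical `classifyUtf8` below reports whether a UTF-8 byte stream is pure ASCII, fits in 8 bits per character, or needs 16-bit storage, and the hypothetical `WidthTally` counts the results. It relies on the fact that code points U+0080 through U+00FF are encoded in UTF-8 as a lead byte of 0xC2 or 0xC3 followed by one continuation byte. Treating malformed input as 16-bit is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of a decoded stream's storage needs.
enum class StreamWidth { Ascii, Latin1, SixteenBit };

// Scans UTF-8 bytes without fully decoding them.
StreamWidth classifyUtf8(const std::string& bytes) {
    StreamWidth width = StreamWidth::Ascii;
    for (size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) continue; // plain ASCII byte
        // Only lead bytes 0xC2 and 0xC3 encode code points in U+0080..U+00FF.
        const bool latin1Lead = (b == 0xC2 || b == 0xC3);
        const bool hasContinuation = i + 1 < bytes.size()
            && (static_cast<unsigned char>(bytes[i + 1]) & 0xC0) == 0x80;
        // Anything else (wider code points or malformed bytes) is counted
        // as 16-bit; this is an assumption of the sketch.
        if (!latin1Lead || !hasContinuation) return StreamWidth::SixteenBit;
        width = StreamWidth::Latin1;
        ++i; // skip the continuation byte
    }
    return width;
}

// Hypothetical counter for the kind of informal survey described above.
struct WidthTally {
    size_t ascii = 0;
    size_t latin1 = 0;
    size_t sixteenBit = 0;

    void record(StreamWidth w) {
        switch (w) {
        case StreamWidth::Ascii: ++ascii; break;
        case StreamWidth::Latin1: ++latin1; break;
        case StreamWidth::SixteenBit: ++sixteenBit; break;
        }
    }
};
```

Recording one result per loaded resource would give the counts that the comment says were observed only informally.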
Michael Saboff
Comment 17
2012-07-12 11:14:30 PDT
Created
attachment 152006
[details]
Updated Patch with build fixes
Michael Saboff
Comment 18
2012-07-12 11:16:37 PDT
Created
attachment 152007
[details]
Combined patches for 90319 and 90320 for EWS with build fix
WebKit Review Bot
Comment 19
2012-07-12 11:20:05 PDT
Comment on
attachment 152006
[details]
Updated Patch with build fixes
Attachment 152006
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13206793
Build Bot
Comment 20
2012-07-12 11:22:24 PDT
Comment on
attachment 152006
[details]
Updated Patch with build fixes
Attachment 152006
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13208782
Build Bot
Comment 21
2012-07-12 11:40:33 PDT
Comment on
attachment 152006
[details]
Updated Patch with build fixes
Attachment 152006
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13200834
Gyuyoung Kim
Comment 22
2012-07-12 11:45:16 PDT
Comment on
attachment 152006
[details]
Updated Patch with build fixes
Attachment 152006
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/13205772
Build Bot
Comment 23
2012-07-12 11:48:14 PDT
Comment on
attachment 152007
[details]
Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152007
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13204788
WebKit Review Bot
Comment 24
2012-07-12 11:50:12 PDT
Comment on
attachment 152007
[details]
Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152007
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13201771
Build Bot
Comment 25
2012-07-12 11:53:47 PDT
Comment on
attachment 152007
[details]
Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152007
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13202773
Early Warning System Bot
Comment 26
2012-07-12 12:34:12 PDT
Comment on
attachment 152006
[details]
Updated Patch with build fixes
Attachment 152006
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/13198891
Michael Saboff
Comment 27
2012-07-12 12:43:25 PDT
Created
attachment 152025
[details]
Really fix the build failures
Michael Saboff
Comment 28
2012-07-12 12:44:19 PDT
Created
attachment 152028
[details]
Another Combined patches for 90319 and 90320 for EWS with build fix
WebKit Review Bot
Comment 29
2012-07-12 12:47:23 PDT
Attachment 152028
[details]
did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/te..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Total errors found: 1 in 8 files
If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 30
2012-07-12 12:48:35 PDT
Attachment 152025
[details]
did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1
Source/WebCore/platform/text/TextCodecUTF8.cpp:194: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Total errors found: 1 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 31
2012-07-12 12:54:30 PDT
Comment on
attachment 152025
[details]
Really fix the build failures
Attachment 152025
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13207738
Build Bot
Comment 32
2012-07-12 13:02:08 PDT
Comment on
attachment 152025
[details]
Really fix the build failures
Attachment 152025
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13202792
Gyuyoung Kim
Comment 33
2012-07-12 13:21:42 PDT
Comment on
attachment 152025
[details]
Really fix the build failures
Attachment 152025
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/13202800
WebKit Review Bot
Comment 34
2012-07-12 13:26:12 PDT
Comment on
attachment 152025
[details]
Really fix the build failures
Attachment 152025
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13221345
Gustavo Noronha (kov)
Comment 35
2012-07-12 13:55:35 PDT
Comment on
attachment 152025
[details]
Really fix the build failures
Attachment 152025
[details]
did not pass gtk-ews (gtk): Output:
http://queues.webkit.org/results/13208830
WebKit Review Bot
Comment 36
2012-07-12 14:13:02 PDT
Comment on
attachment 152028
[details]
Another Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152028
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13180953
New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl
WebKit Review Bot
Comment 37
2012-07-12 14:13:07 PDT
Created
attachment 152057
[details]
Archive of layout-test-results from gce-cr-linux-08
The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-08
Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>
Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Early Warning System Bot
Comment 38
2012-07-12 14:22:28 PDT
Comment on
attachment 152025
[details]
Really fix the build failures
Attachment 152025
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/13221363
Early Warning System Bot
Comment 39
2012-07-12 14:35:51 PDT
Comment on
attachment 152025
[details]
Really fix the build failures
Attachment 152025
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/13182942
WebKit Review Bot
Comment 40
2012-07-12 15:22:21 PDT
Comment on
attachment 152028
[details]
Another Combined patches for 90319 and 90320 for EWS with build fix
Attachment 152028
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13208864
New failing tests:
fast/text/international/thai-line-breaks.html
http/tests/incremental/slow-utf8-html.pl
WebKit Review Bot
Comment 41
2012-07-12 15:22:26 PDT
Created
attachment 152078
[details]
Archive of layout-test-results from gce-cr-linux-05
The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-05
Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>
Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Michael Saboff
Comment 42
2012-07-12 18:19:09 PDT
Created
attachment 152121
[details]
Patch with Linux test fix and style fin
Fixed the handling of partial sequences right at the transition from 8-bit to 16-bit decoding. Fixed the style issue of two spaces in a comment. Will post another combined patch for EWS.
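The fix itself lives in the attached patch and is not reproduced here. For readers unfamiliar with the underlying concern, the sketch below shows the general problem of a multi-byte UTF-8 sequence that is split across two input chunks: the incomplete tail has to be held back and joined to the next chunk before decoding, whichever buffer width is in use at that moment. `incompleteTailLength` and `Utf8ChunkBuffer` are hypothetical names, and the sketch does not validate the bytes it passes through.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Expected total length of a UTF-8 sequence, judged from its lead byte.
// Invalid lead bytes are treated as length 1 (an assumption of this sketch).
static size_t sequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Returns how many bytes at the end of `chunk` belong to a sequence that
// has started but is not yet complete (0 to 3).
size_t incompleteTailLength(const std::string& chunk) {
    const size_t n = chunk.size();
    for (size_t back = 1; back <= 3 && back <= n; ++back) {
        const unsigned char b = static_cast<unsigned char>(chunk[n - back]);
        if ((b & 0xC0) == 0x80) continue; // continuation byte: keep looking for the lead
        return sequenceLength(b) > back ? back : 0;
    }
    return 0;
}

// Holds back an incomplete trailing sequence until the next chunk arrives.
class Utf8ChunkBuffer {
public:
    // Returns the bytes that are safe to decode now.
    std::string feed(const std::string& chunk) {
        std::string data = m_pending + chunk;
        const size_t tail = incompleteTailLength(data);
        assert(tail <= data.size());
        m_pending = data.substr(data.size() - tail);
        data.resize(data.size() - tail);
        return data;
    }

private:
    std::string m_pending; // bytes of a sequence split across chunks
};
```

For example, feeding the bytes `61 E2 82` followed by `AC` yields `61` from the first call and the complete sequence `E2 82 AC` from the second.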
Michael Saboff
Comment 43
2012-07-12 18:20:02 PDT
Created
attachment 152122
[details]
Combined patches for 90319 and 90320 for EWS with test and style fixes
Build Bot
Comment 44
2012-07-12 18:45:32 PDT
Comment on
attachment 152121
[details]
Patch with Linux test fix and style fin
Attachment 152121
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13203935
Gyuyoung Kim
Comment 45
2012-07-12 19:14:21 PDT
Comment on
attachment 152121
[details]
Patch with Linux test fix and style fin
Attachment 152121
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/13206930
Early Warning System Bot
Comment 46
2012-07-12 19:24:23 PDT
Comment on
attachment 152121
[details]
Patch with Linux test fix and style fin
Attachment 152121
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/13221465
Gyuyoung Kim
Comment 47
2012-07-12 19:30:28 PDT
Comment on
attachment 152122
[details]
Combined patches for 90319 and 90320 for EWS with test and style fixes
Attachment 152122
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/13202910
Early Warning System Bot
Comment 48
2012-07-12 19:34:35 PDT
Comment on
attachment 152121
[details]
Patch with Linux test fix and style fin
Attachment 152121
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/13221467
Early Warning System Bot
Comment 49
2012-07-12 19:46:50 PDT
Comment on
attachment 152122
[details]
Combined patches for 90319 and 90320 for EWS with test and style fixes
Attachment 152122
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/13205901
Early Warning System Bot
Comment 50
2012-07-12 19:57:21 PDT
Comment on
attachment 152122
[details]
Combined patches for 90319 and 90320 for EWS with test and style fixes
Attachment 152122
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/13199949
WebKit Review Bot
Comment 51
2012-07-12 19:57:29 PDT
Comment on
attachment 152121
[details]
Patch with Linux test fix and style fin
Attachment 152121
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13232048
WebKit Review Bot
Comment 52
2012-07-12 20:32:10 PDT
Comment on
attachment 152122
[details]
Combined patches for 90319 and 90320 for EWS with test and style fixes
Attachment 152122
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13204957
Build Bot
Comment 53
2012-07-12 20:42:33 PDT
Comment on
attachment 152121
[details]
Patch with Linux test fix and style fin
Attachment 152121
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13199957
Michael Saboff
Comment 54
2012-07-13 10:37:26 PDT
Created
attachment 152293
[details]
Patch with leftover fprintf removed
Michael Saboff
Comment 55
2012-07-13 10:38:11 PDT
Created
attachment 152295
[details]
Combined patches for 90319 and 90320 for EWS with leftover fprintf removed
Build Bot
Comment 56
2012-07-13 10:53:24 PDT
Comment on
attachment 152293
[details]
Patch with leftover fprintf removed
Attachment 152293
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/13236197
Build Bot
Comment 57
2012-07-13 10:56:25 PDT
Comment on
attachment 152293
[details]
Patch with leftover fprintf removed
Attachment 152293
[details]
did not pass win-ews (win): Output:
http://queues.webkit.org/results/13241162
WebKit Review Bot
Comment 58
2012-07-13 11:04:00 PDT
Comment on
attachment 152293
[details]
Patch with leftover fprintf removed
Attachment 152293
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/13243145
Early Warning System Bot
Comment 59
2012-07-13 11:23:06 PDT
Comment on
attachment 152293
[details]
Patch with leftover fprintf removed
Attachment 152293
[details]
did not pass qt-wk2-ews (qt): Output:
http://queues.webkit.org/results/13235212
Gyuyoung Kim
Comment 60
2012-07-13 11:31:17 PDT
Comment on
attachment 152293
[details]
Patch with leftover fprintf removed
Attachment 152293
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/13232317
Early Warning System Bot
Comment 61
2012-07-13 11:41:22 PDT
Comment on
attachment 152293
[details]
Patch with leftover fprintf removed
Attachment 152293
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/13232319
Michael Saboff
Comment 62
2012-07-18 13:28:53 PDT
Committed r123011: <http://trac.webkit.org/changeset/123011>