Bug 71316 - decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).
Summary: decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Thomas Sepez
URL:
Keywords: XSSAuditor
Depends on: 76896
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-01 13:49 PDT by Thomas Sepez
Modified: 2012-01-23 23:42 PST (History)
7 users (show)

See Also:


Attachments
Layout Tests (2.35 KB, patch)
2011-11-07 22:34 PST, Daniel Bates
no flags Details | Formatted Diff | Diff
Work-in-progress patch (6.77 KB, patch)
2011-11-13 23:39 PST, Daniel Bates
no flags Details | Formatted Diff | Diff
Work-in-progress patch (6.64 KB, patch)
2011-11-13 23:42 PST, Daniel Bates
no flags Details | Formatted Diff | Diff
Work-in-progress patch (12.22 KB, patch)
2011-12-01 22:35 PST, Daniel Bates
no flags Details | Formatted Diff | Diff
Proposed Patch + Daniel's layout tests. (18.21 KB, patch)
2012-01-13 11:12 PST, Thomas Sepez
no flags Details | Formatted Diff | Diff
Patch, tests, fix changelog. (18.21 KB, patch)
2012-01-13 11:20 PST, Thomas Sepez
no flags Details | Formatted Diff | Diff
Patch, tests, fix changelog again. (18.21 KB, patch)
2012-01-13 11:21 PST, Thomas Sepez
ap: review-
Details | Formatted Diff | Diff
Patch plus test KURL decodeEscapeSequence path. (22.47 KB, patch)
2012-01-18 14:41 PST, Thomas Sepez
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Patch, KURL, KURLGoogle different (21.67 KB, patch)
2012-01-19 12:50 PST, Thomas Sepez
no flags Details | Formatted Diff | Diff
Patch (21.67 KB, patch)
2012-01-19 12:58 PST, Thomas Sepez
no flags Details | Formatted Diff | Diff
New patch for review. (23.20 KB, patch)
2012-01-23 12:22 PST, Thomas Sepez
dbates: review+
dbates: commit-queue-
Details | Formatted Diff | Diff
Patch w/nits fixed. (23.22 KB, patch)
2012-01-23 14:37 PST, Thomas Sepez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Sepez 2011-11-01 13:49:56 PDT
Upstreamed from XSS filter bypass in http://code.google.com/p/chromium/issues/detail?id=101905

Reported by kuz...@gmail.com, Oct 27 (5 days ago)
test chrome 15.0.874.106 m windows xp sp3

echo.php
====
<?php
echo $_GET['q'];
?>
http://127.0.0.1/echo.php?q=%3Cscript%20%89g%3Ealert(location);%3C/script%3E

%F3Z
%89g
%82a
Comment 1 by kuz...@gmail.com, Oct 27 (5 days ago)
http://127.0.0.1/echo.php?q=%3Cimg%20src=%201%20onerror=%C7Ojavascript:alert(document.domain)%3E%3C/iframe%3E'

http://127.0.0.1/echo.php?q=%3Ciframe%20onload=%C7Ojavascript:alert(document.domain)%3E%3C/iframe%3E

NOTE: requires GBK character set.

... 

Comment 11 by tse...@chromium.org, Today (28 minutes ago)
Looks like one needs an encoding where the multibyte sequence can include low order bytes, and that our decoder is only processing the %xx portion of the string apart from its trailing bytes, eg.

a) %C7%4F decodes to single big5 char and is removed.
b) [C7]O binary byte sequence in the page decodes to single big5 char and is removed.
c) %C7O decodes to [bad-big5-char]O and the O persists.  

Page is in form (b) but the URL is in form (c). Re-specifying the URL in form (a) catches the XSS in the auditor.

Problem is in WebKit/Source/WebCore/platform/text/DecodeEscapeSequences.h.
Template function decodeStandardURLEscapeSequences() only applies decodeRun() to consecutive %-style escapes, leaving off the trailing 'O'.
Comment 1 Daniel Bates 2011-11-01 13:55:22 PDT
Will look into this issue shortly.
Comment 2 Daniel Bates 2011-11-07 22:34:24 PST
Created attachment 113992 [details]
Layout Tests

Some DRT layout tests. Will work on the patch and look to add additional tests tomorrow.
Comment 3 Daniel Bates 2011-11-13 23:39:26 PST
Created attachment 114889 [details]
Work-in-progress patch

Work-in-progress patch.

Need to think about this some more.
Comment 4 Daniel Bates 2011-11-13 23:42:57 PST
Created attachment 114890 [details]
Work-in-progress patch

Removed extraneous whitespace.
Comment 5 Daniel Bates 2011-12-01 22:35:49 PST
Created attachment 117569 [details]
Work-in-progress patch

Added some more tests. This patch fails the following added tests:

http/tests/security/xssAuditor/script-tag-Big5-char.html
http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html
http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html
Comment 6 Thomas Sepez 2012-01-11 12:00:54 PST
Daniel, you still working on this?  Would you like me to take a shot at it?

I believe the error is actually in decodeEscapeSequences, in that it tries to be efficient and call the decoder only on the span of consecutive %-values.  But trailing characters may be important to the decoder, hence the bug.

I think the thing to do is to convert all escape sequences first, then decode starting at the first byte > 0x80 to the last (byte > 0x80) + N characters, where N = 4 to account for the longest multibyte run I can imagine.
Comment 7 Daniel Bates 2012-01-11 12:32:30 PST
(In reply to comment #6)
> Daniel, you still working on this?  

Not at the moment.

> Would you like me to take a shot at it?

Sure. Feel free to take this bug and work on it.
Comment 8 Thomas Sepez 2012-01-13 11:12:32 PST
Created attachment 122468 [details]
Proposed Patch + Daniel's layout tests.

Adam, Daniel, please review carefully.
Comment 9 Thomas Sepez 2012-01-13 11:20:14 PST
Created attachment 122470 [details]
Patch, tests, fix changelog.
Comment 10 Thomas Sepez 2012-01-13 11:21:32 PST
Created attachment 122472 [details]
Patch, tests, fix changelog again.
Comment 11 Alexey Proskuryakov 2012-01-13 11:43:03 PST
Comment on attachment 122472 [details]
Patch, tests, fix changelog again.

decodeEscapeSequences() is used in many other places in WebCore besides XSS Auditor. How did you verify that the changes there are compatible with other browsers? Please add tests that directly involve this function as part of KURL accessors, ScriptController::executeIfJavaScriptURL(), etc.

I'm really confused about the approach here. Why does a simple function that decodes percent escapes need to know about encodings? That looks like a layering violation.

I think that the actual bug here is that we flush the decoder at the end of decodeRun() function. We should just pass all text through it, unless it can be demonstrated to be a performance problem.

r- for lack of tests primarily, although the fix may need to be reconsidered, too.
Comment 12 Thomas Sepez 2012-01-13 12:13:47 PST
(In reply to comment #11)
> (From update of attachment 122472 [details])
> decodeEscapeSequences() is used in many other places in WebCore besides XSS Auditor. How did you verify that the changes there are compatible with other browsers? Please add tests that directly involve this function as part of KURL accessors, ScriptController::executeIfJavaScriptURL(), etc.

Sure. Are there any existing tests that do this which could be adapted?

> 
> I'm really confused about the approach here. Why does a simple function that decodes percent escapes need to know about encodings? That looks like a layering violation.
> 

Because it is returning a String of 16 bit unicode codepoints, not a string of codepoints whose values correspond to the individual byte values in the decoded sequence.  KURL.cpp used to contain the decodeURL function directly, and it has been expecting this behaviour (i.e. calling a decoder) going back as far as I looked in source control. 

It gets messy to separate the layers, since the input is itself a 16 bit string, and I didn't want to assume that its values were always fitting in the bottom byte.  To separate this out, and keep the current behaviour, you'd have to take the 16bit input, re-encode it according to whatever encoder is present into a byte stream, unescape the %-signs into bytes, then decode it back using the encoder into the 16 bit form.  Besides the overhead, its not clear if there is an encoder object everywhere that it would need to be if you wanted to pull this off.


> I think that the actual bug here is that we flush the decoder at the end of decodeRun() function. We should just pass all text through it, unless it can be demonstrated to be a performance problem.

Yes, you could flush all text through it- if it were 8-bit.  So there's a conversion that has to happen first. The existing implementation goes through pains not to do this, so I wasn't willing to take that approach -- suspecting that the original implementors knew what they were doing and not undoing their work.  

> 
> r- for lack of tests primarily, although the fix may need to be reconsidered, too.
Comment 13 Alexey Proskuryakov 2012-01-13 12:26:36 PST
> Sure. Are there any existing tests that do this which could be adapted?

Not anything specifically, but such tests should be straightforward to make. For the cases I mentioned,

<a href="http://%..."></a>
<script>shouldBe("a.href.hostname", "...")</script>

<a href="javascript:shouldBe('%...', '...')</a>
<script>... click the link your eventSender</script>

> Sure. Are there any existing tests that do this which could be adapted?

CC'ing Darin, who may have an idea about how to best structure the code.
Comment 14 Thomas Sepez 2012-01-13 14:51:03 PST
(In reply to comment #13)
> > Sure. Are there any existing tests that do this which could be adapted?
> 
> Not anything specifically, but such tests should be straightforward to make. For the cases I mentioned,
> 
> <a href="http://%..."></a>
> <script>shouldBe("a.href.hostname", "...")</script>
> 
> <a href="javascript:shouldBe('%...', '...')</a>
> <script>... click the link your eventSender</script>
> 
> > Sure. Are there any existing tests that do this which could be adapted?
> 
> CC'ing Darin, who may have an idea about how to best structure the code.

Wouldn't the results in LayoutTests/fast/url/host-expected.txt cover these cases?
Comment 15 Alexey Proskuryakov 2012-01-13 14:57:29 PST
For hostname, adding subtests to this test should be enough. Since you are changing behavior, we need new tests (or new results, if there are already failing subtests)
Comment 16 Thomas Sepez 2012-01-13 15:04:09 PST
Ah.  Just so we're on the same page, the behaviour should be absolutely identical to the present except when running under Big5 or GBK, and covered by the existing cases.  So it looks like I'll need to find a place where KURL, not XSSAuditor, gets invoked with one of these decoders and show that the result changes?
Comment 17 Alexey Proskuryakov 2012-01-13 15:05:48 PST
Yes.
Comment 18 Thomas Sepez 2012-01-18 14:41:46 PST
Created attachment 123001 [details]
Patch plus test KURL decodeEscapeSequence path.

Hi AP,

  Its surprisingly hard to find a path that calls into the two-argument form (non-utf8 decoder) of KURLs decodeURLEscapeSequences, which is why this may have gone unnoticed, but one place that does it is when we scroll to a fragment.  The new frame scrolling test fails without this patch.
Comment 19 WebKit Review Bot 2012-01-18 17:57:37 PST
Comment on attachment 123001 [details]
Patch plus test KURL decodeEscapeSequence path.

Attachment 123001 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/11284449

New failing tests:
http/tests/navigation/anchor-frames-gbk.html
Comment 20 Thomas Sepez 2012-01-19 12:50:57 PST
Created attachment 123178 [details]
Patch, KURL, KURLGoogle different
Comment 21 WebKit Review Bot 2012-01-19 12:54:57 PST
Attachment 123178 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/http..." exit_code: 1

ERROR: FAILURES FOR <lucid, x86_64, release, cpu> in /mnt/git/webkit-style-queue/LayoutTests/platform/chromium/test_expectations.txt
ERROR: Line:1946 Missing expectations BUGWK20559 : http/tests/navigation/anchor-frames-gbk.html
LayoutTests/platform/chromium/test_expectations.txt:1946:  Missing expectations BUGWK20559 : http/tests/navigation/anchor-frames-gbk.html  [test/expectations] [5]
Total errors found: 1 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 22 Thomas Sepez 2012-01-19 12:58:46 PST
Created attachment 123182 [details]
Patch
Comment 23 Thomas Sepez 2012-01-23 09:55:00 PST
Adam, Daniel, AP - could I get a review on this?  Thanks Heaps.
Comment 24 Alexey Proskuryakov 2012-01-23 10:31:25 PST
Comment on attachment 123182 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123182&action=review

> Source/WebCore/platform/text/DecodeEscapeSequences.h:43
> +    enum { sequenceSize = 6 }; // e.g. %u26C4

This patch doesn't have a WebCore ChangeLog. What are UTF-16 changes for?

> Source/WebCore/platform/text/DecodeEscapeSequences.h:85
> +        // We need to handle the possibility that the encodings may be Big5/GBK, which have values 
> +        // in the range 0x40 - 0x7F as the second byte of their sequences. These need to be passed
> +        // to the decoder as part of the run to get the correct output (e.g. the run %XXc%XX%XX%XXccc
> +        // should be decoded as %XXc%XX%XX%XXc followed by cc, where X is a hex digit and c is a character
> +        // in the range 0x40 - 0x7f). We approximate this by stopping the at the first out of range
> +        // value, at a second consecutive in-range value, or at a %-sign that does not introduce a
> +        // valid sequence. Other encodings are still correct because the decoder will pass on the
> +        // in-range characters unchanged.

I find it difficult to understand this comment. It it saying that we add one additional character to the percent escaped range? But we may need at least two - GBK has sequences that have two ASCII bytes, e.g. 814e42 for 乂.

> LayoutTests/ChangeLog:9
> +        * http/tests/navigation/anchor-frames-gbk-expected.txt: Added.
> +        * http/tests/navigation/anchor-frames-gbk.html: Added.

I see, it's indeed difficult to find non-XSS code paths executing this. Is it a bug that we don't pass encoding elsewhere, or correct behavior?
Comment 25 Alexey Proskuryakov 2012-01-23 10:35:35 PST
I now think that if the only knowledge about encodings is that we need to extend escaped sequences by two characters, then it's probably OK. But I wouldn't vouch that no supported encoding can have more than two ASCII characters in a sequence.
Comment 26 Thomas Sepez 2012-01-23 10:49:35 PST
Comment on attachment 123182 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123182&action=review

>> Source/WebCore/platform/text/DecodeEscapeSequences.h:43
>> +    enum { sequenceSize = 6 }; // e.g. %u26C4
> 
> This patch doesn't have a WebCore ChangeLog. What are UTF-16 changes for?

Oops.  Looks like that got dropped.  Sorry.
The UTF-16 changes are to keep compatibility with the refactored interface that the template expects its supporting classes to provide.

>> Source/WebCore/platform/text/DecodeEscapeSequences.h:85
>> +        // in-range characters unchanged.
> 
> I find it difficult to understand this comment. It it saying that we add one additional character to the percent escaped range? But we may need at least two - GBK has sequences that have two ASCII bytes, e.g. 814e42 for 乂.

Ah.  Did not know that -- thought that GBK was a two-byte encoding.  Easy enough to fix for this.

>> LayoutTests/ChangeLog:9
>> +        * http/tests/navigation/anchor-frames-gbk.html: Added.
> 
> I see, it's indeed difficult to find non-XSS code paths executing this. Is it a bug that we don't pass encoding elsewhere, or correct behavior?

I'd assume its correct, or we'd have had additional reports, no?  Given that this has been wrong for so long with little consequence?
Comment 27 Thomas Sepez 2012-01-23 10:50:58 PST
(In reply to comment #25)
> I now think that if the only knowledge about encodings is that we need to extend escaped sequences by two characters, then it's probably OK. But I wouldn't vouch that no supported encoding can have more than two ASCII characters in a sequence.

Indeed.  It's very hard to make generalities about all the possible encodings.  But I think this fixes the issue in practice.

I'll put together another patch that addresses the comments above.  Thanks again.
Comment 28 Alexey Proskuryakov 2012-01-23 11:09:04 PST
> I'd assume its correct, or we'd have had additional reports, no?  Given that this has been wrong for so long with little consequence?

I would not make such a conclusion. People don't regularly use percent escaping in most URL parts, and behavior is not consistent between browsers when they do (e.g. when percent escaping of host name interacts with IDNA).
Comment 29 Thomas Sepez 2012-01-23 11:39:35 PST
Comment on attachment 123182 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=123182&action=review

>>> Source/WebCore/platform/text/DecodeEscapeSequences.h:85
>>> +        // in-range characters unchanged.
>> 
>> I find it difficult to understand this comment. It it saying that we add one additional character to the percent escaped range? But we may need at least two - GBK has sequences that have two ASCII bytes, e.g. 814e42 for 乂.
> 
> Ah.  Did not know that -- thought that GBK was a two-byte encoding.  Easy enough to fix for this.

Strange, I didn't find any pages about 3-byte GBK when I did a quick websearch, and  I get two characters for 814e42 not the single one you describe, but no matter -- passing two trailing bytes sounds like a good plan either way.
$ printf "\x81\x4e\x42" | iconv -f gbk -t utf-32LE | od -t x4
0000000          00004e2f        00000042

For my own education, would you happen to have a pointer to some doc on this sequence?
Comment 30 Thomas Sepez 2012-01-23 11:48:31 PST
(In reply to comment #28)
> > I'd assume its correct, or we'd have had additional reports, no?  Given that this has been wrong for so long with little consequence?
> 
> I would not make such a conclusion. People don't regularly use percent escaping in most URL parts, and behavior is not consistent between browsers when they do (e.g. when percent escaping of host name interacts with IDNA).

If so, I'm not sure what to say then, feels like this is getting beyond the scope of this one change.
Comment 31 Alexey Proskuryakov 2012-01-23 11:53:40 PST
Nope, I just misunderstood what I saw in GBK encoding table. GBK is 2-byte indeed. Sorry for the false lead!

However, what about ISO 2022 family encodings? These are stateful, and I think that an arbitrary number of ASCII characters can be part of a sequence.
Comment 32 Thomas Sepez 2012-01-23 12:03:19 PST
(In reply to comment #31)
> Nope, I just misunderstood what I saw in GBK encoding table. GBK is 2-byte indeed. Sorry for the false lead!
> 
> However, what about ISO 2022 family encodings? These are stateful, and I think that an arbitrary number of ASCII characters can be part of a sequence.

True, however these are going to be a lost cause, at least in the xssAuditor case. My impression was that they were slowly going extinct in face of unicode, but that big5/gbk are still very prevalent.
Comment 33 Thomas Sepez 2012-01-23 12:22:30 PST
Created attachment 123597 [details]
New patch for review.
Comment 34 Alexey Proskuryakov 2012-01-23 13:51:31 PST
If ISO 2022 is lost cause, what's the purpose of fixing GBK? I thought the issue here was primarily with attackers who are able to influence what encoding is used, so they could just pick ISO-2022-JP instead.

Two ISO 2022 family encodings are on Safari's encoding selection list, so presumably they are common enough for users to know them.
Comment 35 Thomas Sepez 2012-01-23 13:56:27 PST
(In reply to comment #34)
> If ISO 2022 is lost cause, what's the purpose of fixing GBK? I thought the issue here was primarily with attackers who are able to influence what encoding is used, so they could just pick ISO-2022-JP instead.
> 
No, if the attacker is able to pick what encoding to use, then there are other problems.  UTF-7 comes to mind (still supported?)  The patch is targeted at sites that specify big5 always.

> Two ISO 2022 family encodings are on Safari's encoding selection list, so presumably they are common enough for users to know them.

Users yes, site owners maybe less so  (I admit I don't have hard data).
Comment 36 Daniel Bates 2012-01-23 14:07:14 PST
Comment on attachment 123597 [details]
New patch for review.

View in context: https://bugs.webkit.org/attachment.cgi?id=123597&action=review

This patch looks good to me. Encoding/decoding can be tricky. If anyone sees any issues with this patch then feel free to raise them. I had some minor nits.

> Source/WebCore/ChangeLog:9
> +        results in face of encodings which re-use ASCII values in sequences.

Nit: I think your missing the word "the" in this line.

> Source/WebCore/platform/text/DecodeEscapeSequences.h:81
> +        // 0x40 - 0x7f range, after two values in this range, or at a %-sign that does not introduce a valid

Nit: You mention 0x7F above and 0x7f here. For consistency, I suggest choosing one hexadecimal digit notation (either uppercase or lowercase) to use throughout this comment.

> Source/WebCore/platform/text/DecodeEscapeSequences.h:92
> +            } else if (string[runEnd] >= 0x40 && string[runEnd] < 0x80 && numberOfTrailingCharacters < 2) {

For your consideration, I suggest writing the conjunct "string[runEnd] < 0x80" as "string[runEnd] <= 0x7F" so as to be consistent with the usage of greater-than-equal-to in the first conjunct as well as to make the maximum value in the range (0x7F) more recognizable, especially after reading the above comment.

> LayoutTests/http/tests/navigation/anchor-frames-gbk-expected.txt:17
> +This is an anchor point named as the unicode equivalent of the gbk sequence %a9g (test trailing low byte).

Nit: gbk => GBK

(since GBK is an acronym).

Nit: unicode => Unicode

> LayoutTests/http/tests/navigation/anchor-frames-gbk.html:1
> +<html>

This is the only test in this patch that doesn't have a DOCTYPE. Did you intend for this test to be run under quirks mode? If so, it may be beneficial to add a comment to the test to explain this. Regardless of the compatibility mode this test is run under, I don't anticipate a difference in the expected results .

> LayoutTests/http/tests/navigation/resources/frame-with-anchor-gbk.html:38
> +<a name="&#x586f">This is an anchor point named as the unicode equivalent of the gbk sequence %a9g (test trailing low byte)</a>.

Nit: gbk => GBK

For your consideration, I suggest programmatically removing this node before calling finishJSTest() so that this text doesn't appear after "TEST COMPLETE" in the expected result.
Comment 37 Thomas Sepez 2012-01-23 14:26:06 PST
Comment on attachment 123597 [details]
New patch for review.

View in context: https://bugs.webkit.org/attachment.cgi?id=123597&action=review

>> Source/WebCore/ChangeLog:9
>> +        results in face of encodings which re-use ASCII values in sequences.
> 
> Nit: I think your missing the word "the" in this line.

Huh?  Where?  Did you mean the encodings?  the sequences?

>> Source/WebCore/platform/text/DecodeEscapeSequences.h:81
>> +        // 0x40 - 0x7f range, after two values in this range, or at a %-sign that does not introduce a valid
> 
> Nit: You mention 0x7F above and 0x7f here. For consistency, I suggest choosing one hexadecimal digit notation (either uppercase or lowercase) to use throughout this comment.

Sure.

>> Source/WebCore/platform/text/DecodeEscapeSequences.h:92
>> +            } else if (string[runEnd] >= 0x40 && string[runEnd] < 0x80 && numberOfTrailingCharacters < 2) {
> 
> For your consideration, I suggest writing the conjunct "string[runEnd] < 0x80" as "string[runEnd] <= 0x7F" so as to be consistent with the usage of greater-than-equal-to in the first conjunct as well as to make the maximum value in the range (0x7F) more recognizable, especially after reading the above comment.

sure.

>> LayoutTests/http/tests/navigation/anchor-frames-gbk-expected.txt:17
>> +This is an anchor point named as the unicode equivalent of the gbk sequence %a9g (test trailing low byte).
> 
> Nit: gbk => GBK
> 
> (since GBK is an acronym).
> 
> Nit: unicode => Unicode

Sure.

>> LayoutTests/http/tests/navigation/anchor-frames-gbk.html:1
>> +<html>
> 
> This is the only test in this patch that doesn't have a DOCTYPE. Did you intend for this test to be run under quirks mode? If so, it may be beneficial to add a comment to the test to explain this. Regardless of the compatibility mode this test is run under, I don't anticipate a difference in the expected results .

Fixed.  Thanks.

>> LayoutTests/http/tests/navigation/resources/frame-with-anchor-gbk.html:38
>> +<a name="&#x586f">This is an anchor point named as the unicode equivalent of the gbk sequence %a9g (test trailing low byte)</a>.
> 
> Nit: gbk => GBK
> 
> For your consideration, I suggest programmatically removing this node before calling finishJSTest() so that this text doesn't appear after "TEST COMPLETE" in the expected result.

Fixed GBK.  If its all the same with you, I'm going to pass on the removing the anchor just so someone glancing at the output doesn't have to scratch his head wondering why the anchor didn't dump.
Comment 38 Daniel Bates 2012-01-23 14:31:32 PST
(In reply to comment #37)
> (From update of attachment 123597 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=123597&action=review
> 
> >> Source/WebCore/ChangeLog:9
> >> +        results in face of encodings which re-use ASCII values in sequences.
> > 
> > Nit: I think your missing the word "the" in this line.
> 
> Huh?  Where?  Did you mean the encodings?  the sequences?
> 

Shouldn't there be a "the" before "face"? I take it that the word "face" is being used as a synonym for the word "presence".

> > For your consideration, I suggest programmatically removing this node before calling finishJSTest() so that this text doesn't appear after "TEST COMPLETE" in the expected result.
> 
> Fixed GBK.  If its all the same with you, I'm going to pass on the removing the anchor just so someone glancing at the output doesn't have to scratch his head wondering why the anchor didn't dump.

OK.
Comment 39 Alexey Proskuryakov 2012-01-23 14:31:52 PST
> UTF-7 comes to mind (still supported?)

No.
Comment 40 Thomas Sepez 2012-01-23 14:35:24 PST
Comment on attachment 123597 [details]
New patch for review.

View in context: https://bugs.webkit.org/attachment.cgi?id=123597&action=review

>>>> Source/WebCore/ChangeLog:9
>>>> +        results in face of encodings which re-use ASCII values in sequences.
>>> 
>>> Nit: I think your missing the word "the" in this line.
>> 
>> Huh?  Where?  Did you mean the encodings?  the sequences?
> 
> Shouldn't there be a "the" before "face"? I take it that the word "face" is being used as a synonym for the word "presence".

"in the presence of" is better.  thanks.
Comment 41 Thomas Sepez 2012-01-23 14:37:13 PST
Created attachment 123623 [details]
Patch w/nits fixed.
Comment 42 WebKit Review Bot 2012-01-23 21:08:29 PST
Comment on attachment 123623 [details]
Patch w/nits fixed.

Clearing flags on attachment: 123623

Committed r105691: <http://trac.webkit.org/changeset/105691>
Comment 43 WebKit Review Bot 2012-01-23 21:08:37 PST
All reviewed patches have been landed.  Closing bug.