Bug 140173 - The ASCII decoding for non ASCII character is incorrect if this character comes after the going through fast decoding code path and before the end of the text by less than a machine word size of characters
Summary: The ASCII decoding for non ASCII character is incorrect if this character com...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Said Abou-Hallawa
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-06 21:09 PST by Said Abou-Hallawa
Modified: 2015-01-07 21:18 PST (History)
2 users (show)

See Also:


Attachments
Test case (73 bytes, text/html)
2015-01-06 21:09 PST, Said Abou-Hallawa
no flags Details
Patch (5.83 KB, patch)
2015-01-06 21:25 PST, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (4.61 KB, patch)
2015-01-07 12:04 PST, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Said Abou-Hallawa 2015-01-06 21:09:53 PST
Created attachment 244137 [details]
Test case

1. Check the function TextCodecLatin1::decode()
2. Suppose the following text is decoded: "%41%42%43%44%45%46%47%48%82%82".
3. The URL precent encoded text will be converted to 10 bytes array and sent to TextCodecLatin1::decode(),
4. Since the first byte is ASCII and since its index is 8-bytes aligned, we use the fast decoding path.
5. Since the first eight bytes are all ASCII, we do the decoding in one step and and we copy the whole word from the source byte array to the result buffer.
6. Since after the first eight bytes, what is left is less than a machine word, we exit the fast decoding path.
7. The problem is after breaking the fast decoding code path, we copy the current byte as is even if it is a non ASCII character.

A test case is attached. It works correctly in FireFox and Chrome and shows two non ASCII 0x82 characters at the end. But in Safari, it shows a single 0x82 character at the end.
Comment 1 Said Abou-Hallawa 2015-01-06 21:25:26 PST
Created attachment 244140 [details]
Patch
Comment 2 Darin Adler 2015-01-06 23:50:29 PST
Comment on attachment 244140 [details]
Patch

This makes the all-ASCII-but-not-aligned-to-machine-word case slower unnecessarily.

I think a better fix might be to just add:

    if (!isASCII(*source))
        goto useLookupTable;

right after the code that says:

    if (source == end)
        break;

at the end of the isAlignedToMachineWord(source) block.
Comment 3 Darin Adler 2015-01-06 23:51:39 PST
Comment on attachment 244140 [details]
Patch

Great catch, by the way! Please consider my suggested alternate fix.
Comment 4 Said Abou-Hallawa 2015-01-07 12:04:04 PST
Created attachment 244172 [details]
Patch
Comment 5 Said Abou-Hallawa 2015-01-07 12:13:08 PST
(In reply to comment #3)
> Comment on attachment 244140 [details]
> Patch
> 
> Great catch, by the way! Please consider my suggested alternate fix.

Done. Thanks for the suggestion.
Comment 6 WebKit Commit Bot 2015-01-07 21:18:26 PST
Comment on attachment 244172 [details]
Patch

Clearing flags on attachment: 244172

Committed r178099: <http://trac.webkit.org/changeset/178099>
Comment 7 WebKit Commit Bot 2015-01-07 21:18:29 PST
All reviewed patches have been landed.  Closing bug.