Bug 140173

Summary: The ASCII decoding for non ASCII character is incorrect if this character comes after the going through fast decoding code path and before the end of the text by less than a machine word size of characters
Product: WebKit Reporter: Said Abou-Hallawa <sabouhallawa>
Component: Page LoadingAssignee: Said Abou-Hallawa <sabouhallawa>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, darin
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Test case
none
Patch
none
Patch none

Description Said Abou-Hallawa 2015-01-06 21:09:53 PST
Created attachment 244137 [details]
Test case

1. Check the function TextCodecLatin1::decode()
2. Suppose the following text is decoded: "%41%42%43%44%45%46%47%48%82%82".
3. The URL precent encoded text will be converted to 10 bytes array and sent to TextCodecLatin1::decode(),
4. Since the first byte is ASCII and since its index is 8-bytes aligned, we use the fast decoding path.
5. Since the first eight bytes are all ASCII, we do the decoding in one step and and we copy the whole word from the source byte array to the result buffer.
6. Since after the first eight bytes, what is left is less than a machine word, we exit the fast decoding path.
7. The problem is after breaking the fast decoding code path, we copy the current byte as is even if it is a non ASCII character.

A test case is attached. It works correctly in FireFox and Chrome and shows two non ASCII 0x82 characters at the end. But in Safari, it shows a single 0x82 character at the end.
Comment 1 Said Abou-Hallawa 2015-01-06 21:25:26 PST
Created attachment 244140 [details]
Patch
Comment 2 Darin Adler 2015-01-06 23:50:29 PST
Comment on attachment 244140 [details]
Patch

This makes the all-ASCII-but-not-aligned-to-machine-word case slower unnecessarily.

I think a better fix might be to just add:

    if (!isASCII(*source))
        goto useLookupTable;

right after the code that says:

    if (source == end)
        break;

at the end of the isAlignedToMachineWord(source) block.
Comment 3 Darin Adler 2015-01-06 23:51:39 PST
Comment on attachment 244140 [details]
Patch

Great catch, by the way! Please consider my suggested alternate fix.
Comment 4 Said Abou-Hallawa 2015-01-07 12:04:04 PST
Created attachment 244172 [details]
Patch
Comment 5 Said Abou-Hallawa 2015-01-07 12:13:08 PST
(In reply to comment #3)
> Comment on attachment 244140 [details]
> Patch
> 
> Great catch, by the way! Please consider my suggested alternate fix.

Done. Thanks for the suggestion.
Comment 6 WebKit Commit Bot 2015-01-07 21:18:26 PST
Comment on attachment 244172 [details]
Patch

Clearing flags on attachment: 244172

Committed r178099: <http://trac.webkit.org/changeset/178099>
Comment 7 WebKit Commit Bot 2015-01-07 21:18:29 PST
All reviewed patches have been landed.  Closing bug.