Bug 140173

Summary: The ASCII decoding for non ASCII character is incorrect if this character comes after the going through fast decoding code path and before the end of the text by less than a machine word size of characters
Product: WebKit Reporter: Said Abou-Hallawa <sabouhallawa>
Component: Page LoadingAssignee: Said Abou-Hallawa <sabouhallawa>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, darin
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Test case
none
Patch
none
Patch none

Said Abou-Hallawa
Reported 2015-01-06 21:09:53 PST
Created attachment 244137 [details] Test case 1. Check the function TextCodecLatin1::decode() 2. Suppose the following text is decoded: "%41%42%43%44%45%46%47%48%82%82". 3. The URL precent encoded text will be converted to 10 bytes array and sent to TextCodecLatin1::decode(), 4. Since the first byte is ASCII and since its index is 8-bytes aligned, we use the fast decoding path. 5. Since the first eight bytes are all ASCII, we do the decoding in one step and and we copy the whole word from the source byte array to the result buffer. 6. Since after the first eight bytes, what is left is less than a machine word, we exit the fast decoding path. 7. The problem is after breaking the fast decoding code path, we copy the current byte as is even if it is a non ASCII character. A test case is attached. It works correctly in FireFox and Chrome and shows two non ASCII 0x82 characters at the end. But in Safari, it shows a single 0x82 character at the end.
Attachments
Test case (73 bytes, text/html)
2015-01-06 21:09 PST, Said Abou-Hallawa
no flags
Patch (5.83 KB, patch)
2015-01-06 21:25 PST, Said Abou-Hallawa
no flags
Patch (4.61 KB, patch)
2015-01-07 12:04 PST, Said Abou-Hallawa
no flags
Said Abou-Hallawa
Comment 1 2015-01-06 21:25:26 PST
Darin Adler
Comment 2 2015-01-06 23:50:29 PST
Comment on attachment 244140 [details] Patch This makes the all-ASCII-but-not-aligned-to-machine-word case slower unnecessarily. I think a better fix might be to just add: if (!isASCII(*source)) goto useLookupTable; right after the code that says: if (source == end) break; at the end of the isAlignedToMachineWord(source) block.
Darin Adler
Comment 3 2015-01-06 23:51:39 PST
Comment on attachment 244140 [details] Patch Great catch, by the way! Please consider my suggested alternate fix.
Said Abou-Hallawa
Comment 4 2015-01-07 12:04:04 PST
Said Abou-Hallawa
Comment 5 2015-01-07 12:13:08 PST
(In reply to comment #3) > Comment on attachment 244140 [details] > Patch > > Great catch, by the way! Please consider my suggested alternate fix. Done. Thanks for the suggestion.
WebKit Commit Bot
Comment 6 2015-01-07 21:18:26 PST
Comment on attachment 244172 [details] Patch Clearing flags on attachment: 244172 Committed r178099: <http://trac.webkit.org/changeset/178099>
WebKit Commit Bot
Comment 7 2015-01-07 21:18:29 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.