Summary: | WebCore PreloadScanner Entity Detection Bug - Non-HTML Entities are being treated as entities | ||
---|---|---|---|
Product: | WebKit | Reporter: | Jeff <mirthy> |
Component: | Page Loading | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED DUPLICATE | ||
Severity: | Normal | CC: | abarth, ap, inhoahiephcm, koivisto |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | Mac (Intel) | ||
OS: | OS X 10.6 | ||
URL: | http://www.vistaprint.com/gallery.aspx |
Description
Jeff
2010-03-06 12:46:02 PST
Also replicated on Safari 4 Mac/Win, Google Chrome Mac/Win. You can spot the preload issue in the Resource Inspector.

PreloadScanner (unlike the main tokenizer currently) implements the HTML5 entity parsing. The spec says (10.2.4):

"Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the named character references table (in a case-sensitive manner). If no match can be made, then this is a parse error. No characters are consumed, and nothing is returned. If the last character matched is not a U+003B SEMICOLON character (;), there is a parse error. If the character reference is being consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next character is in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. Otherwise, return a character token for the character corresponding to the character reference name (as given by the second column of the named character references table)."

Basically, if a named entity in an attribute is followed by a non-alphanumeric character other than ;, it is considered a parse error but the entity is still returned. As far as I can see, the implementation matches the spec. If you think HTML5 is wrong here, you should send mail to the whatwg list and explain why.

The HTML spec has been amended to address these situations: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9207

Reopening based on the above. An EQUALS SIGN character (=) at the end should cause the entity to be unconsumed.

The preload scanner uses the same tokenizer as the main parser now, so we'll fix both these bugs at once.
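The attribute rule quoted from the spec, together with the later amendment for the equals sign, can be sketched roughly as follows. This is a minimal Python illustration, not WebKit's tokenizer: the function name `consume_named_reference`, the `amended` flag, the tiny `NAMED_REFERENCES` subset, and the sample query string are all assumptions made for the example.

```python
import string

# Hypothetical, tiny subset of the named character references table.
# The real table is far larger; a few legacy names also match without ";".
NAMED_REFERENCES = {
    "amp;": "&", "amp": "&",
    "copy;": "\u00a9", "copy": "\u00a9",
    "lt;": "<", "lt": "<",
    "not;": "\u00ac", "not": "\u00ac",
    "notin;": "\u2209",
}

ALPHANUMERIC = set(string.ascii_letters + string.digits)


def consume_named_reference(text, pos, in_attribute, amended=True):
    """Try to consume a named reference that starts at text[pos],
    where pos is the index just after the '&'.

    Returns (decoded_character_or_None, new_pos). When nothing is
    consumed, new_pos equals pos.
    """
    # Consume the maximum number of characters that match a table entry.
    best = None
    for name in NAMED_REFERENCES:
        if text.startswith(name, pos) and (best is None or len(name) > len(best)):
            best = name
    if best is None:
        return None, pos  # no match: parse error, nothing consumed

    end = pos + len(best)
    if in_attribute and not best.endswith(";"):
        next_char = text[end:end + 1]  # empty string at end of input
        blocked = set(ALPHANUMERIC)
        if amended:
            # Assumption: models the amendment discussed in this bug,
            # where "=" also causes the match to be unconsumed.
            blocked.add("=")
        if next_char in blocked:
            return None, pos  # historical rule: unconsume everything

    return NAMED_REFERENCES[best], end


if __name__ == "__main__":
    # Hypothetical attribute value with a query parameter named "copy".
    value = "gallery.aspx?a=1&copy=2"
    start = value.index("&") + 1
    print(consume_named_reference(value, start, True, amended=False))
    print(consume_named_reference(value, start, True, amended=True))
```

Under the original wording, a query parameter such as `&copy=2` inside an attribute is decoded because `=` is not alphanumeric. With the amendment, the characters are unconsumed and the URL is left intact, which is the behaviour the report asks for.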
*** This bug has been marked as a duplicate of bug 41345 ***

Jeff, could you please verify whether this is indeed fixed in current nightly builds?

(In reply to comment #7)
> Jeff, could you please verify whether this is indeed fixed in current nightly builds?

Verified as fixed. Thanks guys!

(In reply to comment #8)
> (In reply to comment #7)
> > Jeff, could you please verify whether this is indeed fixed in current nightly builds?
>
> Verified as fixed. Thanks guys!

Verified in 6533.16, r64451