Created attachment 60014 [details] Test case showing error.

(Note: originally filed as Issue 47509 on Chromium Issues, but then I read in the Chromium-dev Google group that Chrome is using the new HTML5 parser, so I verified with a WebKit nightly (r61877) and it is indeed a WebKit bug.)

Chrome Version : 6.0.447.0 (Official Build 50594) dev
URLs (if applicable) : See attached file
Other browsers tested:
Add OK or FAIL after other browsers where you have tested this issue:
Safari 5: OK
Firefox 3.7a6pre: OK
IE 6: OK

What steps will reproduce the problem?
1. Open the attached webpage.
2. Developer Tools > Elements will show the first parameter of the URL for the bottom frame as "setno=0∏_id=".
3. Viewing the page source, or opening the attached file in a text editor, will show the first parameters as "setno=0&prod_id=".

What is the expected result?
The URL parameters should be parsed correctly.

What happens instead?
They are parsed incorrectly.

Please provide any additional information below. Attach a screenshot if possible.
The attached file is based on an internal intranet page of the company I work for. I have just changed the URLs to something that will load for everyone (Google search), but have included the parameter from the intranet page as a dummy parameter to the Google search.

This issue is new with 6.0.447.0; it was working correctly in the previous build.

In testing, this parsing error seems to occur when a parameter name resembles an HTML entity code (i.e. it is only missing the semicolon) followed by an underscore. E.g.:
&pound_id => £_id
&amp_id => &_id
&quot_id => "_id
Created attachment 60037 [details] reduction

Here's a simpler reduction. For some reason, Minefield is treating the &prod (∏) entity differently from the &pound (£) entity. We'll need to investigate further to understand why.
See also: bug 35831.
*** Bug 35831 has been marked as a duplicate of this bug. ***
Some folks in the HTML working group explained the issue to me. We're ignoring an important bit in the entity table. Given that the old parser has the right behavior here, we must have that bit somewhere. Investigating.
Looks like I missed this check: http://trac.webkit.org/browser/trunk/WebCore/html/LegacyHTMLDocumentParser.cpp#L834
Created attachment 60073 [details] Patch
Comment on attachment 60073 [details] Patch

> + if (entity->code > 255)
> +     break;

What's the meaning of the magic number 255 here? If it was a named constant it could have a name or comment indicating why 255 is the correct number.
(In reply to comment #7)
> (From update of attachment 60073 [details])
> > + if (entity->code > 255)
> > +     break;
>
> What's the meaning of the magic number 255 here? If it was a named constant it could have a name or comment indicating why 255 is the correct number.

I bet it's the characters that fit in Latin-1. That's how the code in the legacy tokenizer is written (although it only applies that branch to attribute values, which doesn't match the spec or Minefield). The spec just has a giant table:
http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references
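
To make the Latin-1 guess concrete, here is a minimal sketch of that kind of check. It is not the WebKit patch: the table, `NamedEntity`, `kMaxSemicolonlessCode`, and `decodeEntity` are all hypothetical names invented for illustration, and the four-entry table stands in for the spec's full list. It assumes the rule is "an entity without a trailing semicolon is only accepted when its code point fits in Latin-1".

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical entity record; not WebKit's real data structure.
struct NamedEntity {
    const char* name; // entity name without the leading '&' or trailing ';'
    unsigned code;    // code point the entity decodes to
};

// Tiny illustrative subset of the named character reference table.
static const NamedEntity kEntities[] = {
    {"amp", 0x26}, {"quot", 0x22}, {"pound", 0xA3}, {"prod", 0x220F},
};

// Assumed meaning of the "255" in the patch: the highest Latin-1 code point.
static const unsigned kMaxSemicolonlessCode = 255;

// `text` is the input immediately after an '&'. Returns the decoded code
// point, or 0 if nothing should be decoded. `consumed` receives the number
// of characters of `text` that belong to the entity (0 on failure).
unsigned decodeEntity(const std::string& text, std::size_t& consumed)
{
    consumed = 0;
    for (const NamedEntity& entity : kEntities) {
        const std::size_t length = std::strlen(entity.name);
        if (text.compare(0, length, entity.name) != 0)
            continue; // name does not match at this position
        if (text.size() > length && text[length] == ';') {
            consumed = length + 1; // properly terminated: always accepted
            return entity.code;
        }
        if (entity.code > kMaxSemicolonlessCode)
            break; // e.g. "prod" with no ';' is left as literal text
        consumed = length; // Latin-1 entity: accepted without ';'
        return entity.code;
    }
    return 0;
}
```

Under these assumptions, `decodeEntity("prod_id=", n)` decodes nothing, so a URL containing `&prod_id=` keeps its literal text, while `prod;` and the semicolon-less `pound` still decode. A real implementation would also need longest-match handling over the full table, which this sketch skips because none of its four names is a prefix of another.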
Comment on attachment 60073 [details] Patch Holy shit that needs a comment. No sir, I cannot r+ such madness w/o explanation (or at least a spec link).
Created attachment 60203 [details] Patch
Comment on attachment 60203 [details] Patch Fantastic! Thank you!
Comment on attachment 60203 [details] Patch Clearing flags on attachment: 60203 Committed r62241: <http://trac.webkit.org/changeset/62241>
All reviewed patches have been landed. Closing bug.
http://trac.webkit.org/changeset/62241 might have broken Qt Linux Release