RESOLVED FIXED Bug 41345
Parameter names in frame src URLs parsed incorrectly if resembles HTML entity code followed by underscore
https://bugs.webkit.org/show_bug.cgi?id=41345
Summary Parameter names in frame src URLs parsed incorrectly if resembles HTML entity...
atomicules
Reported 2010-06-29 05:52:21 PDT
Created attachment 60014 [details] Test case showing error. (Note, originally filed as Issue 47509 on Chromium Issues, but then I read about Chrome using the new HTML5 parser in the Chromium-dev Google group, so verified with webkit nightly (r61877) and it is indeed a webkit bug). Chrome Version : 6.0.447.0 (Official Build 50594) dev URLs (if applicable) : See attached file Other browsers tested: Add OK or FAIL after other browsers where you have tested this issue: Safari 5: OK Firefox 3.7a6pre: OK IE 6: OK What steps will reproduce the problem? 1. Open up the attached webpage 2. Developer Tools > Elements will show the first paramter of the URL for the bottom frame as "setno=0∏_id=" 3. View source on the page, or viewing the attached file in a text editor will show the first parameters as "setno=0&prod_id=" What is the expected result? The URL parameters should be parsed correctly What happens instead? Parsed incorrectly. Please provide any additional information below. Attach a screenshot if possible. The attached file is based on an internal intranet page of the company I work for. I have just changed the URLs to something that will load for everyone (google search), but have included the parameter from the intranet page as a dummy parameter to the google search. This issue is new with 6.0.447.0. As it was working correctly in the previous build. In testing this parsing error seems to be caused if a parameter name resembles a HTML entity code (i.e. just missing the semi-colon) followed by an underscore: E.g: &pound_id => £_id &amp_id => &_id &quot_id => "_id
Attachments
Test case showing error. (534 bytes, text/html)
2010-06-29 05:52 PDT, atomicules
no flags
reduction (417 bytes, text/html)
2010-06-29 10:50 PDT, Adam Barth
no flags
Patch (4.33 KB, patch)
2010-06-29 17:59 PDT, Adam Barth
no flags
Patch (4.59 KB, patch)
2010-06-30 23:25 PDT, Adam Barth
no flags
Adam Barth
Comment 1 2010-06-29 10:50:02 PDT
Created attachment 60037 [details] reduction Here's a simpler reduction. For some reason, Minefield is treating the ∏ entity differently from the £ entity. We'll need to investigate further to understand why.
Alexey Proskuryakov
Comment 2 2010-06-29 11:32:51 PDT
See also: bug 35831.
Adam Barth
Comment 3 2010-06-29 17:24:16 PDT
*** Bug 35831 has been marked as a duplicate of this bug. ***
Adam Barth
Comment 4 2010-06-29 17:25:16 PDT
Some folks in the HTML working group explained the issue to me. We're ignoring an important bit in the entity table. Given that the old parser has the right behavior here, we must have that bit somewhere. Investigating.
Adam Barth
Comment 5 2010-06-29 17:38:31 PDT
Adam Barth
Comment 6 2010-06-29 17:59:44 PDT
Darin Adler
Comment 7 2010-06-29 18:01:33 PDT
Comment on attachment 60073 [details] Patch > + if (entity->code > 255) > + break; What's the meaning of the magic number 255 here? If it was a named constant it could have a name or comment indicating why 255 is the correct number.
Adam Barth
Comment 8 2010-06-29 18:03:27 PDT
(In reply to comment #7) > (From update of attachment 60073 [details]) > > + if (entity->code > 255) > > + break; > > What's the meaning of the magic number 255 here? If it was a named constant it could have a name or comment indicating why 255 is the correct number. I bet it's the characters that fit in Latin-1. That's how the code in the legacy tokenizer is written (although it only applies that branch to attribute values, which doesn't match the spec or Minefield). The spec just has a giant table: http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references
Eric Seidel (no email)
Comment 9 2010-06-30 23:07:01 PDT
Comment on attachment 60073 [details] Patch Holy shit that needs a comment. No sir, I cannot r+ such madness w/o explanation (or at least a spec link).
Adam Barth
Comment 10 2010-06-30 23:25:26 PDT
Eric Seidel (no email)
Comment 11 2010-06-30 23:26:26 PDT
Comment on attachment 60203 [details] Patch Fantastic! Thank you!
WebKit Commit Bot
Comment 12 2010-07-01 02:29:32 PDT
Comment on attachment 60203 [details] Patch Clearing flags on attachment: 60203 Committed r62241: <http://trac.webkit.org/changeset/62241>
WebKit Commit Bot
Comment 13 2010-07-01 02:29:43 PDT
All reviewed patches have been landed. Closing bug.
WebKit Review Bot
Comment 14 2010-07-01 02:47:50 PDT
http://trac.webkit.org/changeset/62241 might have broken Qt Linux Release
Note You need to log in before you can comment on or make changes to this bug.