WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 41345
Parameter names in frame src URLs parsed incorrectly if resembles HTML entity code followed by underscore
https://bugs.webkit.org/show_bug.cgi?id=41345
Summary
Parameter names in frame src URLs parsed incorrectly if resembles HTML entity...
atomicules
Reported
2010-06-29 05:52:21 PDT
Created
attachment 60014
[details]
Test case showing error. (Note, originally filed as Issue 47509 on Chromium Issues, but then I read about Chrome using the new HTML5 parser in the Chromium-dev Google group, so verified with webkit nightly (
r61877
) and it is indeed a webkit bug). Chrome Version : 6.0.447.0 (Official Build 50594) dev URLs (if applicable) : See attached file Other browsers tested: Add OK or FAIL after other browsers where you have tested this issue: Safari 5: OK Firefox 3.7a6pre: OK IE 6: OK What steps will reproduce the problem? 1. Open up the attached webpage 2. Developer Tools > Elements will show the first paramter of the URL for the bottom frame as "setno=0∏_id=" 3. View source on the page, or viewing the attached file in a text editor will show the first parameters as "setno=0&prod_id=" What is the expected result? The URL parameters should be parsed correctly What happens instead? Parsed incorrectly. Please provide any additional information below. Attach a screenshot if possible. The attached file is based on an internal intranet page of the company I work for. I have just changed the URLs to something that will load for everyone (google search), but have included the parameter from the intranet page as a dummy parameter to the google search. This issue is new with 6.0.447.0. As it was working correctly in the previous build. In testing this parsing error seems to be caused if a parameter name resembles a HTML entity code (i.e. just missing the semi-colon) followed by an underscore: E.g: £_id => £_id &_id => &_id "_id => "_id
Attachments
Test case showing error.
(534 bytes, text/html)
2010-06-29 05:52 PDT
,
atomicules
no flags
Details
reduction
(417 bytes, text/html)
2010-06-29 10:50 PDT
,
Adam Barth
no flags
Details
Patch
(4.33 KB, patch)
2010-06-29 17:59 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch
(4.59 KB, patch)
2010-06-30 23:25 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2010-06-29 10:50:02 PDT
Created
attachment 60037
[details]
reduction Here's a simpler reduction. For some reason, Minefield is treating the ∏ entity differently from the £ entity. We'll need to investigate further to understand why.
Alexey Proskuryakov
Comment 2
2010-06-29 11:32:51 PDT
See also:
bug 35831
.
Adam Barth
Comment 3
2010-06-29 17:24:16 PDT
***
Bug 35831
has been marked as a duplicate of this bug. ***
Adam Barth
Comment 4
2010-06-29 17:25:16 PDT
Some folks in the HTML working group explained the issue to me. We're ignoring an important bit in the entity table. Given that the old parser has the right behavior here, we must have that bit somewhere. Investigating.
Adam Barth
Comment 5
2010-06-29 17:38:31 PDT
Looks like I missed this check:
http://trac.webkit.org/browser/trunk/WebCore/html/LegacyHTMLDocumentParser.cpp#L834
Adam Barth
Comment 6
2010-06-29 17:59:44 PDT
Created
attachment 60073
[details]
Patch
Darin Adler
Comment 7
2010-06-29 18:01:33 PDT
Comment on
attachment 60073
[details]
Patch
> + if (entity->code > 255) > + break;
What's the meaning of the magic number 255 here? If it was a named constant it could have a name or comment indicating why 255 is the correct number.
Adam Barth
Comment 8
2010-06-29 18:03:27 PDT
(In reply to
comment #7
)
> (From update of
attachment 60073
[details]
) > > + if (entity->code > 255) > > + break; > > What's the meaning of the magic number 255 here? If it was a named constant it could have a name or comment indicating why 255 is the correct number.
I bet it's the characters that fit in Latin-1. That's how the code in the legacy tokenizer is written (although it only applies that branch to attribute values, which doesn't match the spec or Minefield). The spec just has a giant table:
http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references
Eric Seidel (no email)
Comment 9
2010-06-30 23:07:01 PDT
Comment on
attachment 60073
[details]
Patch Holy shit that needs a comment. No sir, I cannot r+ such madness w/o explanation (or at least a spec link).
Adam Barth
Comment 10
2010-06-30 23:25:26 PDT
Created
attachment 60203
[details]
Patch
Eric Seidel (no email)
Comment 11
2010-06-30 23:26:26 PDT
Comment on
attachment 60203
[details]
Patch Fantastic! Thank you!
WebKit Commit Bot
Comment 12
2010-07-01 02:29:32 PDT
Comment on
attachment 60203
[details]
Patch Clearing flags on attachment: 60203 Committed
r62241
: <
http://trac.webkit.org/changeset/62241
>
WebKit Commit Bot
Comment 13
2010-07-01 02:29:43 PDT
All reviewed patches have been landed. Closing bug.
WebKit Review Bot
Comment 14
2010-07-01 02:47:50 PDT
http://trac.webkit.org/changeset/62241
might have broken Qt Linux Release
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug