WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED WONTFIX
Bug 48225
Language attribute validation
https://bugs.webkit.org/show_bug.cgi?id=48225
Summary
Language attribute validation
Leandro Graciá Gil
Reported
2010-10-25 02:32:15 PDT
The contents of the language attribute are expected to follow the BCP47 specification (
http://www.rfc-editor.org/rfc/bcp/bcp47.txt
) however no validation at all is performed on it. This patch introduces a brief validation to filter the language attribute contents to [A-Za-z0-9-]* with the purpose of making it safe to use it, for example, as a parameter in URLs. Please note that this doesn't guarantee that the attribute contains a completely valid language tag according to BCP47 as this is not the purpose of this patch.
Attachments
Patch
(1.92 KB, patch)
2010-10-25 02:56 PDT
,
Leandro Graciá Gil
no flags
Details
Formatted Diff
Diff
Patch
(5.24 KB, patch)
2010-10-26 07:02 PDT
,
Leandro Graciá Gil
no flags
Details
Formatted Diff
Diff
Patch
(5.26 KB, patch)
2010-10-26 07:17 PDT
,
Leandro Graciá Gil
no flags
Details
Formatted Diff
Diff
Patch
(5.21 KB, patch)
2010-10-26 07:36 PDT
,
Leandro Graciá Gil
no flags
Details
Formatted Diff
Diff
Patch
(5.18 KB, patch)
2010-10-26 09:20 PDT
,
Leandro Graciá Gil
no flags
Details
Formatted Diff
Diff
Patch
(5.15 KB, patch)
2010-10-26 09:40 PDT
,
Leandro Graciá Gil
ap
: review-
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Leandro Graciá Gil
Comment 1
2010-10-25 02:56:30 PDT
Created
attachment 71724
[details]
Patch
Satish Sampath
Comment 2
2010-10-25 03:03:02 PDT
Comment on
attachment 71724
[details]
Patch Since this is a standalone patch, can you add layout tests to check for valid and invalid language codes?
Alexey Proskuryakov
Comment 3
2010-10-25 08:26:41 PDT
Anyone is welcome to review patches, but only reviewers should actually mark them r+ or r-. I agree that this needs a test.
Leandro Graciá Gil
Comment 4
2010-10-26 07:02:54 PDT
Created
attachment 71873
[details]
Patch
Leandro Graciá Gil
Comment 5
2010-10-26 07:17:32 PDT
Created
attachment 71878
[details]
Patch
Jeremy Orlow
Comment 6
2010-10-26 07:28:56 PDT
Comment on
attachment 71873
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=71873&action=review
> LayoutTests/fast/dom/Element/language-valid.html:13 > + <div id="console"></div>
put this under the <p>'s. Otherwise it's kind of confusing to read the output.
> WebCore/ChangeLog:12 > +
https://bugs.webkit.org/show_bug.cgi?id=48225
move this under the first line (before the blank new line)
> WebCore/dom/Element.cpp:1508 > + if (!value.isNull() && !value.isEmpty()) {
IsEmpty is a superset of isNull. When in doubt, please look these things up so that code isn't more complex than it needs to be.
> WebCore/dom/Element.cpp:1510 > + CString asciiOnly = value.string().ascii();
It seems as though we should be able to do this check without converting the whole string to ascii first. Maybe this isn't a performance concern and thus keeping the code simple is best, but it seems as though you should be able to inspect the data directly.
Satish Sampath
Comment 7
2010-10-26 07:30:57 PDT
Comment on
attachment 71878
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=71878&action=review
> LayoutTests/fast/dom/Element/language-valid.html:26 > + jsTestIsAsync = true;
Since this test is not really async (no handlers, everything done sequentially when the page loads), I think you can remove this line and the call to 'finishJSTest()' below.
Leandro Graciá Gil
Comment 8
2010-10-26 07:36:25 PDT
Created
attachment 71881
[details]
Patch
Leandro Graciá Gil
Comment 9
2010-10-26 07:42:40 PDT
(In reply to
comment #7
)
> (From update of
attachment 71878
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=71878&action=review
> > > LayoutTests/fast/dom/Element/language-valid.html:26 > > + jsTestIsAsync = true; > > Since this test is not really async (no handlers, everything done sequentially when the page loads), I think you can remove this line and the call to 'finishJSTest()' below.
Done. Going to apply Jeremy's updates now.
Leandro Graciá Gil
Comment 10
2010-10-26 07:45:48 PDT
(In reply to
comment #6
)
> (From update of
attachment 71873
[details]
) > > WebCore/dom/Element.cpp:1510 > > + CString asciiOnly = value.string().ascii(); > > It seems as though we should be able to do this check without converting the whole string to ascii first. Maybe this isn't a performance concern and thus keeping the code simple is best, but it seems as though you should be able to inspect the data directly.
There's another reason of doing this, and it's that ascii deals with the UTF16 encoding and converts all non-ascii characters to the question mark symbol. That way it avoids any nasty problems coming from the representation of non-ascii characters.
Leandro Graciá Gil
Comment 11
2010-10-26 09:20:47 PDT
Created
attachment 71900
[details]
Patch
Jeremy Orlow
Comment 12
2010-10-26 09:27:40 PDT
Comment on
attachment 71900
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=71900&action=review
Does anyone else have any thoughts?
> LayoutTests/fast/dom/Element/language-valid.html:15 > + <p id="description"></p>
I think description is still best on top
> WebCore/dom/Element.cpp:1509 > + // Only A-Za-z0-9 and '-' are allowed in the language tag (see BCP 47)
period at end
> WebCore/dom/Element.cpp:1510 > + size_t length = value.length();
not sure if there's any value to this being broken out. Maybe just replace length with it?
Leandro Graciá Gil
Comment 13
2010-10-26 09:40:51 PDT
Created
attachment 71901
[details]
Patch
Jeremy Orlow
Comment 14
2010-10-26 09:42:40 PDT
Comment on
attachment 71901
[details]
Patch LGTM. Ask me to r+ it tomorrow if no one else raises an issue in the mean time.
Leandro Graciá Gil
Comment 15
2010-10-26 09:43:54 PDT
(In reply to
comment #14
)
> (From update of
attachment 71901
[details]
) > LGTM. Ask me to r+ it tomorrow if no one else raises an issue in the mean time.
Ok. Thank you.
Alexey Proskuryakov
Comment 16
2010-10-26 10:07:57 PDT
Comment on
attachment 71901
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=71901&action=review
> LayoutTests/ChangeLog:9 > + The contents of the language attribute are expected to follow the BCP 47 > + specification. However no validation at all is performed on it. This patch
What are the specs governing this? In HTML5, I could find that in two places: - authoring requirements, which are irrelevant to browsers; - non-normative list of attributes, saying that value of hreflang, lang, and srclang attributes is a "valid BCP 47 language tag". So, I don't see that requirement specified anywhere. And Firefox 3.6 fails the test case included in this patch. Does IE pass it?
> LayoutTests/fast/dom/Element/language-valid.html:9 > + p:lang(en) { text-transform: uppercase; } > + p:lang(iην41iδ_ã»4g) { text-transform: uppercase; }
We'd need tests for more forbidden characters - ideally, for every forbidden ASCII character, like en/GB. Please use red/green color in results for easier visual evaluation <
http://www.w3.org/Style/CSS/Test/guidelines.html#color
>.
> WebCore/dom/Element.cpp:1513 > + if (!isASCIIAlphanumeric(c) && c != '-') > + return AtomicString();
Is an invalid value equivalent to lang="", or should it be ignored?
Leandro Graciá Gil
Comment 17
2010-10-26 10:20:20 PDT
(In reply to
comment #16
)
> (From update of
attachment 71901
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=71901&action=review
> > > LayoutTests/ChangeLog:9 > > + The contents of the language attribute are expected to follow the BCP 47 > > + specification. However no validation at all is performed on it. This patch > > What are the specs governing this? In HTML5, I could find that in two places: > - authoring requirements, which are irrelevant to browsers; > - non-normative list of attributes, saying that value of hreflang, lang, and srclang attributes is a "valid BCP 47 language tag". > > So, I don't see that requirement specified anywhere. And Firefox 3.6 fails the test case included in this patch. Does IE pass it? >
I was referring to the non-normative attributes. This is not required to be implemented, but thought that it should be good to have some kind of validation. If you think it's not necessary at all we can drop this issue.
Alexey Proskuryakov
Comment 18
2010-10-26 10:24:17 PDT
We certainly should be very suspicious of requirements not mandated by specs, especially if no other browser implements such.
Leandro Graciá Gil
Comment 19
2010-10-26 10:44:53 PDT
(In reply to
comment #18
)
> We certainly should be very suspicious of requirements not mandated by specs, especially if no other browser implements such.
Agreed. Just tested in IE and it doesn't seem to implement it either. Let's drop this. Thank you for your time and sorry.
Leandro Graciá Gil
Comment 20
2010-10-26 10:45:41 PDT
Please mark it as a WONTFIX as I can't do it by myself.
Jeremy Orlow
Comment 21
2010-10-26 10:48:44 PDT
Sounds like a plan. P.S. Sorry for not checking whether this was normative text or what other browsers will do. Will try to do that in such reviews in the future!
Alexey Proskuryakov
Comment 22
2010-10-26 10:56:53 PDT
You may want to report this as a bug in HTML5 draft, as the non-normative description is misleading.
Alexey Proskuryakov
Comment 23
2010-10-26 10:58:26 PDT
We could also land a regression test verifying that non-ASCII lang values actually work. I think that would be fairly valuable.
Leandro Graciá Gil
Comment 24
2010-10-27 02:12:23 PDT
(In reply to
comment #23
)
> We could also land a regression test verifying that non-ASCII lang values actually work. I think that would be fairly valuable.
Actually, I found while developing the layout test that if the language tag in the p:lang() contained an @ then it was not properly assigned. I'm not sure if this is a CSS syntax problem, though. I got no complains in the test results. It did work however with many invalid characters like Greek and Japanese letters. An example would be: p:lang(doesnt@work) { text-transform: uppercase; } ... <p id="invalid" lang="doesn@work">This line won't be in uppercase at all. Not because of the patch but because the CSS style doesn't seem to be properly associated by the lang attribute.</p>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug