Summary: | Language attribute validation | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Leandro Graciá Gil <leandrogracia> | ||||||||||||||
Component: | DOM | Assignee: | Leandro Graciá Gil <leandrogracia> | ||||||||||||||
Status: | RESOLVED WONTFIX | ||||||||||||||||
Severity: | Normal | CC: | ap, jorlow, satish | ||||||||||||||
Priority: | P2 | ||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||
Hardware: | All | ||||||||||||||||
OS: | All | ||||||||||||||||
Attachments: |
|
Description
Leandro Graciá Gil
2010-10-25 02:32:15 PDT
Created attachment 71724 [details]
Patch
Comment on attachment 71724 [details]
Patch
Since this is a standalone patch, can you add layout tests to check for valid and invalid language codes?
Anyone is welcome to review patches, but only reviewers should actually mark them r+ or r-. I agree that this needs a test. Created attachment 71873 [details]
Patch
Created attachment 71878 [details]
Patch
Comment on attachment 71873 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71873&action=review > LayoutTests/fast/dom/Element/language-valid.html:13 > + <div id="console"></div> put this under the <p>'s. Otherwise it's kind of confusing to read the output. > WebCore/ChangeLog:12 > + https://bugs.webkit.org/show_bug.cgi?id=48225 move this under the first line (before the blank new line) > WebCore/dom/Element.cpp:1508 > + if (!value.isNull() && !value.isEmpty()) { IsEmpty is a superset of isNull. When in doubt, please look these things up so that code isn't more complex than it needs to be. > WebCore/dom/Element.cpp:1510 > + CString asciiOnly = value.string().ascii(); It seems as though we should be able to do this check without converting the whole string to ascii first. Maybe this isn't a performance concern and thus keeping the code simple is best, but it seems as though you should be able to inspect the data directly. Comment on attachment 71878 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71878&action=review > LayoutTests/fast/dom/Element/language-valid.html:26 > + jsTestIsAsync = true; Since this test is not really async (no handlers, everything done sequentially when the page loads), I think you can remove this line and the call to 'finishJSTest()' below. Created attachment 71881 [details]
Patch
(In reply to comment #7) > (From update of attachment 71878 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=71878&action=review > > > LayoutTests/fast/dom/Element/language-valid.html:26 > > + jsTestIsAsync = true; > > Since this test is not really async (no handlers, everything done sequentially when the page loads), I think you can remove this line and the call to 'finishJSTest()' below. Done. Going to apply Jeremy's updates now. (In reply to comment #6) > (From update of attachment 71873 [details]) > > WebCore/dom/Element.cpp:1510 > > + CString asciiOnly = value.string().ascii(); > > It seems as though we should be able to do this check without converting the whole string to ascii first. Maybe this isn't a performance concern and thus keeping the code simple is best, but it seems as though you should be able to inspect the data directly. There's another reason of doing this, and it's that ascii deals with the UTF16 encoding and converts all non-ascii characters to the question mark symbol. That way it avoids any nasty problems coming from the representation of non-ascii characters. Created attachment 71900 [details]
Patch
Comment on attachment 71900 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71900&action=review Does anyone else have any thoughts? > LayoutTests/fast/dom/Element/language-valid.html:15 > + <p id="description"></p> I think description is still best on top > WebCore/dom/Element.cpp:1509 > + // Only A-Za-z0-9 and '-' are allowed in the language tag (see BCP 47) period at end > WebCore/dom/Element.cpp:1510 > + size_t length = value.length(); not sure if there's any value to this being broken out. Maybe just replace length with it? Created attachment 71901 [details]
Patch
Comment on attachment 71901 [details]
Patch
LGTM. Ask me to r+ it tomorrow if no one else raises an issue in the mean time.
(In reply to comment #14) > (From update of attachment 71901 [details]) > LGTM. Ask me to r+ it tomorrow if no one else raises an issue in the mean time. Ok. Thank you. Comment on attachment 71901 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71901&action=review > LayoutTests/ChangeLog:9 > + The contents of the language attribute are expected to follow the BCP 47 > + specification. However no validation at all is performed on it. This patch What are the specs governing this? In HTML5, I could find that in two places: - authoring requirements, which are irrelevant to browsers; - non-normative list of attributes, saying that value of hreflang, lang, and srclang attributes is a "valid BCP 47 language tag". So, I don't see that requirement specified anywhere. And Firefox 3.6 fails the test case included in this patch. Does IE pass it? > LayoutTests/fast/dom/Element/language-valid.html:9 > + p:lang(en) { text-transform: uppercase; } > + p:lang(iην41iδ_ã»4g) { text-transform: uppercase; } We'd need tests for more forbidden characters - ideally, for every forbidden ASCII character, like en/GB. Please use red/green color in results for easier visual evaluation <http://www.w3.org/Style/CSS/Test/guidelines.html#color>. > WebCore/dom/Element.cpp:1513 > + if (!isASCIIAlphanumeric(c) && c != '-') > + return AtomicString(); Is an invalid value equivalent to lang="", or should it be ignored? (In reply to comment #16) > (From update of attachment 71901 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=71901&action=review > > > LayoutTests/ChangeLog:9 > > + The contents of the language attribute are expected to follow the BCP 47 > > + specification. However no validation at all is performed on it. This patch > > What are the specs governing this? In HTML5, I could find that in two places: > - authoring requirements, which are irrelevant to browsers; > - non-normative list of attributes, saying that value of hreflang, lang, and srclang attributes is a "valid BCP 47 language tag". > > So, I don't see that requirement specified anywhere. And Firefox 3.6 fails the test case included in this patch. Does IE pass it? > I was referring to the non-normative attributes. This is not required to be implemented, but thought that it should be good to have some kind of validation. If you think it's not necessary at all we can drop this issue. We certainly should be very suspicious of requirements not mandated by specs, especially if no other browser implements such. (In reply to comment #18) > We certainly should be very suspicious of requirements not mandated by specs, especially if no other browser implements such. Agreed. Just tested in IE and it doesn't seem to implement it either. Let's drop this. Thank you for your time and sorry. Please mark it as a WONTFIX as I can't do it by myself. Sounds like a plan. P.S. Sorry for not checking whether this was normative text or what other browsers will do. Will try to do that in such reviews in the future! You may want to report this as a bug in HTML5 draft, as the non-normative description is misleading. We could also land a regression test verifying that non-ASCII lang values actually work. I think that would be fairly valuable. (In reply to comment #23) > We could also land a regression test verifying that non-ASCII lang values actually work. I think that would be fairly valuable. Actually, I found while developing the layout test that if the language tag in the p:lang() contained an @ then it was not properly assigned. I'm not sure if this is a CSS syntax problem, though. I got no complains in the test results. It did work however with many invalid characters like Greek and Japanese letters. An example would be: p:lang(doesnt@work) { text-transform: uppercase; } ... <p id="invalid" lang="doesn@work">This line won't be in uppercase at all. Not because of the patch but because the CSS style doesn't seem to be properly associated by the lang attribute.</p> |