Bug 153114 - createElement should not lowercase non-ASCII characters
Summary: createElement should not lowercase non-ASCII characters
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords:
Depends on:
Blocks: 150225
  Show dependency treegraph
 
Reported: 2016-01-14 19:23 PST by Ryosuke Niwa
Modified: 2016-01-15 09:20 PST (History)
9 users (show)

See Also:


Attachments
Fixes the bug (8.38 KB, patch)
2016-01-14 19:29 PST, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
Add a reference to the spec in the change log (8.58 KB, patch)
2016-01-14 19:36 PST, Ryosuke Niwa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2016-01-14 19:23:16 PST
Even in a HTML document, createElement shouldn't lowercase non-ASCII letters.

https://dom.spec.whatwg.org/#dom-document-createelement
1. If localName does not match the Name production, throw an InvalidCharacterError exception.
2. If the context object is an HTML document, let localName be converted to ASCII lowercase.
3. Let interface be the element interface for localName and the HTML namespace.
4. Return a new element that implements interface, with no attributes, namespace set to the HTML namespace, local name set to localName, and node document set to the context object.
Comment 1 Ryosuke Niwa 2016-01-14 19:29:21 PST
Created attachment 269025 [details]
Fixes the bug
Comment 2 Ryosuke Niwa 2016-01-14 19:36:25 PST
Created attachment 269026 [details]
Add a reference to the spec in the change log
Comment 3 Alex Christensen 2016-01-14 21:18:59 PST
Comment on attachment 269026 [details]
Add a reference to the spec in the change log

View in context: https://bugs.webkit.org/attachment.cgi?id=269026&action=review

> Source/WebCore/dom/Document.h:384
> +    WEBCORE_EXPORT RefPtr<Element> createElement(const AtomicString& tagName, ExceptionCode&);

There's no new use of this outside of WebCore.  I don't think WEBCORE_EXPORT is needed here.
Comment 4 Ryosuke Niwa 2016-01-14 21:22:25 PST
(In reply to comment #3)
> Comment on attachment 269026 [details]
> Add a reference to the spec in the change log
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269026&action=review
> 
> > Source/WebCore/dom/Document.h:384
> > +    WEBCORE_EXPORT RefPtr<Element> createElement(const AtomicString& tagName, ExceptionCode&);
> 
> There's no new use of this outside of WebCore.  I don't think WEBCORE_EXPORT
> is needed here.

It's used by WKDOMDocument.mm in WebKit2.
Comment 5 WebKit Commit Bot 2016-01-14 22:13:10 PST
Comment on attachment 269026 [details]
Add a reference to the spec in the change log

Clearing flags on attachment: 269026

Committed r195091: <http://trac.webkit.org/changeset/195091>
Comment 6 WebKit Commit Bot 2016-01-14 22:13:14 PST
All reviewed patches have been landed.  Closing bug.
Comment 7 Darin Adler 2016-01-15 09:20:12 PST
Comment on attachment 269026 [details]
Add a reference to the spec in the change log

View in context: https://bugs.webkit.org/attachment.cgi?id=269026&action=review

> Source/WebCore/html/HTMLDocument.cpp:-235
> -    return HTMLElementFactory::createElement(QualifiedName(nullAtom, name.lower(), xhtmlNamespaceURI), *this);

Aside from the fix for this specific bug, it’s worth having someone audit WebKit code to eliminate most uses of this function, and possibly renaming it to a explicit longer name so it’s not attractive for people writing new code.

There are almost no cases where String::lower, String::upper, String::foldCase, or the StringImpl versions are the best functions to call. The few cases where these are needed are cases where we are processing actual document or user-typed text content, such as in the implementation of text-transform and of searching, and in those cases we almost always need to do something ore complicated than simply processing a single string without taking locale into account. In the vast majority of cases where we do use the functions, we are simply trying to process things in a way that is not sensitive to *ASCII* case because that concept arises in many places in standards like HTML, HTTP, DNS, etc. When we use these Unicode case processing functions we either get incorrect behavior (as we did here), or correct behavior but with more complex code doing more work that is necessary for no benefit.

There’s a similar issue with the stripWhiteSpace and simplifyWhiteSpace functions.