Bug 153114

Summary: createElement should not lowercase non-ASCII characters
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: DOMAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: cdumez, commit-queue, darin, esprehn+autocc, gyuyoung.kim, kangil.han, kling, koivisto, sam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 150225    
Attachments:
Description Flags
Fixes the bug
none
Add a reference to the spec in the change log none

Description Ryosuke Niwa 2016-01-14 19:23:16 PST
Even in a HTML document, createElement shouldn't lowercase non-ASCII letters.

https://dom.spec.whatwg.org/#dom-document-createelement
1. If localName does not match the Name production, throw an InvalidCharacterError exception.
2. If the context object is an HTML document, let localName be converted to ASCII lowercase.
3. Let interface be the element interface for localName and the HTML namespace.
4. Return a new element that implements interface, with no attributes, namespace set to the HTML namespace, local name set to localName, and node document set to the context object.
Comment 1 Ryosuke Niwa 2016-01-14 19:29:21 PST
Created attachment 269025 [details]
Fixes the bug
Comment 2 Ryosuke Niwa 2016-01-14 19:36:25 PST
Created attachment 269026 [details]
Add a reference to the spec in the change log
Comment 3 Alex Christensen 2016-01-14 21:18:59 PST
Comment on attachment 269026 [details]
Add a reference to the spec in the change log

View in context: https://bugs.webkit.org/attachment.cgi?id=269026&action=review

> Source/WebCore/dom/Document.h:384
> +    WEBCORE_EXPORT RefPtr<Element> createElement(const AtomicString& tagName, ExceptionCode&);

There's no new use of this outside of WebCore.  I don't think WEBCORE_EXPORT is needed here.
Comment 4 Ryosuke Niwa 2016-01-14 21:22:25 PST
(In reply to comment #3)
> Comment on attachment 269026 [details]
> Add a reference to the spec in the change log
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=269026&action=review
> 
> > Source/WebCore/dom/Document.h:384
> > +    WEBCORE_EXPORT RefPtr<Element> createElement(const AtomicString& tagName, ExceptionCode&);
> 
> There's no new use of this outside of WebCore.  I don't think WEBCORE_EXPORT
> is needed here.

It's used by WKDOMDocument.mm in WebKit2.
Comment 5 WebKit Commit Bot 2016-01-14 22:13:10 PST
Comment on attachment 269026 [details]
Add a reference to the spec in the change log

Clearing flags on attachment: 269026

Committed r195091: <http://trac.webkit.org/changeset/195091>
Comment 6 WebKit Commit Bot 2016-01-14 22:13:14 PST
All reviewed patches have been landed.  Closing bug.
Comment 7 Darin Adler 2016-01-15 09:20:12 PST
Comment on attachment 269026 [details]
Add a reference to the spec in the change log

View in context: https://bugs.webkit.org/attachment.cgi?id=269026&action=review

> Source/WebCore/html/HTMLDocument.cpp:-235
> -    return HTMLElementFactory::createElement(QualifiedName(nullAtom, name.lower(), xhtmlNamespaceURI), *this);

Aside from the fix for this specific bug, it’s worth having someone audit WebKit code to eliminate most uses of this function, and possibly renaming it to a explicit longer name so it’s not attractive for people writing new code.

There are almost no cases where String::lower, String::upper, String::foldCase, or the StringImpl versions are the best functions to call. The few cases where these are needed are cases where we are processing actual document or user-typed text content, such as in the implementation of text-transform and of searching, and in those cases we almost always need to do something ore complicated than simply processing a single string without taking locale into account. In the vast majority of cases where we do use the functions, we are simply trying to process things in a way that is not sensitive to *ASCII* case because that concept arises in many places in standards like HTML, HTTP, DNS, etc. When we use these Unicode case processing functions we either get incorrect behavior (as we did here), or correct behavior but with more complex code doing more work that is necessary for no benefit.

There’s a similar issue with the stripWhiteSpace and simplifyWhiteSpace functions.