RESOLVED FIXED150094
[ES6] Implement String.prototype.normalize
https://bugs.webkit.org/show_bug.cgi?id=150094
Summary [ES6] Implement String.prototype.normalize
Yusuke Suzuki
Reported 2015-10-13 12:28:18 PDT
By leveraging ICU, implement String.prototype.normalize to complete ES6 String extras.
Attachments
Patch (5.24 KB, patch)
2015-10-13 21:46 PDT, Yusuke Suzuki
no flags
Patch (9.69 KB, patch)
2015-10-16 02:49 PDT, Yusuke Suzuki
no flags
Archive of layout-test-results from ews102 for mac-mavericks (641.19 KB, application/zip)
2015-10-16 03:22 PDT, Build Bot
no flags
Archive of layout-test-results from ews105 for mac-mavericks-wk2 (750.79 KB, application/zip)
2015-10-16 03:28 PDT, Build Bot
no flags
Patch (15.08 KB, patch)
2015-10-16 03:52 PDT, Yusuke Suzuki
no flags
Patch (16.07 KB, patch)
2015-10-16 04:05 PDT, Yusuke Suzuki
ggaren: review+
Yusuke Suzuki
Comment 2 2015-10-13 21:46:31 PDT
Created attachment 263056 [details] Patch Just implemented. We need to add tests
Yusuke Suzuki
Comment 3 2015-10-16 02:49:26 PDT
Yusuke Suzuki
Comment 4 2015-10-16 02:50:58 PDT
Comment on attachment 263254 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=263254&action=review Added comment. > Source/JavaScriptCore/runtime/StringPrototype.cpp:1804 > + return throwTypeError(exec); Because this depends on ICU, I took the conservative implementation; ICU API can raise errors any time. But practically, it does not raise any error here I think.
Build Bot
Comment 5 2015-10-16 03:22:51 PDT
Comment on attachment 263254 [details] Patch Attachment 263254 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/295074 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 6 2015-10-16 03:22:54 PDT
Created attachment 263255 [details] Archive of layout-test-results from ews102 for mac-mavericks The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews102 Port: mac-mavericks Platform: Mac OS X 10.9.5
Build Bot
Comment 7 2015-10-16 03:28:18 PDT
Comment on attachment 263254 [details] Patch Attachment 263254 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/295078 New failing tests: js/Object-getOwnPropertyNames.html
Build Bot
Comment 8 2015-10-16 03:28:20 PDT
Created attachment 263257 [details] Archive of layout-test-results from ews105 for mac-mavericks-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-mavericks-wk2 Platform: Mac OS X 10.9.5
Yusuke Suzuki
Comment 9 2015-10-16 03:52:15 PDT
Yusuke Suzuki
Comment 10 2015-10-16 04:05:44 PDT
Geoffrey Garen
Comment 11 2015-10-16 10:02:24 PDT
Comment on attachment 263260 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=263260&action=review r=me > Source/JavaScriptCore/runtime/StringPrototype.cpp:1802 > + // The behavior is not specified when normalize is failed. is failed => fails > Source/JavaScriptCore/runtime/StringPrototype.cpp:1803 > + // Now we throw a type erorr since it seems that the contents of the string is invalid. is invalid => are invalid > Source/JavaScriptCore/runtime/StringPrototype.cpp:1815 > + status = U_ZERO_ERROR; > + unorm_normalize(source, sourceLength, form, 0, buffer, normalizedStringLength, &status); > + if (U_FAILURE(status)) > + return throwTypeError(exec); Does normalization ever grow a string? If not, I'd suggest allocating a buffer the size of the existing string, and calling norm_normalize only once. > Source/JavaScriptCore/runtime/StringPrototype.cpp:1830 > + // The argument is not provided or is undefined. // Verify that the argument is provided and is not undefined. > Source/JavaScriptCore/runtime/StringPrototype.cpp:1843 > + if (formString == String(ASCIILiteral("NFC"))) > + form = UNORM_NFC; > + else if (formString == String(ASCIILiteral("NFD"))) > + form = UNORM_NFD; > + else if (formString == String(ASCIILiteral("NFKC"))) > + form = UNORM_NFKC; > + else if (formString == String(ASCIILiteral("NFKD"))) > + form = UNORM_NFKD; This is a lot of StringImpl allocation. Can we use char* to avoid allocation?
Yusuke Suzuki
Comment 12 2015-10-16 11:08:52 PDT
Comment on attachment 263260 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=263260&action=review Thanks! >> Source/JavaScriptCore/runtime/StringPrototype.cpp:1802 >> + // The behavior is not specified when normalize is failed. > > is failed => fails Fixed. >> Source/JavaScriptCore/runtime/StringPrototype.cpp:1803 >> + // Now we throw a type erorr since it seems that the contents of the string is invalid. > > is invalid => are invalid Fixed. >> Source/JavaScriptCore/runtime/StringPrototype.cpp:1815 >> + return throwTypeError(exec); > > Does normalization ever grow a string? If not, I'd suggest allocating a buffer the size of the existing string, and calling norm_normalize only once. Ah, normalization can grow a string. The actual example is included in the test case. For example, in NFKC, "\uFDFA" will be normalized to "\u0635\u0644\u0649\u0020\u0627\u0644\u0644\u0647\u0020\u0639\u0644\u064a\u0647\u0020\u0648\u0633\u0644\u0645" !! >> Source/JavaScriptCore/runtime/StringPrototype.cpp:1830 >> + // The argument is not provided or is undefined. > > // Verify that the argument is provided and is not undefined. Thanks. Fixed. >> Source/JavaScriptCore/runtime/StringPrototype.cpp:1843 >> + form = UNORM_NFKD; > > This is a lot of StringImpl allocation. Can we use char* to avoid allocation? Ah, oops. After seeing ASCIILiteral, WTFString and StringImpl implementation, it seems that `formString == "NFC"` is more efficient :) Fixed.
Yusuke Suzuki
Comment 13 2015-10-16 21:37:57 PDT
Note You need to log in before you can comment on or make changes to this bug.