WebKit Bugzilla
Attachment 343057: Details for Bug 174816: [GTK][WPE] Need a function to convert internal URI to display ("pretty") URI
Patch: bug-174816-20180619173713.patch (text/plain), 68.21 KB, created by Ms2ger (he/him; ⌚ UTC+1/+2) on 2018-06-19 08:37:14 PDT
>Subversion Revision: 232959
>diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
>index 2b36c20af17dc041602c3e5c65268615c6cb2b16..73e654457e90fcddc3e6f4b7fc6595c5297065d4 100644
>--- a/Source/WebCore/ChangeLog
>+++ b/Source/WebCore/ChangeLog
>@@ -1,3 +1,30 @@
>+2018-06-12  Ms2ger  <Ms2ger@igalia.com>
>+
>+        [GTK][WPE] Add a function to convert internal URL to display ("pretty") URL
>+        https://bugs.webkit.org/show_bug.cgi?id=174816
>+
>+        Reviewed by NOBODY (OOPS!).
>+
>+        This code is based almost entirely on code by Gabriel Ivascu.
>+
>+        No new tests (OOPS!).
>+
>+        * platform/URLParser.cpp:
>+        (WebCore::isArmenianLookalikeCharacter): code moved without semantic changes.
>+        (WebCore::isArmenianScriptCharacter): code moved without semantic changes.
>+        (WebCore::isASCIIDigitOrValidHostCharacter): code moved without semantic changes.
>+        (WebCore::URLParser::isLookalikeCharacter): code moved without semantic changes.
>+        (WebCore::allCharactersInIDNScriptWhiteList): code moved without semantic changes.
>+        (WebCore::isSecondLevelDomainNameAllowedByTLDRules): code moved without semantic changes.
>+        (WebCore::isRussianDomainNameCharacter): code moved without semantic changes.
>+        (WebCore::allCharactersAllowedByTLDRules): code moved without semantic changes.
>+        (WebCore::URLParser::ICUConvertHostName): add.
>+        * platform/URLParser.h: add signatures.
>+        * platform/mac/WebCoreNSURLExtras.mm:
>+        (WebCore::loadIDNScriptWhiteList): factor out to allow moving allCharactersInIDNScriptWhiteList() out.
>+        (WebCore::mapHostNameWithRange): use new API.
>+        (WebCore::createStringWithEscapedUnsafeCharacters): adjust for moved code.
>+
> 2018-06-19  David Kilzer  <ddkilzer@apple.com>
>
>         Add logging when splashboardd enables WebThread
>diff --git a/Source/WebKit/ChangeLog b/Source/WebKit/ChangeLog
>index e1693c776926386c5a65f3add17985c86ca4de86..a35fc0ca6fc1dfbf5244ea3ce9fa8406efd50562 100644
>--- a/Source/WebKit/ChangeLog
>+++ b/Source/WebKit/ChangeLog
>@@ -1,3 +1,25 @@
>+2018-06-12  Ms2ger  <Ms2ger@igalia.com>
>+
>+        [GTK][WPE] Add a function to convert internal URL to display ("pretty") URL
>+        https://bugs.webkit.org/show_bug.cgi?id=174816
>+
>+        Reviewed by NOBODY (OOPS!).
>+
>+        This code is based almost entirely on code by Gabriel Ivascu.
>+
>+        * PlatformGTK.cmake:
>+        * PlatformWPE.cmake:
>+        * SourcesGTK.txt:
>+        * SourcesWPE.txt:
>+        * UIProcess/API/glib/WebKitURIUtilities.cpp: Added.
>+        (webkit_uri_for_display):
>+        * UIProcess/API/gtk/WebKitURIUtilities.h: Added.
>+        * UIProcess/API/gtk/docs/webkit2gtk-4.0-sections.txt:
>+        * UIProcess/API/gtk/docs/webkit2gtk-docs.sgml:
>+        * UIProcess/API/gtk/webkit2.h:
>+        * UIProcess/API/wpe/WebKitURIUtilities.h: Added.
>+        * UIProcess/API/wpe/webkit.h:
>+
> 2018-06-18  John Wilander  <wilander@apple.com>
>
>         Resource Load Statistics: Make sure to call callbacks even if there is no store (test infrastructure)
>diff --git a/Source/WebCore/Sources.txt b/Source/WebCore/Sources.txt
>index 8b02601cb890da9fc4e1d0b9b7d8c34a590e29f3..724d7b97d97de84d09c895439e3a65379ea5d448 100644
>--- a/Source/WebCore/Sources.txt
>+++ b/Source/WebCore/Sources.txt
>@@ -1522,6 +1522,7 @@ platform/ThreadGlobalData.cpp
> platform/ThreadTimers.cpp
> platform/Timer.cpp
> platform/URL.cpp
>+platform/URLHelpers.cpp
> platform/URLParser.cpp
> platform/UserActivity.cpp
> platform/WebCoreCrossThreadCopier.cpp
>diff --git a/Source/WebCore/WebCore.xcodeproj/project.pbxproj b/Source/WebCore/WebCore.xcodeproj/project.pbxproj
>index e578f644be65a1a7091485847afb68dd9d9a2535..6e67c141328514dbde37ce5cf8b9d9fedef67d83 100644
>--- a/Source/WebCore/WebCore.xcodeproj/project.pbxproj
>+++ b/Source/WebCore/WebCore.xcodeproj/project.pbxproj
>@@ -8731,6 +8731,8 @@
> 		5EBB89381C77BDA400C65D41 /* PeerMediaDescription.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = PeerMediaDescription.h; sourceTree = "<group>"; };
> 		5F2DBBE7178E332D00141486 /* CertificateInfoMac.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = CertificateInfoMac.mm; sourceTree = "<group>"; };
> 		5F2DBBE8178E336900141486 /* CertificateInfo.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = CertificateInfo.h; sourceTree = "<group>"; };
>+		5FBA60B520D814D2004F8C7D /* URLHelpers.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = URLHelpers.h; sourceTree = "<group>"; };
>+		5FBA60B620D814D3004F8C7D /* URLHelpers.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = URLHelpers.cpp; sourceTree = "<group>"; };
> 		5FC7DC26CFE2563200B85AE5 /* JSEventTarget.h */ = {isa = PBXFileReference; fileEncoding = 30; lastKnownFileType = sourcecode.c.h; path = JSEventTarget.h; sourceTree = "<group>"; };
> 		5FE1D291178FD1F3001AA3C3 /* Security.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = Security.framework; path = System/Library/Frameworks/Security.framework; sourceTree = SDKROOT; };
> 		626CDE0C1140424C001E5A68 /* SpatialNavigation.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SpatialNavigation.cpp; sourceTree = "<group>"; };
>@@ -24328,6 +24330,8 @@
> 		BCF1A5BA097832090061A123 /* platform */ = {
> 			isa = PBXGroup;
> 			children = (
>+				5FBA60B620D814D3004F8C7D /* URLHelpers.cpp */,
>+				5FBA60B520D814D2004F8C7D /* URLHelpers.h */,
> 				49E912A40EFAC8E6009D0CAF /* animation */,
> 				FD31604012B026A300C1A359 /* audio */,
> 				1AE42F670AA4B8CB00C8612D /* cf */,
>diff --git a/Source/WebCore/platform/URLHelpers.cpp b/Source/WebCore/platform/URLHelpers.cpp
>new file mode 100644
>index 0000000000000000000000000000000000000000..63c145e26c4ee9c2dbe5500d6cdcdaae8dd8495f
>--- /dev/null
>+++ b/Source/WebCore/platform/URLHelpers.cpp
>@@ -0,0 +1,488 @@
>+/*
>+ * Copyright (C) 2018 Metrological Group B.V.
>+ * Copyright (C) 2018 Igalia S.L.
>+ *
>+ * Redistribution and use in source and binary forms, with or without
>+ * modification, are permitted provided that the following conditions
>+ * are met:
>+ * 1. Redistributions of source code must retain the above copyright
>+ *    notice, this list of conditions and the following disclaimer.
>+ * 2. Redistributions in binary form must reproduce the above copyright
>+ *    notice, this list of conditions and the following disclaimer in the
>+ *    documentation and/or other materials provided with the distribution.
>+ *
>+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
>+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
>+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
>+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
>+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
>+ * THE POSSIBILITY OF SUCH DAMAGE.
>+ */
>+
>+#include "config.h"
>+#include "URLHelpers.h"
>+
>+#include "URLParser.h"
>+#include <unicode/uidna.h>
>+#include <wtf/Optional.h>
>+
>+namespace WebCore {
>+
>+static bool isArmenianLookalikeCharacter(UChar32 codePoint)
>+{
>+    return codePoint == 0x0548 || codePoint == 0x054D || codePoint == 0x0578 || codePoint == 0x057D;
>+}
>+
>+static bool isArmenianScriptCharacter(UChar32 codePoint)
>+{
>+    UErrorCode error = U_ZERO_ERROR;
>+    UScriptCode script = uscript_getScript(codePoint, &error);
>+    if (error != U_ZERO_ERROR) {
>+        LOG_ERROR("got ICU error while trying to look at scripts: %d", error);
>+        return false;
>+    }
>+
>+    return script == USCRIPT_ARMENIAN;
>+}
>+
>+template<typename CharacterType> inline bool isASCIIDigitOrValidHostCharacter(CharacterType charCode)
>+{
>+    if (!isASCIIDigitOrPunctuation(charCode))
>+        return false;
>+
>+    // Things the URL Parser rejects:
>+    switch (charCode) {
>+    case '#':
>+    case '%':
>+    case '/':
>+    case ':':
>+    case '?':
>+    case '@':
>+    case '[':
>+    case '\\':
>+    case ']':
>+        return false;
>+    default:
>+        return true;
>+    }
>+}
>+
>+bool URLHelpers::isLookalikeCharacter(std::optional<UChar32> previousCodePoint, UChar32 charCode)
>+{
>+    // This function treats the following as unsafe, lookalike characters:
>+    // any non-printable character, any character considered as whitespace,
>+    // any ignorable character, and emoji characters related to locks.
>+
>+    // We also considered the characters in Mozilla's blacklist <http://kb.mozillazine.org/Network.IDN.blacklist_chars>.
>+
>+    // Some of the characters here will never appear once ICU has encoded.
>+    // For example, ICU transforms most spaces into an ASCII space and most
>+    // slashes into an ASCII solidus. But one of the two callers uses this
>+    // on characters that have not been processed by ICU, so they are needed here.
>+
>+    if (!u_isprint(charCode) || u_isUWhiteSpace(charCode) || u_hasBinaryProperty(charCode, UCHAR_DEFAULT_IGNORABLE_CODE_POINT))
>+        return true;
>+
>+    switch (charCode) {
>+    case 0x00BC: /* VULGAR FRACTION ONE QUARTER */
>+    case 0x00BD: /* VULGAR FRACTION ONE HALF */
>+    case 0x00BE: /* VULGAR FRACTION THREE QUARTERS */
>+    case 0x00ED: /* LATIN SMALL LETTER I WITH ACUTE */
>+    case 0x01C3: /* LATIN LETTER RETROFLEX CLICK */
>+    case 0x0251: /* LATIN SMALL LETTER ALPHA */
>+    case 0x0261: /* LATIN SMALL LETTER SCRIPT G */
>+    case 0x02D0: /* MODIFIER LETTER TRIANGULAR COLON */
>+    case 0x0335: /* COMBINING SHORT STROKE OVERLAY */
>+    case 0x0337: /* COMBINING SHORT SOLIDUS OVERLAY */
>+    case 0x0338: /* COMBINING LONG SOLIDUS OVERLAY */
>+    case 0x0589: /* ARMENIAN FULL STOP */
>+    case 0x05B4: /* HEBREW POINT HIRIQ */
>+    case 0x05BC: /* HEBREW POINT DAGESH OR MAPIQ */
>+    case 0x05C3: /* HEBREW PUNCTUATION SOF PASUQ */
>+    case 0x05F4: /* HEBREW PUNCTUATION GERSHAYIM */
>+    case 0x0609: /* ARABIC-INDIC PER MILLE SIGN */
>+    case 0x060A: /* ARABIC-INDIC PER TEN THOUSAND SIGN */
>+    case 0x0650: /* ARABIC KASRA */
>+    case 0x0660: /* ARABIC INDIC DIGIT ZERO */
>+    case 0x066A: /* ARABIC PERCENT SIGN */
>+    case 0x06D4: /* ARABIC FULL STOP */
>+    case 0x06F0: /* EXTENDED ARABIC INDIC DIGIT ZERO */
>+    case 0x0701: /* SYRIAC SUPRALINEAR FULL STOP */
>+    case 0x0702: /* SYRIAC SUBLINEAR FULL STOP */
>+    case 0x0703: /* SYRIAC SUPRALINEAR COLON */
>+    case 0x0704: /* SYRIAC SUBLINEAR COLON */
>+    case 0x1735: /* PHILIPPINE SINGLE PUNCTUATION */
>+    case 0x1D04: /* LATIN LETTER SMALL CAPITAL C */
>+    case 0x1D0F: /* LATIN LETTER SMALL CAPITAL O */
>+    case 0x1D1C: /* LATIN LETTER SMALL CAPITAL U */
>+    case 0x1D20: /* LATIN LETTER SMALL CAPITAL V */
>+    case 0x1D21: /* LATIN LETTER SMALL CAPITAL W */
>+    case 0x1D22: /* LATIN LETTER SMALL CAPITAL Z */
>+    case 0x1ECD: /* LATIN SMALL LETTER O WITH DOT BELOW */
>+    case 0x2010: /* HYPHEN */
>+    case 0x2011: /* NON-BREAKING HYPHEN */
>+    case 0x2024: /* ONE DOT LEADER */
>+    case 0x2027: /* HYPHENATION POINT */
>+    case 0x2039: /* SINGLE LEFT-POINTING ANGLE QUOTATION MARK */
>+    case 0x203A: /* SINGLE RIGHT-POINTING ANGLE QUOTATION MARK */
>+    case 0x2041: /* CARET INSERTION POINT */
>+    case 0x2044: /* FRACTION SLASH */
>+    case 0x2052: /* COMMERCIAL MINUS SIGN */
>+    case 0x2153: /* VULGAR FRACTION ONE THIRD */
>+    case 0x2154: /* VULGAR FRACTION TWO THIRDS */
>+    case 0x2155: /* VULGAR FRACTION ONE FIFTH */
>+    case 0x2156: /* VULGAR FRACTION TWO FIFTHS */
>+    case 0x2157: /* VULGAR FRACTION THREE FIFTHS */
>+    case 0x2158: /* VULGAR FRACTION FOUR FIFTHS */
>+    case 0x2159: /* VULGAR FRACTION ONE SIXTH */
>+    case 0x215A: /* VULGAR FRACTION FIVE SIXTHS */
>+    case 0x215B: /* VULGAR FRACTION ONE EIGHT */
>+    case 0x215C: /* VULGAR FRACTION THREE EIGHTHS */
>+    case 0x215D: /* VULGAR FRACTION FIVE EIGHTHS */
>+    case 0x215E: /* VULGAR FRACTION SEVEN EIGHTHS */
>+    case 0x215F: /* FRACTION NUMERATOR ONE */
>+    case 0x2212: /* MINUS SIGN */
>+    case 0x2215: /* DIVISION SLASH */
>+    case 0x2216: /* SET MINUS */
>+    case 0x2236: /* RATIO */
>+    case 0x233F: /* APL FUNCTIONAL SYMBOL SLASH BAR */
>+    case 0x23AE: /* INTEGRAL EXTENSION */
>+    case 0x244A: /* OCR DOUBLE BACKSLASH */
>+    case 0x2571: /* DisplayType::Box DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT */
>+    case 0x2572: /* DisplayType::Box DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT */
>+    case 0x29F6: /* SOLIDUS WITH OVERBAR */
>+    case 0x29F8: /* BIG SOLIDUS */
>+    case 0x2AFB: /* TRIPLE SOLIDUS BINARY RELATION */
>+    case 0x2AFD: /* DOUBLE SOLIDUS OPERATOR */
>+    case 0x2FF0: /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT */
>+    case 0x2FF1: /* IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW */
>+    case 0x2FF2: /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT */
>+    case 0x2FF3: /* IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW */
>+    case 0x2FF4: /* IDEOGRAPHIC DESCRIPTION CHARACTER FULL SURROUND */
>+    case 0x2FF5: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE */
>+    case 0x2FF6: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW */
>+    case 0x2FF7: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT */
>+    case 0x2FF8: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT */
>+    case 0x2FF9: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT */
>+    case 0x2FFA: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT */
>+    case 0x2FFB: /* IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID */
>+    case 0x3002: /* IDEOGRAPHIC FULL STOP */
>+    case 0x3008: /* LEFT ANGLE BRACKET */
>+    case 0x3014: /* LEFT TORTOISE SHELL BRACKET */
>+    case 0x3015: /* RIGHT TORTOISE SHELL BRACKET */
>+    case 0x3033: /* VERTICAL KANA REPEAT MARK UPPER HALF */
>+    case 0x3035: /* VERTICAL KANA REPEAT MARK LOWER HALF */
>+    case 0x321D: /* PARENTHESIZED KOREAN CHARACTER OJEON */
>+    case 0x321E: /* PARENTHESIZED KOREAN CHARACTER O HU */
>+    case 0x33AE: /* SQUARE RAD OVER S */
>+    case 0x33AF: /* SQUARE RAD OVER S SQUARED */
>+    case 0x33C6: /* SQUARE C OVER KG */
>+    case 0x33DF: /* SQUARE A OVER M */
>+    case 0x05B9: /* HEBREW POINT HOLAM */
>+    case 0x05BA: /* HEBREW POINT HOLAM HASER FOR VAV */
>+    case 0x05C1: /* HEBREW POINT SHIN DOT */
>+    case 0x05C2: /* HEBREW POINT SIN DOT */
>+    case 0x05C4: /* HEBREW MARK UPPER DOT */
>+    case 0xA731: /* LATIN LETTER SMALL CAPITAL S */
>+    case 0xA771: /* LATIN SMALL LETTER DUM */
>+    case 0xA789: /* MODIFIER LETTER COLON */
>+    case 0xFE14: /* PRESENTATION FORM FOR VERTICAL SEMICOLON */
>+    case 0xFE15: /* PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK */
>+    case 0xFE3F: /* PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET */
>+    case 0xFE5D: /* SMALL LEFT TORTOISE SHELL BRACKET */
>+    case 0xFE5E: /* SMALL RIGHT TORTOISE SHELL BRACKET */
>+    case 0xFF0E: /* FULLWIDTH FULL STOP */
>+    case 0xFF0F: /* FULL WIDTH SOLIDUS */
>+    case 0xFF61: /* HALFWIDTH IDEOGRAPHIC FULL STOP */
>+    case 0xFFFC: /* OBJECT REPLACEMENT CHARACTER */
>+    case 0xFFFD: /* REPLACEMENT CHARACTER */
>+    case 0x1F50F: /* LOCK WITH INK PEN */
>+    case 0x1F510: /* CLOSED LOCK WITH KEY */
>+    case 0x1F511: /* KEY */
>+    case 0x1F512: /* LOCK */
>+    case 0x1F513: /* OPEN LOCK */
>+        return true;
>+    case 0x0307: /* COMBINING DOT ABOVE */
>+        return previousCodePoint == 0x0237 /* LATIN SMALL LETTER DOTLESS J */
>+            || previousCodePoint == 0x0131 /* LATIN SMALL LETTER DOTLESS I */
>+            || previousCodePoint == 0x05D5; /* HEBREW LETTER VAV */
>+    case 0x0548: /* ARMENIAN CAPITAL LETTER VO */
>+    case 0x054D: /* ARMENIAN CAPITAL LETTER SEH */
>+    case 0x0578: /* ARMENIAN SMALL LETTER VO */
>+    case 0x057D: /* ARMENIAN SMALL LETTER SEH */
>+        return previousCodePoint
>+            && !isASCIIDigitOrValidHostCharacter(previousCodePoint.value())
>+            && !isArmenianScriptCharacter(previousCodePoint.value());
>+    case '.':
>+        return false;
>+    default:
>+        return previousCodePoint
>+            && isArmenianLookalikeCharacter(previousCodePoint.value())
>+            && !(isArmenianScriptCharacter(charCode) || isASCIIDigitOrValidHostCharacter(charCode));
>+    }
>+}
>+
>+static bool allCharactersInIDNScriptWhiteList(const UChar *buffer, int32_t length, const uint32_t (&IDNScriptWhiteList)[(USCRIPT_CODE_LIMIT + 31) / 32])
>+{
>+    int32_t i = 0;
>+    std::optional<UChar32> previousCodePoint;
>+    while (i < length) {
>+        UChar32 c;
>+        U16_NEXT(buffer, i, length, c)
>+        UErrorCode error = U_ZERO_ERROR;
>+        UScriptCode script = uscript_getScript(c, &error);
>+        if (error != U_ZERO_ERROR) {
>+            LOG_ERROR("got ICU error while trying to look at scripts: %d", error);
>+            return false;
>+        }
>+        if (script < 0) {
>+            LOG_ERROR("got negative number for script code from ICU: %d", script);
>+            return false;
>+        }
>+        if (script >= USCRIPT_CODE_LIMIT)
>+            return false;
>+
>+        size_t index = script / 32;
>+        uint32_t mask = 1 << (script % 32);
>+        if (!(IDNScriptWhiteList[index] & mask))
>+            return false;
>+
>+        if (URLHelpers::isLookalikeCharacter(previousCodePoint, c))
>+            return false;
>+        previousCodePoint = c;
>+    }
>+    return true;
>+}
>+
>+static bool isSecondLevelDomainNameAllowedByTLDRules(const UChar* buffer, int32_t length, const WTF::Function<bool(UChar)>& characterIsAllowed)
>+{
>+    ASSERT(length > 0);
>+
>+    for (int32_t i = length - 1; i >= 0; --i) {
>+        UChar ch = buffer[i];
>+
>+        if (characterIsAllowed(ch))
>+            continue;
>+
>+        // Only check the second level domain. Lower level registrars may have different rules.
>+        if (ch == '.')
>+            break;
>+
>+        return false;
>+    }
>+    return true;
>+}
>+
>+#define CHECK_RULES_IF_SUFFIX_MATCHES(suffix, function) \
>+    { \
>+        static const int32_t suffixLength = sizeof(suffix) / sizeof(suffix[0]); \
>+        if (length > suffixLength && !memcmp(buffer + length - suffixLength, suffix, sizeof(suffix))) \
>+            return isSecondLevelDomainNameAllowedByTLDRules(buffer, length - suffixLength, function); \
>+    }
>+
>+static bool isRussianDomainNameCharacter(UChar ch)
>+{
>+    // Only modern Russian letters, digits and dashes are allowed.
>+    return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || isASCIIDigit(ch) || ch == '-';
>+}
>+
>+static bool allCharactersAllowedByTLDRules(const UChar* buffer, int32_t length)
>+{
>+    // Skip trailing dot for root domain.
>+    if (buffer[length - 1] == '.')
>+        length--;
>+
>+    // http://cctld.ru/files/pdf/docs/rules_ru-rf.pdf
>+    static const UChar cyrillicRF[] = {
>+        '.',
>+        0x0440, // CYRILLIC SMALL LETTER ER
>+        0x0444 // CYRILLIC SMALL LETTER EF
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicRF, isRussianDomainNameCharacter);
>+
>+    // http://rusnames.ru/rules.pl
>+    static const UChar cyrillicRUS[] = {
>+        '.',
>+        0x0440, // CYRILLIC SMALL LETTER ER
>+        0x0443, // CYRILLIC SMALL LETTER U
>+        0x0441 // CYRILLIC SMALL LETTER ES
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicRUS, isRussianDomainNameCharacter);
>+
>+    // http://ru.faitid.org/projects/moscow/documents/moskva/idn
>+    static const UChar cyrillicMOSKVA[] = {
>+        '.',
>+        0x043C, // CYRILLIC SMALL LETTER EM
>+        0x043E, // CYRILLIC SMALL LETTER O
>+        0x0441, // CYRILLIC SMALL LETTER ES
>+        0x043A, // CYRILLIC SMALL LETTER KA
>+        0x0432, // CYRILLIC SMALL LETTER VE
>+        0x0430 // CYRILLIC SMALL LETTER A
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMOSKVA, isRussianDomainNameCharacter);
>+
>+    // http://www.dotdeti.ru/foruser/docs/regrules.php
>+    static const UChar cyrillicDETI[] = {
>+        '.',
>+        0x0434, // CYRILLIC SMALL LETTER DE
>+        0x0435, // CYRILLIC SMALL LETTER IE
>+        0x0442, // CYRILLIC SMALL LETTER TE
>+        0x0438 // CYRILLIC SMALL LETTER I
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicDETI, isRussianDomainNameCharacter);
>+
>+    // http://corenic.org - rules not published. The word is Russian, so only allowing Russian at this time,
>+    // although we may need to revise the checks if this ends up being used with other languages spoken in Russia.
>+    static const UChar cyrillicONLAYN[] = {
>+        '.',
>+        0x043E, // CYRILLIC SMALL LETTER O
>+        0x043D, // CYRILLIC SMALL LETTER EN
>+        0x043B, // CYRILLIC SMALL LETTER EL
>+        0x0430, // CYRILLIC SMALL LETTER A
>+        0x0439, // CYRILLIC SMALL LETTER SHORT I
>+        0x043D // CYRILLIC SMALL LETTER EN
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicONLAYN, isRussianDomainNameCharacter);
>+
>+    // http://corenic.org - same as above.
>+    static const UChar cyrillicSAYT[] = {
>+        '.',
>+        0x0441, // CYRILLIC SMALL LETTER ES
>+        0x0430, // CYRILLIC SMALL LETTER A
>+        0x0439, // CYRILLIC SMALL LETTER SHORT I
>+        0x0442 // CYRILLIC SMALL LETTER TE
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicSAYT, isRussianDomainNameCharacter);
>+
>+    // http://pir.org/products/opr-domain/ - rules not published. According to the registry site,
>+    // the intended audience is "Russian and other Slavic-speaking markets".
>+    // Chrome appears to only allow Russian, so sticking with that for now.
>+    static const UChar cyrillicORG[] = {
>+        '.',
>+        0x043E, // CYRILLIC SMALL LETTER O
>+        0x0440, // CYRILLIC SMALL LETTER ER
>+        0x0433 // CYRILLIC SMALL LETTER GHE
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicORG, isRussianDomainNameCharacter);
>+
>+    // http://cctld.by/rules.html
>+    static const UChar cyrillicBEL[] = {
>+        '.',
>+        0x0431, // CYRILLIC SMALL LETTER BE
>+        0x0435, // CYRILLIC SMALL LETTER IE
>+        0x043B // CYRILLIC SMALL LETTER EL
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicBEL, [](UChar ch) {
>+        // Russian and Byelorussian letters, digits and dashes are allowed.
>+        return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x0456 || ch == 0x045E || ch == 0x2019 || isASCIIDigit(ch) || ch == '-';
>+    });
>+
>+    // http://www.nic.kz/docs/poryadok_vnedreniya_kaz_ru.pdf
>+    static const UChar cyrillicKAZ[] = {
>+        '.',
>+        0x049B, // CYRILLIC SMALL LETTER KA WITH DESCENDER
>+        0x0430, // CYRILLIC SMALL LETTER A
>+        0x0437 // CYRILLIC SMALL LETTER ZE
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicKAZ, [](UChar ch) {
>+        // Kazakh letters, digits and dashes are allowed.
>+        return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x04D9 || ch == 0x0493 || ch == 0x049B || ch == 0x04A3 || ch == 0x04E9 || ch == 0x04B1 || ch == 0x04AF || ch == 0x04BB || ch == 0x0456 || isASCIIDigit(ch) || ch == '-';
>+    });
>+
>+    // http://uanic.net/docs/documents-ukr/Rules%20of%20UKR_v4.0.pdf
>+    static const UChar cyrillicUKR[] = {
>+        '.',
>+        0x0443, // CYRILLIC SMALL LETTER U
>+        0x043A, // CYRILLIC SMALL LETTER KA
>+        0x0440 // CYRILLIC SMALL LETTER ER
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicUKR, [](UChar ch) {
>+        // Russian and Ukrainian letters, digits and dashes are allowed.
>+        return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x0491 || ch == 0x0404 || ch == 0x0456 || ch == 0x0457 || isASCIIDigit(ch) || ch == '-';
>+    });
>+
>+    // http://www.rnids.rs/data/DOKUMENTI/idn-srb-policy-termsofuse-v1.4-eng.pdf
>+    static const UChar cyrillicSRB[] = {
>+        '.',
>+        0x0441, // CYRILLIC SMALL LETTER ES
>+        0x0440, // CYRILLIC SMALL LETTER ER
>+        0x0431 // CYRILLIC SMALL LETTER BE
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicSRB, [](UChar ch) {
>+        // Serbian letters, digits and dashes are allowed.
>+        return (ch >= 0x0430 && ch <= 0x0438) || (ch >= 0x043A && ch <= 0x0448) || ch == 0x0452 || ch == 0x0458 || ch == 0x0459 || ch == 0x045A || ch == 0x045B || ch == 0x045F || isASCIIDigit(ch) || ch == '-';
>+    });
>+
>+    // http://marnet.mk/doc/pravilnik-mk-mkd.pdf
>+    static const UChar cyrillicMKD[] = {
>+        '.',
>+        0x043C, // CYRILLIC SMALL LETTER EM
>+        0x043A, // CYRILLIC SMALL LETTER KA
>+        0x0434 // CYRILLIC SMALL LETTER DE
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMKD, [](UChar ch) {
>+        // Macedonian letters, digits and dashes are allowed.
>+        return (ch >= 0x0430 && ch <= 0x0438) || (ch >= 0x043A && ch <= 0x0448) || ch == 0x0453 || ch == 0x0455 || ch == 0x0458 || ch == 0x0459 || ch == 0x045A || ch == 0x045C || ch == 0x045F || isASCIIDigit(ch) || ch == '-';
>+    });
>+
>+    // https://www.mon.mn/cs/
>+    static const UChar cyrillicMON[] = {
>+        '.',
>+        0x043C, // CYRILLIC SMALL LETTER EM
>+        0x043E, // CYRILLIC SMALL LETTER O
>+        0x043D // CYRILLIC SMALL LETTER EN
>+    };
>+    CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMON, [](UChar ch) {
>+        // Mongolian letters, digits and dashes are allowed.
>+        return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x04E9 || ch == 0x04AF || isASCIIDigit(ch) || ch == '-';
>+    });
>+
>+    // Not a known top level domain with special rules.
>+    return false;
>+}
>+
>+String URLHelpers::decodePunycode(const String& hostName, bool encode, const uint32_t (&IDNScriptWhiteList)[(USCRIPT_CODE_LIMIT + 31) / 32], bool* error)
>+{
>+    if (hostName.isNull() || hostName.isEmpty())
>+        return String();
>+
>+    // Needs to be big enough to hold an IDN-encoded name.
>+    // For host names bigger than this, we won't do IDN encoding, which is almost certainly OK.
>+    static const int32_t kHostNameBufferLength = 2048;
>+    int32_t length = static_cast<int32_t>(hostName.length());
>+    if (length > kHostNameBufferLength)
>+        return String();
>+
>+    String bufferFor16BitData;
>+    const UChar* inputBuffer;
>+    if (LIKELY(hostName.is8Bit())) {
>+        bufferFor16BitData = String::make16BitFrom8BitSource(hostName.characters8(), hostName.length());
>+        inputBuffer = bufferFor16BitData.characters16();
>+    } else
>+        inputBuffer = hostName.characters16();
>+
>+    UChar outputBuffer[kHostNameBufferLength];
>+    UErrorCode uerror = U_ZERO_ERROR;
>+    UIDNAInfo uinfo = UIDNA_INFO_INITIALIZER;
>+    int32_t numCharactersConverted = (encode ? uidna_nameToASCII : uidna_nameToUnicode)(&URLParser::internationalDomainNameTranscoder(), inputBuffer, length, outputBuffer, kHostNameBufferLength, &uinfo, &uerror);
>+    if (U_FAILURE(uerror) || uinfo.errors) {
>+        *error = true;
>+        return String();
>+    }
>+
>+    if (numCharactersConverted == length && !memcmp(inputBuffer, outputBuffer, length * sizeof(UChar)))
>+        return String();
>+
>+    // Decoding needs additional checks.
>+    if (!encode && !allCharactersInIDNScriptWhiteList(outputBuffer, numCharactersConverted, IDNScriptWhiteList) && !allCharactersAllowedByTLDRules(outputBuffer, numCharactersConverted))
>+        return String();
>+
>+    return String(outputBuffer, numCharactersConverted);
>+}
>+
>+} // namespace WebCore
>diff --git a/Source/WebCore/platform/URLHelpers.h b/Source/WebCore/platform/URLHelpers.h
>new file mode 100644
>index 0000000000000000000000000000000000000000..7fc91f08e4cdedf9d466571e9643fcd7fdae8941
>--- /dev/null
>+++ b/Source/WebCore/platform/URLHelpers.h
>@@ -0,0 +1,46 @@
>+/*
>+ * Copyright (C) 2018 Metrological Group B.V.
>+ * Copyright (C) 2018 Igalia S.L.
>+ *
>+ * Redistribution and use in source and binary forms, with or without
>+ * modification, are permitted provided that the following conditions
>+ * are met:
>+ * 1. Redistributions of source code must retain the above copyright
>+ *    notice, this list of conditions and the following disclaimer.
>+ * 2. Redistributions in binary form must reproduce the above copyright
>+ *    notice, this list of conditions and the following disclaimer in the
>+ *    documentation and/or other materials provided with the distribution.
>+ *
>+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
>+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
>+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
>+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
>+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
>+ * THE POSSIBILITY OF SUCH DAMAGE.
>+ */
>+
>+#pragma once
>+
>+#include <unicode/uscript.h>
>+#include <wtf/Forward.h>
>+
>+namespace std {
>+template<typename> class optional;
>+}
>+
>+namespace WebCore {
>+
>+using ICUConvertHostnameWhitelist = uint32_t[(USCRIPT_CODE_LIMIT + 31) / 32];
>+
>+class URLHelpers final {
>+public:
>+    static bool isLookalikeCharacter(std::optional<UChar32>, UChar32);
>+    static String decodePunycode(const String& hostName, bool encode, const ICUConvertHostnameWhitelist& IDNScriptWhiteList, bool* error);
>+};
>+
>+} // namespace WebCore
>diff --git a/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm b/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm
>index bd777f0fcb9d59b5d2a7d281a129039043cc8e96..8cea23908c99ec0643e94b9d5f524ef31c8f3dfe 100644
>--- a/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm
>+++ b/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm
>@@ -27,8 +27,10 @@
>  */
>
> #import "config.h"
>-#import "URLParser.h"
> #import "WebCoreNSURLExtras.h"
>+
>+#import "URL.h"
>+#import "URLHelpers.h"
> #import <wtf/Function.h>
> #import <wtf/HexNumber.h>
> #import <wtf/ObjcRuntimeExtras.h>
>@@ -54,203 +56,6 @@ static uint32_t IDNScriptWhiteList[(USCRIPT_CODE_LIMIT + 31) / 32];
>
> namespace WebCore {
>
>-static bool isArmenianLookalikeCharacter(UChar32 codePoint)
>-{
>-    return codePoint == 0x0548 || codePoint == 0x054D || codePoint == 0x0578 || codePoint == 0x057D;
>-}
>-
>-static bool isArmenianScriptCharacter(UChar32 codePoint)
>-{
>-    UErrorCode error = U_ZERO_ERROR;
>-    UScriptCode script = uscript_getScript(codePoint, &error);
>-    if (error != U_ZERO_ERROR) {
>-        LOG_ERROR("got ICU error while trying to look at scripts: %d", error);
>-        return false;
>-    }
>-
>-    return script == USCRIPT_ARMENIAN;
>-}
>-
>-
>-template<typename CharacterType> inline bool isASCIIDigitOrValidHostCharacter(CharacterType charCode)
>-{
>-    if (!isASCIIDigitOrPunctuation(charCode))
>-        return false;
>-
>-    // Things the URL Parser rejects:
>-    switch (charCode) {
>-    case '#':
>-    case '%':
>-    case '/':
>-    case ':':
>-    case '?':
>-    case '@':
>-    case '[':
>-    case '\\':
>-    case ']':
>-        return false;
>-    default:
>-        return true;
>-    }
>-}
>-
>-
>-
>-static BOOL isLookalikeCharacter(std::optional<UChar32> previousCodePoint, UChar32 charCode)
>-{
>-    // This function treats the following as unsafe, lookalike characters:
>-    // any non-printable character, any character considered as whitespace,
>-    // any ignorable character, and emoji characters related to locks.
>-
>-    // We also considered the characters in Mozilla's blacklist <http://kb.mozillazine.org/Network.IDN.blacklist_chars>.
>-
>-    // Some of the characters here will never appear once ICU has encoded.
>-    // For example, ICU transforms most spaces into an ASCII space and most
>-    // slashes into an ASCII solidus. But one of the two callers uses this
>-    // on characters that have not been processed by ICU, so they are needed here.
>-
>-    if (!u_isprint(charCode) || u_isUWhiteSpace(charCode) || u_hasBinaryProperty(charCode, UCHAR_DEFAULT_IGNORABLE_CODE_POINT))
>-        return YES;
>-
>-    switch (charCode) {
>-    case 0x00BC: /* VULGAR FRACTION ONE QUARTER */
>-    case 0x00BD: /* VULGAR FRACTION ONE HALF */
>-    case 0x00BE: /* VULGAR FRACTION THREE QUARTERS */
>-    case 0x00ED: /* LATIN SMALL LETTER I WITH ACUTE */
>-    case 0x01C3: /* LATIN LETTER RETROFLEX CLICK */
>-    case 0x0251: /* LATIN SMALL LETTER ALPHA */
>-    case 0x0261: /* LATIN SMALL LETTER SCRIPT G */
>-    case 0x02D0: /* MODIFIER LETTER TRIANGULAR COLON */
>-    case 0x0335: /* COMBINING SHORT STROKE OVERLAY */
>-    case 0x0337: /* COMBINING SHORT SOLIDUS OVERLAY */
>-    case 0x0338: /* COMBINING LONG SOLIDUS OVERLAY */
>-    case 0x0589: /* ARMENIAN FULL STOP */
>-    case 0x05B4: /* HEBREW POINT HIRIQ */
>-    case 0x05BC: /* HEBREW POINT DAGESH OR MAPIQ */
>-    case 0x05C3: /* HEBREW PUNCTUATION SOF PASUQ */
>-    case 0x05F4: /* HEBREW PUNCTUATION GERSHAYIM */
>-    case 0x0609: /* ARABIC-INDIC PER MILLE SIGN */
>-    case 0x060A: /* ARABIC-INDIC PER TEN THOUSAND SIGN */
>-    case 0x0650: /* ARABIC KASRA */
>-    case 0x0660: /* ARABIC INDIC DIGIT ZERO */
>-    case 0x066A: /* ARABIC PERCENT SIGN */
>-    case 0x06D4: /* ARABIC FULL STOP */
>-    case 0x06F0: /* EXTENDED ARABIC INDIC DIGIT ZERO */
>-    case 0x0701: /* SYRIAC SUPRALINEAR FULL STOP */
>-    case 0x0702: /* SYRIAC SUBLINEAR FULL STOP */
>-    case 0x0703: /* SYRIAC SUPRALINEAR COLON */
>-    case 0x0704: /* SYRIAC SUBLINEAR COLON */
>-    case 0x1735: /* PHILIPPINE SINGLE PUNCTUATION */
>-    case 0x1D04: /* LATIN LETTER SMALL CAPITAL C */
>-    case 0x1D0F: /* LATIN LETTER SMALL CAPITAL O */
>-    case 0x1D1C: /* LATIN LETTER SMALL CAPITAL U */
>-    case 0x1D20: /* LATIN LETTER SMALL CAPITAL V */
>-    case 0x1D21: /* LATIN LETTER SMALL CAPITAL W */
>-    case 0x1D22: /* LATIN LETTER SMALL CAPITAL Z */
>-    case 0x1ECD: /* LATIN SMALL LETTER O WITH DOT BELOW */
>-    case 0x2010: /* HYPHEN */
>-    case 0x2011: /* NON-BREAKING HYPHEN */
>-    case 0x2024: /* ONE DOT LEADER */
>-    case 0x2027: /* HYPHENATION POINT */
>-    case 0x2039: /* SINGLE LEFT-POINTING ANGLE QUOTATION MARK */
>-    case 0x203A: /* SINGLE RIGHT-POINTING ANGLE QUOTATION MARK */
>-    case 0x2041: /* CARET INSERTION POINT */
>-    case 0x2044: /* FRACTION SLASH */
>-    case 0x2052: /* COMMERCIAL MINUS SIGN */
>-    case 0x2153: /* VULGAR FRACTION ONE THIRD */
>-    case 0x2154: /* VULGAR FRACTION TWO THIRDS */
>-    case 0x2155: /* VULGAR FRACTION ONE FIFTH */
>-    case 0x2156: /* VULGAR FRACTION TWO FIFTHS */
>-    case 0x2157: /* VULGAR FRACTION THREE FIFTHS */
>-    case 0x2158: /* VULGAR FRACTION FOUR FIFTHS */
>-    case 0x2159: /* VULGAR FRACTION ONE SIXTH */
>-    case 0x215A: /* VULGAR FRACTION FIVE SIXTHS */
>-    case 0x215B: /* VULGAR FRACTION ONE EIGHT */
>-    case 0x215C: /* VULGAR FRACTION THREE EIGHTHS */
>-    case 0x215D: /* VULGAR FRACTION FIVE EIGHTHS */
>-    case 0x215E: /* VULGAR FRACTION SEVEN EIGHTHS */
>-    case 0x215F: /* FRACTION NUMERATOR ONE */
>-    case 0x2212: /* MINUS SIGN */
>-    case 0x2215: /* DIVISION SLASH */
>-    case 0x2216: /* SET MINUS */
>-    case 0x2236: /* RATIO */
>-    case 0x233F: /* APL FUNCTIONAL SYMBOL SLASH BAR */
>-    case 0x23AE: /* INTEGRAL EXTENSION */
>-    case 0x244A: /* OCR DOUBLE BACKSLASH */
>-    case 0x2571: /* DisplayType::Box DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT */
>-    case 0x2572: /* DisplayType::Box DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT */
>-    case 0x29F6: /* SOLIDUS WITH OVERBAR */
>-    case 0x29F8: /* BIG SOLIDUS */
>-    case 0x2AFB: /* TRIPLE SOLIDUS BINARY RELATION */
>-    case 0x2AFD: /* DOUBLE SOLIDUS OPERATOR */
>-    case 0x2FF0: /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT */
>-    case 0x2FF1: /* IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW */
>-    case 0x2FF2: /* IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT */
>-    case 0x2FF3: /* IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW */
>-    case 0x2FF4: /* IDEOGRAPHIC DESCRIPTION CHARACTER FULL SURROUND */
>-    case 0x2FF5: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE */
>-    case 0x2FF6: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW */
>-    case 0x2FF7: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT */
>-    case 0x2FF8: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT */
>-    case 0x2FF9: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT */
>-    case 0x2FFA: /* IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT */
>-    case 0x2FFB: /* IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID */
>-    case 0x3002: /* IDEOGRAPHIC FULL STOP */
>-    case 0x3008: /* LEFT ANGLE BRACKET */
>-    case 0x3014: /* LEFT TORTOISE SHELL BRACKET */
>-    case 0x3015: /* RIGHT TORTOISE SHELL BRACKET */
>-    case 0x3033: /* VERTICAL KANA REPEAT MARK UPPER HALF */
>-    case 0x3035: /* VERTICAL KANA REPEAT MARK LOWER HALF */
>-    case 0x321D: /* PARENTHESIZED KOREAN CHARACTER OJEON */
>-    case 0x321E: /* PARENTHESIZED KOREAN CHARACTER O HU */
>-    case 0x33AE: /* SQUARE RAD OVER S */
>-    case 0x33AF: /* SQUARE RAD OVER S SQUARED */
>-    case 0x33C6: /* SQUARE C OVER KG */
>-    case 0x33DF: /* SQUARE A OVER M */
>-    case 0x05B9: /* HEBREW
POINT HOLAM */ >- case 0x05BA: /* HEBREW POINT HOLAM HASER FOR VAV */ >- case 0x05C1: /* HEBREW POINT SHIN DOT */ >- case 0x05C2: /* HEBREW POINT SIN DOT */ >- case 0x05C4: /* HEBREW MARK UPPER DOT */ >- case 0xA731: /* LATIN LETTER SMALL CAPITAL S */ >- case 0xA771: /* LATIN SMALL LETTER DUM */ >- case 0xA789: /* MODIFIER LETTER COLON */ >- case 0xFE14: /* PRESENTATION FORM FOR VERTICAL SEMICOLON */ >- case 0xFE15: /* PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK */ >- case 0xFE3F: /* PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET */ >- case 0xFE5D: /* SMALL LEFT TORTOISE SHELL BRACKET */ >- case 0xFE5E: /* SMALL RIGHT TORTOISE SHELL BRACKET */ >- case 0xFF0E: /* FULLWIDTH FULL STOP */ >- case 0xFF0F: /* FULL WIDTH SOLIDUS */ >- case 0xFF61: /* HALFWIDTH IDEOGRAPHIC FULL STOP */ >- case 0xFFFC: /* OBJECT REPLACEMENT CHARACTER */ >- case 0xFFFD: /* REPLACEMENT CHARACTER */ >- case 0x1F50F: /* LOCK WITH INK PEN */ >- case 0x1F510: /* CLOSED LOCK WITH KEY */ >- case 0x1F511: /* KEY */ >- case 0x1F512: /* LOCK */ >- case 0x1F513: /* OPEN LOCK */ >- return YES; >- case 0x0307: /* COMBINING DOT ABOVE */ >- return previousCodePoint == 0x0237 /* LATIN SMALL LETTER DOTLESS J */ >- || previousCodePoint == 0x0131 /* LATIN SMALL LETTER DOTLESS I */ >- || previousCodePoint == 0x05D5; /* HEBREW LETTER VAV */ >- case 0x0548: /* ARMENIAN CAPITAL LETTER VO */ >- case 0x054D: /* ARMENIAN CAPITAL LETTER SEH */ >- case 0x0578: /* ARMENIAN SMALL LETTER VO */ >- case 0x057D: /* ARMENIAN SMALL LETTER SEH */ >- return previousCodePoint >- && !isASCIIDigitOrValidHostCharacter(previousCodePoint.value()) >- && !isArmenianScriptCharacter(previousCodePoint.value()); >- case '.': >- return NO; >- default: >- return previousCodePoint >- && isArmenianLookalikeCharacter(previousCodePoint.value()) >- && !(isArmenianScriptCharacter(charCode) || isASCIIDigitOrValidHostCharacter(charCode)); >- } >-} >- > static BOOL readIDNScriptWhiteListFile(NSString *filename) > { > if (!filename) >@@ -287,7 
+92,7 @@ static BOOL readIDNScriptWhiteListFile(NSString *filename) > return YES; > } > >-static BOOL allCharactersInIDNScriptWhiteList(const UChar *buffer, int32_t length) >+static void loadIDNScriptWhiteList() > { > static dispatch_once_t flag; > dispatch_once(&flag, ^{ >@@ -305,224 +110,8 @@ static BOOL allCharactersInIDNScriptWhiteList(const UChar *buffer, int32_t lengt > if (!readIDNScriptWhiteListFile([bundle pathForResource:@"IDNScriptWhiteList" ofType:@"txt"])) > CRASH(); > }); >- >- int32_t i = 0; >- std::optional<UChar32> previousCodePoint; >- while (i < length) { >- UChar32 c; >- U16_NEXT(buffer, i, length, c) >- UErrorCode error = U_ZERO_ERROR; >- UScriptCode script = uscript_getScript(c, &error); >- if (error != U_ZERO_ERROR) { >- LOG_ERROR("got ICU error while trying to look at scripts: %d", error); >- return NO; >- } >- if (script < 0) { >- LOG_ERROR("got negative number for script code from ICU: %d", script); >- return NO; >- } >- if (script >= USCRIPT_CODE_LIMIT) >- return NO; >- >- size_t index = script / 32; >- uint32_t mask = 1 << (script % 32); >- if (!(IDNScriptWhiteList[index] & mask)) >- return NO; >- >- if (isLookalikeCharacter(previousCodePoint, c)) >- return NO; >- previousCodePoint = c; >- } >- return YES; > } >- >-static bool isSecondLevelDomainNameAllowedByTLDRules(const UChar* buffer, int32_t length, const WTF::Function<bool(UChar)>& characterIsAllowed) >-{ >- ASSERT(length > 0); >- >- for (int32_t i = length - 1; i >= 0; --i) { >- UChar ch = buffer[i]; >- >- if (characterIsAllowed(ch)) >- continue; >- >- // Only check the second level domain. Lower level registrars may have different rules. 
>- if (ch == '.') >- break; >- >- return false; >- } >- return true; >-} >- >-#define CHECK_RULES_IF_SUFFIX_MATCHES(suffix, function) \ >- { \ >- static const int32_t suffixLength = sizeof(suffix) / sizeof(suffix[0]); \ >- if (length > suffixLength && 0 == memcmp(buffer + length - suffixLength, suffix, sizeof(suffix))) \ >- return isSecondLevelDomainNameAllowedByTLDRules(buffer, length - suffixLength, function); \ >- } >- >-static bool isRussianDomainNameCharacter(UChar ch) >-{ >- // Only modern Russian letters, digits and dashes are allowed. >- return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || isASCIIDigit(ch) || ch == '-'; >-} >- >-static BOOL allCharactersAllowedByTLDRules(const UChar* buffer, int32_t length) >-{ >- // Skip trailing dot for root domain. >- if (buffer[length - 1] == '.') >- length--; >- >- // http://cctld.ru/files/pdf/docs/rules_ru-rf.pdf >- static const UChar cyrillicRF[] = { >- '.', >- 0x0440, // CYRILLIC SMALL LETTER ER >- 0x0444 // CYRILLIC SMALL LETTER EF >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicRF, isRussianDomainNameCharacter); >- >- // http://rusnames.ru/rules.pl >- static const UChar cyrillicRUS[] = { >- '.', >- 0x0440, // CYRILLIC SMALL LETTER ER >- 0x0443, // CYRILLIC SMALL LETTER U >- 0x0441 // CYRILLIC SMALL LETTER ES >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicRUS, isRussianDomainNameCharacter); >- >- // http://ru.faitid.org/projects/moscow/documents/moskva/idn >- static const UChar cyrillicMOSKVA[] = { >- '.', >- 0x043C, // CYRILLIC SMALL LETTER EM >- 0x043E, // CYRILLIC SMALL LETTER O >- 0x0441, // CYRILLIC SMALL LETTER ES >- 0x043A, // CYRILLIC SMALL LETTER KA >- 0x0432, // CYRILLIC SMALL LETTER VE >- 0x0430 // CYRILLIC SMALL LETTER A >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMOSKVA, isRussianDomainNameCharacter); >- >- // http://www.dotdeti.ru/foruser/docs/regrules.php >- static const UChar cyrillicDETI[] = { >- '.', >- 0x0434, // CYRILLIC SMALL LETTER DE >- 0x0435, // CYRILLIC SMALL LETTER IE >- 0x0442, // 
CYRILLIC SMALL LETTER TE >- 0x0438 // CYRILLIC SMALL LETTER I >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicDETI, isRussianDomainNameCharacter); >- >- // http://corenic.org - rules not published. The word is Russian, so only allowing Russian at this time, >- // although we may need to revise the checks if this ends up being used with other languages spoken in Russia. >- static const UChar cyrillicONLAYN[] = { >- '.', >- 0x043E, // CYRILLIC SMALL LETTER O >- 0x043D, // CYRILLIC SMALL LETTER EN >- 0x043B, // CYRILLIC SMALL LETTER EL >- 0x0430, // CYRILLIC SMALL LETTER A >- 0x0439, // CYRILLIC SMALL LETTER SHORT I >- 0x043D // CYRILLIC SMALL LETTER EN >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicONLAYN, isRussianDomainNameCharacter); >- >- // http://corenic.org - same as above. >- static const UChar cyrillicSAYT[] = { >- '.', >- 0x0441, // CYRILLIC SMALL LETTER ES >- 0x0430, // CYRILLIC SMALL LETTER A >- 0x0439, // CYRILLIC SMALL LETTER SHORT I >- 0x0442 // CYRILLIC SMALL LETTER TE >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicSAYT, isRussianDomainNameCharacter); >- >- // http://pir.org/products/opr-domain/ - rules not published. According to the registry site, >- // the intended audience is "Russian and other Slavic-speaking markets". >- // Chrome appears to only allow Russian, so sticking with that for now. >- static const UChar cyrillicORG[] = { >- '.', >- 0x043E, // CYRILLIC SMALL LETTER O >- 0x0440, // CYRILLIC SMALL LETTER ER >- 0x0433 // CYRILLIC SMALL LETTER GHE >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicORG, isRussianDomainNameCharacter); >- >- // http://cctld.by/rules.html >- static const UChar cyrillicBEL[] = { >- '.', >- 0x0431, // CYRILLIC SMALL LETTER BE >- 0x0435, // CYRILLIC SMALL LETTER IE >- 0x043B // CYRILLIC SMALL LETTER EL >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicBEL, [](UChar ch) { >- // Russian and Byelorussian letters, digits and dashes are allowed. 
>- return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x0456 || ch == 0x045E || ch == 0x2019 || isASCIIDigit(ch) || ch == '-'; >- }); >- >- // http://www.nic.kz/docs/poryadok_vnedreniya_kaz_ru.pdf >- static const UChar cyrillicKAZ[] = { >- '.', >- 0x049B, // CYRILLIC SMALL LETTER KA WITH DESCENDER >- 0x0430, // CYRILLIC SMALL LETTER A >- 0x0437 // CYRILLIC SMALL LETTER ZE >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicKAZ, [](UChar ch) { >- // Kazakh letters, digits and dashes are allowed. >- return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x04D9 || ch == 0x0493 || ch == 0x049B || ch == 0x04A3 || ch == 0x04E9 || ch == 0x04B1 || ch == 0x04AF || ch == 0x04BB || ch == 0x0456 || isASCIIDigit(ch) || ch == '-'; >- }); >- >- // http://uanic.net/docs/documents-ukr/Rules%20of%20UKR_v4.0.pdf >- static const UChar cyrillicUKR[] = { >- '.', >- 0x0443, // CYRILLIC SMALL LETTER U >- 0x043A, // CYRILLIC SMALL LETTER KA >- 0x0440 // CYRILLIC SMALL LETTER ER >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicUKR, [](UChar ch) { >- // Russian and Ukrainian letters, digits and dashes are allowed. >- return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x0491 || ch == 0x0404 || ch == 0x0456 || ch == 0x0457 || isASCIIDigit(ch) || ch == '-'; >- }); >- >- // http://www.rnids.rs/data/DOKUMENTI/idn-srb-policy-termsofuse-v1.4-eng.pdf >- static const UChar cyrillicSRB[] = { >- '.', >- 0x0441, // CYRILLIC SMALL LETTER ES >- 0x0440, // CYRILLIC SMALL LETTER ER >- 0x0431 // CYRILLIC SMALL LETTER BE >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicSRB, [](UChar ch) { >- // Serbian letters, digits and dashes are allowed. 
>- return (ch >= 0x0430 && ch <= 0x0438) || (ch >= 0x043A && ch <= 0x0448) || ch == 0x0452 || ch == 0x0458 || ch == 0x0459 || ch == 0x045A || ch == 0x045B || ch == 0x045F || isASCIIDigit(ch) || ch == '-'; >- }); >- >- // http://marnet.mk/doc/pravilnik-mk-mkd.pdf >- static const UChar cyrillicMKD[] = { >- '.', >- 0x043C, // CYRILLIC SMALL LETTER EM >- 0x043A, // CYRILLIC SMALL LETTER KA >- 0x0434 // CYRILLIC SMALL LETTER DE >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMKD, [](UChar ch) { >- // Macedonian letters, digits and dashes are allowed. >- return (ch >= 0x0430 && ch <= 0x0438) || (ch >= 0x043A && ch <= 0x0448) || ch == 0x0453 || ch == 0x0455 || ch == 0x0458 || ch == 0x0459 || ch == 0x045A || ch == 0x045C || ch == 0x045F || isASCIIDigit(ch) || ch == '-'; >- }); >- >- // https://www.mon.mn/cs/ >- static const UChar cyrillicMON[] = { >- '.', >- 0x043C, // CYRILLIC SMALL LETTER EM >- 0x043E, // CYRILLIC SMALL LETTER O >- 0x043D // CYRILLIC SMALL LETTER EN >- }; >- CHECK_RULES_IF_SUFFIX_MATCHES(cyrillicMON, [](UChar ch) { >- // Mongolian letters, digits and dashes are allowed. >- return (ch >= 0x0430 && ch <= 0x044f) || ch == 0x0451 || ch == 0x04E9 || ch == 0x04AF || isASCIIDigit(ch) || ch == '-'; >- }); >- >- // Not a known top level domain with special rules. >- return NO; >-} >- >+ > // Return value of nil means no mapping is necessary. > // If makeString is NO, then return value is either nil or self to indicate mapping is necessary. > // If makeString is YES, then return value is either nil or the mapped string. 
>@@ -534,9 +123,6 @@ static NSString *mapHostNameWithRange(NSString *string, NSRange range, BOOL enco > if (![string length]) > return nil; > >- UChar sourceBuffer[HOST_NAME_BUFFER_LENGTH]; >- UChar destinationBuffer[HOST_NAME_BUFFER_LENGTH]; >- > if (encode && [string rangeOfString:@"%" options:NSLiteralSearch range:range].location != NSNotFound) { > NSString *substring = [string substringWithRange:range]; > substring = CFBridgingRelease(CFURLCreateStringByReplacingPercentEscapes(nullptr, (CFStringRef)substring, CFSTR(""))); >@@ -545,25 +131,20 @@ static NSString *mapHostNameWithRange(NSString *string, NSRange range, BOOL enco > range = NSMakeRange(0, [string length]); > } > } >- >- int length = range.length; >- [string getCharacters:sourceBuffer range:range]; >- >- UErrorCode uerror = U_ZERO_ERROR; >- UIDNAInfo processingDetails = UIDNA_INFO_INITIALIZER; >- int32_t numCharactersConverted = (encode ? uidna_nameToASCII : uidna_nameToUnicode)(&URLParser::internationalDomainNameTranscoder(), sourceBuffer, length, destinationBuffer, HOST_NAME_BUFFER_LENGTH, &processingDetails, &uerror); >- if (length && (U_FAILURE(uerror) || processingDetails.errors)) { >+ >+ loadIDNScriptWhiteList(); >+ >+ NSString* substring = [string substringWithRange:range]; >+ bool conversionError = false; >+ String convertedString = URLHelpers::decodePunycode(substring, encode, IDNScriptWhiteList, &conversionError); >+ if (conversionError) > *error = YES; >+ if (!convertedString) > return nil; >- } >- >- if (numCharactersConverted == length && !memcmp(sourceBuffer, destinationBuffer, length * sizeof(UChar))) >- return nil; >- >- if (!encode && !allCharactersInIDNScriptWhiteList(destinationBuffer, numCharactersConverted) && !allCharactersAllowedByTLDRules(destinationBuffer, numCharactersConverted)) >- return nil; >- >- return makeString ? 
[NSString stringWithCharacters:destinationBuffer length:numCharactersConverted] : string; >+ >+ NSString* convertedString_(convertedString); >+ >+ return makeString ? convertedString_ : string; > } > > BOOL hostNameNeedsDecodingWithRange(NSString *string, NSRange range, BOOL *error) >@@ -1072,7 +653,7 @@ static CFStringRef createStringWithEscapedUnsafeCharacters(CFStringRef string) > UChar32 c; > U16_NEXT(sourceBuffer, i, length, c) > >- if (isLookalikeCharacter(previousCodePoint, c)) { >+ if (URLHelpers::isLookalikeCharacter(previousCodePoint, c)) { > uint8_t utf8Buffer[4]; > CFIndex offset = 0; > UBool failure = false; >diff --git a/Source/WebKit/PlatformGTK.cmake b/Source/WebKit/PlatformGTK.cmake >index 43b780c1a5c7581a4ddac0e1474b166658654c6b..f024b6b36cc0f3b6c3320d69323eb93db57e036d 100644 >--- a/Source/WebKit/PlatformGTK.cmake >+++ b/Source/WebKit/PlatformGTK.cmake >@@ -105,6 +105,7 @@ set(WebKit2GTK_INSTALLED_HEADERS > ${WEBKIT_DIR}/UIProcess/API/gtk/WebKitURIRequest.h > ${WEBKIT_DIR}/UIProcess/API/gtk/WebKitURIResponse.h > ${WEBKIT_DIR}/UIProcess/API/gtk/WebKitURISchemeRequest.h >+ ${WEBKIT_DIR}/UIProcess/API/gtk/WebKitURIUtilities.h > ${WEBKIT_DIR}/UIProcess/API/gtk/WebKitUserContent.h > ${WEBKIT_DIR}/UIProcess/API/gtk/WebKitUserContentManager.h > ${WEBKIT_DIR}/UIProcess/API/gtk/WebKitUserMediaPermissionRequest.h >diff --git a/Source/WebKit/PlatformWPE.cmake b/Source/WebKit/PlatformWPE.cmake >index a4eb7333520f540383021790013ae00cb941bc56..5c695ce997b9818a000325b5b451bbc71534dd21 100644 >--- a/Source/WebKit/PlatformWPE.cmake >+++ b/Source/WebKit/PlatformWPE.cmake >@@ -136,6 +136,7 @@ set(WPE_API_INSTALLED_HEADERS > ${WEBKIT_DIR}/UIProcess/API/wpe/WebKitURIRequest.h > ${WEBKIT_DIR}/UIProcess/API/wpe/WebKitURIResponse.h > ${WEBKIT_DIR}/UIProcess/API/wpe/WebKitURISchemeRequest.h >+ ${WEBKIT_DIR}/UIProcess/API/wpe/WebKitURIUtilities.h > ${WEBKIT_DIR}/UIProcess/API/wpe/WebKitUserContent.h > ${WEBKIT_DIR}/UIProcess/API/wpe/WebKitUserContentManager.h > 
${WEBKIT_DIR}/UIProcess/API/wpe/WebKitUserMediaPermissionRequest.h >diff --git a/Source/WebKit/SourcesGTK.txt b/Source/WebKit/SourcesGTK.txt >index b7ab0dd6879e880b84dc2c6db4f6d59bc5970fa6..1903ff15031cec830a4e399c3b2198b2d17de6cb 100644 >--- a/Source/WebKit/SourcesGTK.txt >+++ b/Source/WebKit/SourcesGTK.txt >@@ -166,6 +166,7 @@ UIProcess/API/glib/WebKitSecurityOrigin.cpp @no-unify > UIProcess/API/glib/WebKitSettings.cpp @no-unify > UIProcess/API/glib/WebKitUIClient.cpp @no-unify > UIProcess/API/glib/WebKitURISchemeRequest.cpp @no-unify >+UIProcess/API/glib/WebKitURIUtilities.cpp @no-unify > UIProcess/API/glib/WebKitUserContent.cpp @no-unify > UIProcess/API/glib/WebKitUserContentManager.cpp @no-unify > UIProcess/API/glib/WebKitUserMediaPermissionRequest.cpp @no-unify >diff --git a/Source/WebKit/SourcesWPE.txt b/Source/WebKit/SourcesWPE.txt >index 47a2dcde138b74597e100638289d4094a55a5139..2c1450ffa9cc9778a5ecf6df034c8c9fc04713c3 100644 >--- a/Source/WebKit/SourcesWPE.txt >+++ b/Source/WebKit/SourcesWPE.txt >@@ -152,6 +152,7 @@ UIProcess/API/glib/WebKitSecurityOrigin.cpp @no-unify > UIProcess/API/glib/WebKitSettings.cpp @no-unify > UIProcess/API/glib/WebKitUIClient.cpp @no-unify > UIProcess/API/glib/WebKitURISchemeRequest.cpp @no-unify >+UIProcess/API/glib/WebKitURIUtilities.cpp @no-unify > UIProcess/API/glib/WebKitUserContent.cpp @no-unify > UIProcess/API/glib/WebKitUserContentManager.cpp @no-unify > UIProcess/API/glib/WebKitUserMediaPermissionRequest.cpp @no-unify >diff --git a/Source/WebKit/UIProcess/API/glib/WebKitURIUtilities.cpp b/Source/WebKit/UIProcess/API/glib/WebKitURIUtilities.cpp >new file mode 100644 >index 0000000000000000000000000000000000000000..afc0a203cd1f1bdca5a3e4666f2fed49cd0ba19d >--- /dev/null >+++ b/Source/WebKit/UIProcess/API/glib/WebKitURIUtilities.cpp >@@ -0,0 +1,86 @@ >+/* >+ * Copyright (C) 2018 Igalia S.L. 
>+ * >+ * This library is free software; you can redistribute it and/or >+ * modify it under the terms of the GNU Library General Public >+ * License as published by the Free Software Foundation; either >+ * version 2 of the License, or (at your option) any later version. >+ * >+ * This library is distributed in the hope that it will be useful, >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >+ * Library General Public License for more details. >+ * >+ * You should have received a copy of the GNU Library General Public License >+ * along with this library; see the file COPYING.LIB. If not, write to >+ * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, >+ * Boston, MA 02110-1301, USA. >+ */ >+ >+#include "config.h" >+#include "WebKitURIUtilities.h" >+ >+#include <WebCore/GUniquePtrSoup.h> >+#include <WebCore/URL.h> >+#include <WebCore/URLHelpers.h> >+#include <libsoup/soup.h> >+#include <mutex> >+#include <unicode/uidna.h> >+#include <wtf/HashMap.h> >+ >+/** >+ * SECTION: WebKitURIUtilities >+ * @Short_description: Utility functions to manipulate URIs >+ * @Title: WebKitURIUtilities >+ **/ >+ >+/** >+ * webkit_uri_for_display: >+ * @uri: the URI to be converted >+ * >+ * Use this function to format a URI for display. The URIs used internally by >+ * WebKit may contain percent-encoded characters or Punycode, which are not >+ * generally suitable to display to users. This function provides protection >+ * against IDN homograph attacks, so in some cases the host part of the returned >+ * URI may be in Punycode if the safety check fails. >+ * >+ * Returns: (nullable) (transfer full): @uri suitable for display, or %NULL in >+ * case of error. 
>+ **/ >+gchar* webkit_uri_for_display(const gchar* uri) >+{ >+ g_return_val_if_fail(uri, nullptr); >+ >+ auto coreURI = WebCore::URL { WebCore::URL { }, String::fromUTF8(uri) }; >+ if (!coreURI.isValid()) >+ return nullptr; >+ >+ // Remove password and percent-decode host name. >+ coreURI.setPass(emptyString()); >+ auto soupURI = coreURI.createSoupURI(); >+ if (!soupURI.get()->host) >+ return nullptr; >+ >+ GUniquePtr<gchar> percentDecodedHostChars(soup_uri_decode(soupURI.get()->host)); >+ auto percentDecodedHost = String::fromUTF8(percentDecodedHostChars.get()); >+ // Handle Unicode characters in the host name. >+ WebCore::ICUConvertHostnameWhitelist IDNScriptWhiteList = { }; >+ bool error = false; >+ auto convertedHostName = WebCore::URLHelpers::decodePunycode(percentDecodedHost, false, IDNScriptWhiteList, &error); >+ if (error) >+ return nullptr; >+ >+ if (convertedHostName.isNull()) >+ convertedHostName = percentDecodedHost; >+ >+ g_free(soupURI.get()->host); >+ soupURI.get()->host = g_strdup(convertedHostName.utf8().data()); >+ >+ // Now, decode any percent-encoded characters in the URI. >+ GUniquePtr<char> percentEncodedURI(soup_uri_to_string(soupURI.get(), FALSE)); >+ char* decodedURI = g_uri_unescape_string(percentEncodedURI.get(), "/"); >+ if (!decodedURI) >+ return nullptr; >+ >+ return decodedURI; >+} >diff --git a/Source/WebKit/UIProcess/API/gtk/WebKitURIUtilities.h b/Source/WebKit/UIProcess/API/gtk/WebKitURIUtilities.h >new file mode 100644 >index 0000000000000000000000000000000000000000..3a30776613763441b0df9d8fba3adcb570326b2e >--- /dev/null >+++ b/Source/WebKit/UIProcess/API/gtk/WebKitURIUtilities.h >@@ -0,0 +1,37 @@ >+/* >+ * Copyright (C) 2018 Igalia S.L. >+ * >+ * This library is free software; you can redistribute it and/or >+ * modify it under the terms of the GNU Library General Public >+ * License as published by the Free Software Foundation; either >+ * version 2 of the License, or (at your option) any later version. 
>+ * >+ * This library is distributed in the hope that it will be useful, >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >+ * Library General Public License for more details. >+ * >+ * You should have received a copy of the GNU Library General Public License >+ * along with this library; see the file COPYING.LIB. If not, write to >+ * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, >+ * Boston, MA 02110-1301, USA. >+ */ >+ >+#if !defined(__WEBKIT2_H_INSIDE__) && !defined(WEBKIT2_COMPILATION) >+#error "Only <webkit2/webkit2.h> can be included directly." >+#endif >+ >+#ifndef WebKitURIUtilities_h >+#define WebKitURIUtilities_h >+ >+#include <glib.h> >+#include <webkit2/WebKitDefines.h> >+ >+G_BEGIN_DECLS >+ >+WEBKIT_API gchar * >+webkit_uri_for_display (const gchar *uri); >+ >+G_END_DECLS >+ >+#endif >diff --git a/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-4.0-sections.txt b/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-4.0-sections.txt >index e97743a45e2ec80fa68b11093eb73bebea4b19fd..3db037bf0cda7e80add1ecd83f545390f5d9b6d7 100644 >--- a/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-4.0-sections.txt >+++ b/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-4.0-sections.txt >@@ -1614,3 +1614,8 @@ WEBKIT_PRINT_CUSTOM_WIDGET_GET_CLASS > WebKitPrintCustomWidgetPrivate > webkit_print_custom_widget_get_type > </SECTION> >+ >+<SECTION> >+<FILE>WebKitURIUtilities</FILE> >+webkit_uri_for_display >+</SECTION> >diff --git a/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-docs.sgml b/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-docs.sgml >index 30d97d655bb7c3d39655b39e6cceb377e036d7da..4cf0b66294a68d1a4b0a483ff5a4727de66a1e7d 100644 >--- a/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-docs.sgml >+++ b/Source/WebKit/UIProcess/API/gtk/docs/webkit2gtk-docs.sgml >@@ -73,6 +73,11 @@ > <xi:include href="xml/WebKitConsoleMessage.xml"/> > </chapter> > >+ <chapter> 
>+ <title>Utilities</title> >+ <xi:include href="xml/WebKitURIUtilities.xml"/> >+ </chapter> >+ > <index id="index-all"> > <title>Index</title> > </index> >diff --git a/Source/WebKit/UIProcess/API/gtk/webkit2.h b/Source/WebKit/UIProcess/API/gtk/webkit2.h >index fb0476f1b179bc4c5ca4e1385e50cb05a7df5367..38892286bae52f2ac040072301ce4b8b9f125ff4 100644 >--- a/Source/WebKit/UIProcess/API/gtk/webkit2.h >+++ b/Source/WebKit/UIProcess/API/gtk/webkit2.h >@@ -72,6 +72,7 @@ > #include <webkit2/WebKitURIRequest.h> > #include <webkit2/WebKitURIResponse.h> > #include <webkit2/WebKitURISchemeRequest.h> >+#include <webkit2/WebKitURIUtilities.h> > #include <webkit2/WebKitUserContent.h> > #include <webkit2/WebKitUserContentManager.h> > #include <webkit2/WebKitUserMediaPermissionRequest.h> >diff --git a/Source/WebKit/UIProcess/API/wpe/WebKitURIUtilities.h b/Source/WebKit/UIProcess/API/wpe/WebKitURIUtilities.h >new file mode 100644 >index 0000000000000000000000000000000000000000..cb05f000ad530f61ee8b39278d09f0639dee0688 >--- /dev/null >+++ b/Source/WebKit/UIProcess/API/wpe/WebKitURIUtilities.h >@@ -0,0 +1,37 @@ >+/* >+ * Copyright (C) 2018 Igalia S.L. >+ * >+ * This library is free software; you can redistribute it and/or >+ * modify it under the terms of the GNU Library General Public >+ * License as published by the Free Software Foundation; either >+ * version 2 of the License, or (at your option) any later version. >+ * >+ * This library is distributed in the hope that it will be useful, >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >+ * Library General Public License for more details. >+ * >+ * You should have received a copy of the GNU Library General Public License >+ * along with this library; see the file COPYING.LIB. If not, write to >+ * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, >+ * Boston, MA 02110-1301, USA. 
>+ */ >+ >+#if !defined(__WEBKIT_H_INSIDE__) && !defined(WEBKIT2_COMPILATION) >+#error "Only <wpe/webkit.h> can be included directly." >+#endif >+ >+#ifndef WebKitURIUtilities_h >+#define WebKitURIUtilities_h >+ >+#include <glib.h> >+#include <wpe/WebKitDefines.h> >+ >+G_BEGIN_DECLS >+ >+WEBKIT_API gchar * >+webkit_uri_for_display (const gchar *uri); >+ >+G_END_DECLS >+ >+#endif >diff --git a/Source/WebKit/UIProcess/API/wpe/webkit.h b/Source/WebKit/UIProcess/API/wpe/webkit.h >index 943ade2afa819b9e8cb0b9f7eb41a758a1e3bc1b..e3afd510b7b65a74067f2024a181e5073fcf1767 100644 >--- a/Source/WebKit/UIProcess/API/wpe/webkit.h >+++ b/Source/WebKit/UIProcess/API/wpe/webkit.h >@@ -66,6 +66,7 @@ > #include <wpe/WebKitURIRequest.h> > #include <wpe/WebKitURIResponse.h> > #include <wpe/WebKitURISchemeRequest.h> >+#include <wpe/WebKitURIUtilities.h> > #include <wpe/WebKitUserContent.h> > #include <wpe/WebKitUserContentManager.h> > #include <wpe/WebKitUserMediaPermissionRequest.h> >diff --git a/Tools/ChangeLog b/Tools/ChangeLog >index ad1ec70158d259132d3fc40590dc5328c80fc207..05790cc249bb0d3da7f9f743aba2d0bdd9ee765f 100644 >--- a/Tools/ChangeLog >+++ b/Tools/ChangeLog >@@ -1,3 +1,18 @@ >+2018-06-12 Ms2ger <Ms2ger@igalia.com> >+ >+ [GTK][WPE] Add a function to convert internal URL to display ("pretty") URL >+ https://bugs.webkit.org/show_bug.cgi?id=174816 >+ >+ Reviewed by NOBODY (OOPS!). >+ >+ This code is based almost entirely on code by Gabriel Ivascu. >+ >+ * TestWebKitAPI/Tests/WebKitGLib/TestWebKitURIUtilities.cpp: Added. 
>+ (testURIForDisplay): >+ (beforeAll): >+ (afterAll): >+ * TestWebKitAPI/glib/CMakeLists.txt: >+ > 2018-06-19 Robin Morisset <rmorisset@apple.com> > > [WSL] Improving the typing rules >diff --git a/Tools/TestWebKitAPI/Tests/WebKitGLib/TestWebKitURIUtilities.cpp b/Tools/TestWebKitAPI/Tests/WebKitGLib/TestWebKitURIUtilities.cpp >new file mode 100644 >index 0000000000000000000000000000000000000000..f11eb37c6f14dc21e14b5740cf7a3770e6f210d7 >--- /dev/null >+++ b/Tools/TestWebKitAPI/Tests/WebKitGLib/TestWebKitURIUtilities.cpp >@@ -0,0 +1,69 @@ >+/* >+ * Copyright (C) 2018 Igalia S.L. >+ * >+ * This library is free software; you can redistribute it and/or >+ * modify it under the terms of the GNU Library General Public >+ * License as published by the Free Software Foundation; either >+ * version 2 of the License, or (at your option) any later version. >+ * >+ * This library is distributed in the hope that it will be useful, >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >+ * Library General Public License for more details. >+ * >+ * You should have received a copy of the GNU Library General Public License >+ * along with this library; see the file COPYING.LIB. If not, write to >+ * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, >+ * Boston, MA 02110-1301, USA. >+ */ >+ >+#include "config.h" >+ >+#include "TestMain.h" >+ >+static void testURIForDisplay(Test*, gconstpointer) >+{ >+ struct { >+ const char* input; >+ const char* output; >+ } const items[] = { >+ // Arabic Kasra letter. >+ { "https://xn--apple-gkh.com/", "https://xn--apple-gkh.com/" }, >+ { "http://xn--google-yri.com/", "http://xn--google-yri.com/" }, >+ // Cyrillic and Serbian. 
>+ { "http://спецодежда.онла\u0439н/", "http://спецодежда.онлайн/" }, >+ { "http://спецодежда.онла\u0438\u0306н/", "http://спецодежда.онлайн/" }, >+ { "http://пÑавоÑлавнаÑапоÑодиÑа.ÑÑб/", "http://пÑавоÑлавнаÑапоÑодиÑа.ÑÑб/" }, >+ { "http://www.Ñвеовде.од.ÑÑб/", "http://www.Ñвеовде.од.ÑÑб/" }, >+ // Cyrillic country code top-level domain for the Russian Federation. >+ { "http://президент.рф/", "http://президент.рф/" }, >+ { "http://президент.рф./", "http://президент.рф./" }, >+ { "http://www.президент.рф/", "http://www.президент.рф/" }, >+ { "http://почта.президент.рф/", "http://почта.президент.рф/" }, >+ { "http://0ж9.рф/", "http://0ж9.рф/" }, >+ { "http://ÑÑда-ÑÑда.ÑÑ/", "http://ÑÑда-ÑÑда.ÑÑ/" }, >+ { "http://прeзидент.рф/", "http://xn--e-htbdgf6aiiy.xn--p1ai/" }, // Spoof: Roman 'e'. >+ { "http://caxap.рф/", "http://caxap.xn--p1ai/" }, // Spoof: all characters in 'caxap' are Roman. >+ // .com top-level domain doesn't allow non-Latin scripts. >+ { "http://αβγχψω.com/", "http://xn--mxacd4ffg.com/" }, >+ { "http://абгдеж.com/", "http://xn--80acgefg.com/" }, >+ { "http://おかがキギグ.com/", "http://xn--t8jcd20bfag.com/" }, >+ // Percent-decoding. >+ { "http://www.%7Bexample%7D.com/", "http://www.{example}.com/" }, >+ { "http://example.com/a%2Fb", nullptr }, // '/' in path needs to remain encoded. 
>+ }; >+ >+ for (auto& item : items) { >+ GUniquePtr<char> displayURI(webkit_uri_for_display(item.input)); >+ g_assert_cmpstr(displayURI.get(), ==, item.output); >+ } >+} >+ >+void beforeAll() >+{ >+ Test::add("WebKitURIUtilities", "uri-for-display", testURIForDisplay); >+} >+ >+void afterAll() >+{ >+} >diff --git a/Tools/TestWebKitAPI/glib/CMakeLists.txt b/Tools/TestWebKitAPI/glib/CMakeLists.txt >index f4c1e69663ba2782cdeada7817b30ad26ba9977e..5fee6a525e5eff5f996b16eaef416cbd35b89dc6 100644 >--- a/Tools/TestWebKitAPI/glib/CMakeLists.txt >+++ b/Tools/TestWebKitAPI/glib/CMakeLists.txt >@@ -134,6 +134,7 @@ ADD_WK2_TEST(TestWebExtensions ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestW > ADD_WK2_TEST(TestWebKitPolicyClient ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestWebKitPolicyClient.cpp) > ADD_WK2_TEST(TestWebKitSecurityOrigin ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestWebKitSecurityOrigin.cpp) > ADD_WK2_TEST(TestWebKitSettings ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestWebKitSettings.cpp) >+ADD_WK2_TEST(TestWebKitURIUtilities ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestWebKitURIUtilities.cpp) > ADD_WK2_TEST(TestWebKitWebContext ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestWebKitWebContext.cpp) > ADD_WK2_TEST(TestWebKitWebView ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestWebKitWebView.cpp) > ADD_WK2_TEST(TestWebKitUserContentManager ${TOOLS_DIR}/TestWebKitAPI/Tests/WebKitGLib/TestWebKitUserContentManager.cpp)
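The percent-decoding cases at the end of the test table can be illustrated outside WebKit with a minimal sketch. It is not the implementation behind `webkit_uri_for_display`; `decodeForDisplay` and `hexValue` are hypothetical helpers, and keeping only `%2F` encoded is an assumption taken from the comment on the last table entry (the patch itself lists `nullptr` as that entry's expected output). IDN and lookalike-character handling is not modelled here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of WebKit: returns the numeric value of a
// hexadecimal digit, or -1 if the character is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Hypothetical helper, not part of WebKit: percent-decodes a URI for display.
// Assumption: an encoded '/' ("%2F") is left as-is so that it cannot be
// mistaken for a path separator. Malformed escapes are copied unchanged.
std::string decodeForDisplay(const std::string& uri)
{
    std::string result;
    result.reserve(uri.size());
    for (std::size_t i = 0; i < uri.size(); ++i) {
        if (uri[i] == '%' && i + 2 < uri.size()) {
            int high = hexValue(uri[i + 1]);
            int low = hexValue(uri[i + 2]);
            if (high >= 0 && low >= 0) {
                char decoded = static_cast<char>(high * 16 + low);
                if (decoded != '/') {
                    result.push_back(decoded);
                    i += 2; // Skip the two hex digits just consumed.
                    continue;
                }
            }
        }
        // Not an escape, a malformed escape, or an escape we keep encoded.
        result.push_back(uri[i]);
    }
    return result;
}
```

With this sketch, `decodeForDisplay("http://www.%7Bexample%7D.com/")` yields `http://www.{example}.com/`, while `http://example.com/a%2Fb` is returned unchanged.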