[Cocoa] Map commonly used Chinese Windows font names to names present on Cocoa operating systems
Created attachment 281068 [details] WIP
Created attachment 281092 [details] Patch
<rdar://problem/13258122>
Comment on attachment 281092 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281092&action=review > Source/WebCore/ChangeLog:4 > + https://bugs.webkit.org/show_bug.cgi?id=158649 <rdar://problem/13258122>
Comment on attachment 281092 [details] Patch Attachment 281092 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1483029 Number of test failures exceeded the failure limit.
Created attachment 281094 [details] Archive of layout-test-results from ews116 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-yosemite Platform: Mac OS X 10.10.5
What’s going on with all these crashes on mac-debug? Presumably some assertion is firing.
I don’t see failures on the build.webkit.org Dashboard.
Created attachment 281102 [details] Patch
(In reply to comment #7) > What’s going on with all these crashes on mac-debug? Presumably some > assertion is firing. I used equalLettersIgnoringASCIICase() instead of equalIgnoringASCIICase() with a string which contains non-letter characters.
Created attachment 281103 [details] Patch
Comment on attachment 281103 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=281103&action=review review+ but I see ways to improve this considerably > Source/WebCore/platform/graphics/FontCache.cpp:162 > +AtomicString FontCache::alternateFamilyName(const AtomicString& familyName) We should consider an approach where we have functions for each of the font names that use NeverDestroyed and return const AtomicString& rather than atomic strings that are re-created each time. Then this function could return const AtomicString& instead of AtomicString, and we would no longer compute a hash and do an atomic string hash table lookup every time this function returns a non-null alternate family name. > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:813 > +#if (PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101100) || PLATFORM(IOS) > + AtomicString heitiSCReplacement = AtomicString("PingFang SC", AtomicString::ConstructFromLiteral); > + AtomicString heitiTCReplacement = AtomicString("PingFang TC", AtomicString::ConstructFromLiteral); > +#else > + AtomicString heitiSCReplacement = AtomicString("Heiti SC", AtomicString::ConstructFromLiteral); > + AtomicString heitiTCReplacement = AtomicString("Heiti TC", AtomicString::ConstructFromLiteral); > +#endif It seems a bit peculiar to go to the trouble of using ConstructFromLiteral in an attempt to optimize, but also computing hashes and possibly even allocating and destroying two strings every time this function is called. We should have functions that return appropriate font names instead, so only the one that is needed ends up getting called. Even better if we combine it with my suggestion above. Also, it’s unnecessarily wordy to write: AtomicString x = AtomicString(y); Instead we should write one of these (my favorite on top): AtomicString x { y }; AtomicString x(y); AtomicString x = { y }; auto x = AtomicString { y }; auto x = AtomicString(y); > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:817 > + if (familyName == AtomicString(songtiString)) This is a rather inefficient way to compare a string with an array of UChar. Making an AtomicString means computing a hash, doing a hash table lookup, and possibly even allocating and deallocating a string each time, and we have no reason to do any of that. Instead we should write a simple helper function that can compare a string with an array of UChar without allocating any memory. Here’s a cut at the function we could write: template<unsigned charactersCount> inline bool equalNonLatin1(StringView string, const UChar (&characters)[charactersCount]) { ASSERT(string.length() == charactersCount); return !string.is8Bit && !memcmp(string.characters16(), characters, charactersCount * sizeof(UChar)); } Or could make a version that takes const String& or const AtomicString& instead. Note that of course if we do this, then we must *not* terminate the strings with a null character. > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:846 > + // On Windows, Courier New is a TrueType font that is always present and > + // Courier is a bitmap font that we do not support. So, we don't want to map > + // Courier New to Courier. > + // FIXME: Not sure why this is harmful on Windows, since the alternative will > + // only be tried if Courier New is not found. This comment no longer makes sense, since it is about the Windows version of this function, not about this function; the point of the comment is to explain why this is *not* done on Windows. So the comment needs to be reworded and moved to the FontCacheWin.cpp version of this function. Maybe we should put this back into the platform-independent version with #if !OS(WINDOWS), anticipating that after a little research, the fix for the FIXME will be to remove the #if and remove the comment. That would have a positive effect since we would not need a second copy of this code in FontCacheFreeType.cpp. > Source/WebCore/platform/graphics/freetype/FontCacheFreeType.cpp:393 > + // On Windows, Courier New is a TrueType font that is always present and > + // Courier is a bitmap font that we do not support. So, we don't want to map > + // Courier New to Courier. > + // FIXME: Not sure why this is harmful on Windows, since the alternative will > + // only be tried if Courier New is not found. Ditto.
(In reply to comment #12) > Comment on attachment 281103 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=281103&action=review > > review+ but I see ways to improve this considerably > > > Source/WebCore/platform/graphics/FontCache.cpp:162 > > +AtomicString FontCache::alternateFamilyName(const AtomicString& familyName) > > We should consider an approach where we have functions for each of the font > names that use NeverDestroyed and return const AtomicString& rather than > atomic strings that are re-created each time. Then this function could > return const AtomicString& instead of AtomicString, and we would no longer > compute a hash and do an atomic string hash table lookup every time this > function returns a non-null alternate family name. > > > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:813 > > +#if (PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101100) || PLATFORM(IOS) > > + AtomicString heitiSCReplacement = AtomicString("PingFang SC", AtomicString::ConstructFromLiteral); > > + AtomicString heitiTCReplacement = AtomicString("PingFang TC", AtomicString::ConstructFromLiteral); > > +#else > > + AtomicString heitiSCReplacement = AtomicString("Heiti SC", AtomicString::ConstructFromLiteral); > > + AtomicString heitiTCReplacement = AtomicString("Heiti TC", AtomicString::ConstructFromLiteral); > > +#endif > > It seems a bit peculiar to go to the trouble of using ConstructFromLiteral > in an attempt to optimize, but also computing hashes and possibly even > allocating and destroying two strings every time this function is called. We > should have functions that return appropriate font names instead, so only > the one that is needed ends up getting called. Even better if we combine it > with my suggestion above. > > Also, it’s unnecessarily wordy to write: > > AtomicString x = AtomicString(y); > > Instead we should write one of these (my favorite on top): > > AtomicString x { y }; > AtomicString x(y); > AtomicString x = { y }; > auto x = AtomicString { y }; > auto x = AtomicString(y); > > > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:817 > > + if (familyName == AtomicString(songtiString)) > > This is a rather inefficient way to compare a string with an array of UChar. > Making an AtomicString means computing a hash, doing a hash table lookup, > and possibly even allocating and deallocating a string each time, and we > have no reason to do any of that. Instead we should write a simple helper > function that can compare a string with an array of UChar without allocating > any memory. Here’s a cut at the function we could write: > > template<unsigned charactersCount> inline bool equalNonLatin1(StringView > string, const UChar (&characters)[charactersCount]) > { > ASSERT(string.length() == charactersCount); > return !string.is8Bit && !memcmp(string.characters16(), characters, > charactersCount * sizeof(UChar)); > } I think this is good except for the name. Unicode string equality in the general case is locale-sensitive and needs to be done through ICU. In my use case, that is overkill, but implementing a function named equalNonLatin1() shouldn't do a memcmp(). But, if I make the hardcoded font names a NeverDestroyed<AtomicString>, the hash would only be computed once, so the comparison would be a raw pointer equality. That approach seems preferential. > > Or could make a version that takes const String& or const AtomicString& > instead. Note that of course if we do this, then we must *not* terminate the > strings with a null character. > > > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:846 > > + // On Windows, Courier New is a TrueType font that is always present and > > + // Courier is a bitmap font that we do not support. So, we don't want to map > > + // Courier New to Courier. > > + // FIXME: Not sure why this is harmful on Windows, since the alternative will > > + // only be tried if Courier New is not found. > > This comment no longer makes sense, since it is about the Windows version of > this function, not about this function; the point of the comment is to > explain why this is *not* done on Windows. So the comment needs to be > reworded and moved to the FontCacheWin.cpp version of this function. > > Maybe we should put this back into the platform-independent version with #if > !OS(WINDOWS), anticipating that after a little research, the fix for the > FIXME will be to remove the #if and remove the comment. > > That would have a positive effect since we would not need a second copy of > this code in FontCacheFreeType.cpp. > > > Source/WebCore/platform/graphics/freetype/FontCacheFreeType.cpp:393 > > + // On Windows, Courier New is a TrueType font that is always present and > > + // Courier is a bitmap font that we do not support. So, we don't want to map > > + // Courier New to Courier. > > + // FIXME: Not sure why this is harmful on Windows, since the alternative will > > + // only be tried if Courier New is not found. > > Ditto.
(In reply to comment #13) > I think this is good except for the name. Unicode string equality in the > general case is locale-sensitive and needs to be done through ICU. In my use > case, that is overkill, but implementing a function named equalNonLatin1() > shouldn't do a memcmp(). I believe you are mistaken. It's true that correct string *comparison* is a bit complex, since there is unit order vs. code point order to consider. And string searching is, of course, locale sensitive. But equality, even case-insensitive equality with full case folding, is not locale-sensitive. See ICU functions like u_strCaseCompare. I see no evidence whatsoever that string equality is locale-sensitive, nor that ICU provides functions that check equality and take a locale argument. ICU provides a u_memcmp function for code just like what I wrote, to allow us to omit the sizeof(UChar) bit. If you are convinced that the AtomicString code was doing locale-sensitive string equality through ICU maybe you can give me a specific example of what is locale sensitive or perhaps cite the API you believe AtomicString is using to do locale-sensitive Unicode string equality checks through ICU?
(In reply to comment #14) > (In reply to comment #13) > > I think this is good except for the name. Unicode string equality in the > > general case is locale-sensitive and needs to be done through ICU. In my use > > case, that is overkill, but implementing a function named equalNonLatin1() > > shouldn't do a memcmp(). > > I believe you are mistaken. > > It's true that correct string *comparison* is a bit complex, since there is > unit order vs. code point order to consider. And string searching is, of > course, locale sensitive. But equality, even case-insensitive equality with > full case folding, is not locale-sensitive. See ICU functions like > u_strCaseCompare. > > I see no evidence whatsoever that string equality is locale-sensitive, nor > that ICU provides functions that check equality and take a locale argument. > > ICU provides a u_memcmp function for code just like what I wrote, to allow > us to omit the sizeof(UChar) bit. > > If you are convinced that the AtomicString code was doing locale-sensitive > string equality through ICU maybe you can give me a specific example of what > is locale sensitive or perhaps cite the API you believe AtomicString is > using to do locale-sensitive Unicode string equality checks through ICU? I was thinking of case mapping, which is locale-specific [1]. Equality requires unicode normalization [2], which is not locale-specific. The thesis I was making, however, still applies - a raw memcmp() won't perform normalization, and is therefore incorrect in the general case. (Consider comparing the string U+65 U+301 with the string U+E9. These strings are equal but memcmp() would say they are different.) Luckily, normalization isn't required for this patch because of the specific hardcoded strings involved. Similarly, (I believe) normalization isn't required if all the code units have their high-bit equal to 0 in UTF-16 (which means our current LChar functions are correct). [1] http://userguide.icu-project.org/transforms/casemappings [2] http://userguide.icu-project.org/transforms/normalization
Created attachment 281111 [details] Patch for committing
Comment on attachment 281111 [details] Patch for committing Clearing flags on attachment: 281111 Committed r201978: <http://trac.webkit.org/changeset/201978>
(In reply to comment #15) > Equality requires unicode normalization [2] Yes, Unicode normalization is an important topic, but it is not integrated into WTF and WebKit at the "equality function" level. So the == operator and functions like equalIgnoringASCII case do not deal with normalization, nor do we intend to change that. ICU makes the same design choice, and does not offer functions that make equality comparisons while dealing with normalization. Instead normalization is modeled as a transformation you make on text before doing things like equality comparisons. This equalNonLatin1 function I proposed (intended to be local to this source file), fit into those families of functions and the library designs of both ICU and WebKit from the point of view of normalization. The rest of what you say is accurate, but I don’t think it’s relevant to my equalNonLatin1 proposal.
Comment on attachment 281111 [details] Patch for committing View in context: https://bugs.webkit.org/attachment.cgi?id=281111&action=review > Source/WTF/wtf/text/StringCommon.h:326 > +template<typename StringClass, unsigned length> bool naiveEqualWithoutPerformingUnicodeNormalization(const StringClass& a, const UChar (&codepoints)[length]) This is not a good function name. None of the other equality operations perform Unicode normalization, nor should they. And we don’t want to add “naive” and “without performing unicode normalization” to lots of other function names. There handling of the second argument in this template is risky; it’s relatively common to construct literals with null character termination, but this function expects an array without a terminating null character. I don’t think that is going to be clear to callers.
Comment on attachment 281111 [details] Patch for committing View in context: https://bugs.webkit.org/attachment.cgi?id=281111&action=review >> Source/WTF/wtf/text/StringCommon.h:326 >> +template<typename StringClass, unsigned length> bool naiveEqualWithoutPerformingUnicodeNormalization(const StringClass& a, const UChar (&codepoints)[length]) > > This is not a good function name. None of the other equality operations perform Unicode normalization, nor should they. And we don’t want to add “naive” and “without performing unicode normalization” to lots of other function names. > > There handling of the second argument in this template is risky; it’s relatively common to construct literals with null character termination, but this function expects an array without a terminating null character. I don’t think that is going to be clear to callers. What I meant before is: This function should be named "equal"; the *many* other functions named equal have the same semantics as this one. None are any more or less "naive" and none perform normalization. Maybe the second problem, where it’s not obvious that this takes an array of UTF-16 code units that is *not* terminated with a null character, is something we can live with. I’m not sure what to make of the argument name "codepoints", since the argument is an array of UTF-16 code units, not of code points. I suggest using a different name, perhaps "codeUnits".
Comment on attachment 281111 [details] Patch for committing View in context: https://bugs.webkit.org/attachment.cgi?id=281111&action=review > Source/WebCore/platform/graphics/FontCache.cpp:169 > + static NeverDestroyed<AtomicString> helvetica("Helvetica", AtomicString::ConstructFromLiteral); > + static NeverDestroyed<AtomicString> timesNewRoman("Times New Roman", AtomicString::ConstructFromLiteral); > + static NeverDestroyed<AtomicString> courierNew("Courier New", AtomicString::ConstructFromLiteral); > + static NeverDestroyed<AtomicString> arial("Arial", AtomicString::ConstructFromLiteral); > + static NeverDestroyed<AtomicString> courier("Courier", AtomicString::ConstructFromLiteral); > + static NeverDestroyed<AtomicString> times("Times", AtomicString::ConstructFromLiteral); Normally we sort things like this alphabetically to avoid having them be in an arbitrary order. > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:805 > + static const UChar songtiString[] = { 0x5b8b, 0x4f53 }; > + static const UChar weiruanYaHeiString[] = { 0x5fae, 0x8f6f, 0x96c5, 0x9ed1 }; > + static const UChar heitiString[] = { 0x9ed1, 0x4f53 }; > + static const UChar weiruanZhengHeitiString[] = { 0x5fae, 0x8edf, 0x6b63, 0x9ed1, 0x9ad4 }; > + static const UChar weiruanXinXiMingTi[] = { 0x5fae, 0x8edf, 0x65b0, 0x7d30, 0x660e, 0x9ad4 }; Normally we sort things like this alphabetically to avoid having them be in an arbitrary order. > Source/WebCore/platform/graphics/cocoa/FontCacheCoreText.cpp:809 > +#if (PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED >= 101100) || PLATFORM(IOS) I think we should put the past compatibility for Mac case first, then the modern version. And I think this should just be: #if PLATFORM(MAC) && __MAC_OS_X_VERSION_MIN_REQUIRED < 101100 ... old ... #else ... modern/iOS ... #endif > Source/WebCore/platform/graphics/freetype/FontCacheFreeType.cpp:386 > +const AtomicString& FontCache::platformAlternateFamilyName(const AtomicString& familyName) Should omit this argument name since it is unused.
Committed r201979: <http://trac.webkit.org/changeset/201979>
Build fix in http://trac.webkit.org/changeset/201979
> I’m not sure what to make of the argument name "codepoints", since the > argument is an array of UTF-16 code units, not of code points. I suggest > using a different name, perhaps "codeUnits". Yeah, "code points" is definitely wrong!
Comment on attachment 281111 [details] Patch for committing View in context: https://bugs.webkit.org/attachment.cgi?id=281111&action=review >>> Source/WTF/wtf/text/StringCommon.h:326 >>> +template<typename StringClass, unsigned length> bool naiveEqualWithoutPerformingUnicodeNormalization(const StringClass& a, const UChar (&codepoints)[length]) >> >> This is not a good function name. None of the other equality operations perform Unicode normalization, nor should they. And we don’t want to add “naive” and “without performing unicode normalization” to lots of other function names. >> >> There handling of the second argument in this template is risky; it’s relatively common to construct literals with null character termination, but this function expects an array without a terminating null character. I don’t think that is going to be clear to callers. > > What I meant before is: > > This function should be named "equal"; the *many* other functions named equal have the same semantics as this one. None are any more or less "naive" and none perform normalization. > > Maybe the second problem, where it’s not obvious that this takes an array of UTF-16 code units that is *not* terminated with a null character, is something we can live with. > > I’m not sure what to make of the argument name "codepoints", since the argument is an array of UTF-16 code units, not of code points. I suggest using a different name, perhaps "codeUnits". The null-termination is a byproduct of string literals, which are incompatible with this function. Therefore, I don't think forcing every caller to explicitly list a 0 at the end of their array is a good idea. I didn't know our policy of viewing normalization as orthogonal from equality. I just assumed that our LChar functions didn't do normalization because it is unnecessary (but a more general function taking arbitrary unicode should do the normalization). I'm happy to rename this function to "equal".
Comment on attachment 281111 [details] Patch for committing View in context: https://bugs.webkit.org/attachment.cgi?id=281111&action=review >> Source/WebCore/platform/graphics/FontCache.cpp:169 >> + static NeverDestroyed<AtomicString> times("Times", AtomicString::ConstructFromLiteral); > > Normally we sort things like this alphabetically to avoid having them be in an arbitrary order. "Normally" ... I see what you did there :P
Committed r201982: <http://trac.webkit.org/changeset/201982>
The test fails every time on El Capitan and on iOS: http://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=fast%2Ftext%2Fchinese-font-name-aliases-2.html
(In reply to comment #28) > The test fails every time on El Capitan and on iOS: > > http://webkit-test-results.webkit.org/dashboards/flakiness_dashboard. > html#showAllRuns=true&tests=fast%2Ftext%2Fchinese-font-name-aliases-2.html This is expected - I marked it as failing on ElCapitan. See http://trac.webkit.org/changeset/201978/trunk/LayoutTests/platform/mac/TestExpectations The test is platform-specific, and the -expected.html is different on El Capitan and Yosemite, so the two tests are the same, but the -expected.html files are different, and one is marked as failing on El Capitan and the other is marked as failing on Yosemite. I didn't have any luck with platform-specific reftest -expected.html files, so I had to go with this solution.
These tests are passing on GTK; removing our failure expectation....