WebKit Bugzilla
RESOLVED FIXED
168182
Update custom line breaking iterators to the latest version of Unicode
https://bugs.webkit.org/show_bug.cgi?id=168182
Myles C. Maxfield
Reported
2017-02-11 19:49:31 PST
The custom rules we use are from 2014 and appear to be from ICU 54. We should update them.
Attachments
WIP (46.43 KB, patch), 2017-02-12 15:16 PST, Myles C. Maxfield, no flags
Patch (530.45 KB, patch), 2017-02-12 20:00 PST, Myles C. Maxfield, zalan: review+
Myles C. Maxfield
Comment 1
2017-02-11 20:04:53 PST
I went through the breaking rules line by line and compared them to ICU's 54.1 release [1]. I found a few things:
1. This version of ICU has no concept of strict vs loose line breaking. Therefore, my comparisons are done ignoring the loose/normal/strict pieces of our custom rules. These pieces of our rules work by adding / removing characters from the existing Unicode sets. Newer Unicode does have a concept of strict / loose rules.
2. Our emoji handling is custom, and not included in ICU.
3. We have a couple of declarations hidden behind ADDITIONAL_EMOJI_SUPPORT which ICU includes.
4. There are three constructions we have which the open source rules don't have:
$EXcm $INcm;
$CM* $IN $CM* $EX;
$CM+ $RI;

We should probably just opt all ports into the ADDITIONAL_EMOJI_SUPPORT flag and delete the flag. The first two of the different constructions have to do with characters in the inseparable class, and the third one has to do with regional indicators. I'm not sure, but my theory right now is that these are just an oversight and aren't necessary.

[1] http://source.icu-project.org/repos/icu/icu/tags/release-54-1/source/data/brkitr/line.txt
Myles C. Maxfield
Comment 2
2017-02-11 20:10:39 PST
ICU 55.1 is the first release to support the concept of loose vs strict line breaking. However, I'm not sure which ICU is used on the oldest OS we support.
Myles C. Maxfield
Comment 3
2017-02-11 20:22:07 PST
I believe the oldest OS release we support is El Capitan, which uses ICU 55, which has loose & strict line breaking. Perhaps the best solution is to just remove the custom rules entirely.
Myles C. Maxfield
Comment 4
2017-02-11 20:23:24 PST
(In reply to comment #3)
> which uses ICU 55,
*** 55.1
Radar WebKit Bug Importer
Comment 5
2017-02-11 20:30:26 PST
<rdar://problem/30481669>
Myles C. Maxfield
Comment 6
2017-02-12 15:16:53 PST
Created attachment 301320: WIP
Myles C. Maxfield
Comment 7
2017-02-12 15:19:11 PST
See also:
https://bugs.webkit.org/show_bug.cgi?id=89235
Myles C. Maxfield
Comment 8
2017-02-12 20:00:09 PST
Created attachment 301329: Patch
Myles C. Maxfield
Comment 9
2017-02-13 11:09:32 PST
Committed r212235: <http://trac.webkit.org/changeset/212235>
Saam Barati
Comment 10
2020-04-28 21:32:29 PDT
Comment on attachment 301329: Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=301329&action=review
> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:54
> + Vector<char> scratchBuffer(utf8Locale.length() + 11, 0);
Was this supposed to be + 1?
Myles C. Maxfield
Comment 11
2020-05-04 23:18:43 PDT
Comment on attachment 301329: Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=301329&action=review
>> Source/WTF/wtf/text/LineBreakIteratorPoolICU.h:54
>> + Vector<char> scratchBuffer(utf8Locale.length() + 11, 0);
>
> Was this supposed to be + 1?
Nope. The code after this line in the patch appends things like "@lb=normal" to the string, which is 10 characters long. One more for the '\0' makes 11.
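The sizing arithmetic can be checked with a small standalone sketch. It is not the code from the patch: it uses std::vector and std::string rather than WTF's Vector, the helper name localeWithLineBreakKeyword is hypothetical, and it assumes the locale string carries no keywords of its own, so the suffix always starts with '@'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// "@lb=normal" and "@lb=strict" are 10 characters each; "@lb=loose" is 9.
// sizeof counts the trailing '\0', so subtract one to get the length.
constexpr std::size_t longestKeywordLength = sizeof("@lb=normal") - 1;
static_assert(longestKeywordLength == 10, "suffix should be 10 characters");

// Hypothetical helper (an assumption for illustration, not WebKit's API):
// returns a NUL-terminated buffer holding locale + "@lb=" + mode.
std::vector<char> localeWithLineBreakKeyword(const std::string& locale, const std::string& mode)
{
    const std::string suffix = "@lb=" + mode;
    assert(suffix.length() <= longestKeywordLength);

    // locale length + 10 for the longest suffix + 1 for '\0' == length + 11.
    // The buffer is zero-filled, so the copied text is always terminated.
    std::vector<char> scratchBuffer(locale.length() + longestKeywordLength + 1, 0);
    std::memcpy(scratchBuffer.data(), locale.data(), locale.length());
    std::memcpy(scratchBuffer.data() + locale.length(), suffix.data(), suffix.length());
    return scratchBuffer;
}
```

For a two-character locale such as "en" the buffer is 13 bytes: 12 for "en@lb=normal" and one for the terminator. With + 1 instead of + 11 there would be room for the locale and its terminator only, and nothing for the suffix.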