WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 202041
[YARR] Properly handle surrogates when matching back references
https://bugs.webkit.org/show_bug.cgi?id=202041
Summary
[YARR] Properly handle surrogates when matching back references
Paulo Matos
Reported
2019-09-20 02:30:10 PDT
Since the fix for
https://bugs.webkit.org/show_bug.cgi?id=201647
(whose report I don't have permissions to read), that we noticed the test stress/regexp-unicode-surrogate-pair-increment-should-involve-length-check.js (added with the mentioned fix) started failing on 32bits ARM. Turns out this is because we don't have YARR JIT enabled and it's not specific to 32bits. If I disable YARR JIT on 64bits, I can reproduce the same failure in other archs. $ WebKitBuild/Debug/bin/jsc --useRegExpJIT=0 JSTests/stress/regexp-unicode-surrogate-pair-increment-should-involve-length-check.js null Exception: Expected /([^x]+)[^]*\1([^])/u to match "�𐀀" but it didn't
Attachments
WIP - Patch
(4.02 KB, patch)
2019-09-25 07:35 PDT
,
Paulo Matos
msaboff
: review-
ews-watchlist
: commit-queue-
Details
Formatted Diff
Diff
Patch
(8.65 KB, patch)
2019-09-25 17:24 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Updated ChangeLogs citing Paulo Matos <pmatos@igalia.com> original contribution.
(8.90 KB, patch)
2019-09-25 17:33 PDT
,
Michael Saboff
keith_miller
: review+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Paulo Matos
Comment 1
2019-09-20 02:31:22 PDT
Adding to CC msaboff and ysuzuki who were involved with
https://bugs.webkit.org/show_bug.cgi?id=201647
Paulo Matos
Comment 2
2019-09-20 05:22:37 PDT
Related patch to add failed run to set of tests:
https://bugs.webkit.org/show_bug.cgi?id=202042
Paulo Matos
Comment 3
2019-09-23 06:32:52 PDT
I am investigating this at the moment. It seems interpretation of backreferences are broken:
>>> /([^x]+)\1/u.exec("\ud800\ud800\udc00")
null when starting jsc with $ WebKitBuild/Debug/bin/jsc --useRegExpJIT=0
Paulo Matos
Comment 4
2019-09-24 01:50:53 PDT
The current understanding is slightly different than the initial one. The RegExp JIT is broken, not the interpreter. All other engines seem to agree that: /([^x]+)[^]*\1([^])/u.exec("\ud800\ud800\udc00") should _not_ match. There are three surrogates, one unpaired (the first \ud800) and two paired with each other (\ud800\udc00). The [^x]+ should match and capture the unpaired surrogate \ud800 and the backreference should not possibly capture part of the paired surrogate with \1. This should therefore fail.
Paulo Matos
Comment 5
2019-09-25 07:35:55 PDT
Created
attachment 379547
[details]
WIP - Patch
EWS Watchlist
Comment 6
2019-09-25 09:48:12 PDT
Comment on
attachment 379547
[details]
WIP - Patch
Attachment 379547
[details]
did not pass jsc-ews (mac): Output:
https://webkit-queues.webkit.org/results/13067037
New failing tests: jsc-layout-tests.yaml/fast/regex/script-tests/backreferences.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/regexp-unicode.js.layout-no-ftl jsc-layout-tests.yaml/js/script-tests/regexp-unicode.js.layout-ftl-no-cjit jsc-layout-tests.yaml/fast/regex/script-tests/backreferences.js.layout-ftl-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/regexp-unicode.js.layout-ftl-eager-no-cjit jsc-layout-tests.yaml/fast/regex/script-tests/backreferences.js.layout-no-ftl jsc-layout-tests.yaml/fast/regex/script-tests/backreferences.js.layout-no-llint jsc-layout-tests.yaml/fast/regex/script-tests/backreferences.js.layout jsc-layout-tests.yaml/fast/regex/script-tests/backreferences.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/regexp-unicode.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/fast/regex/script-tests/backreferences.js.layout-ftl-no-cjit jsc-layout-tests.yaml/js/script-tests/regexp-unicode.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/regexp-unicode.js.layout jsc-layout-tests.yaml/js/script-tests/regexp-unicode.js.layout-no-llint
Michael Saboff
Comment 7
2019-09-25 16:30:10 PDT
Comment on
attachment 379547
[details]
WIP - Patch r- This patch is mostly right. It needs 2 changes. 1) We need to properly increment the patternIndex and index when matching non-BMP characters. 2) There is a register clobbering issue on X86_64 since tryReadUnicodeCharImpl() uses r10 as a temp and patternChar is also r10. Posting a patch shortly that builds on this patch and addresses these two issues.
Michael Saboff
Comment 8
2019-09-25 17:24:16 PDT
Created
attachment 379595
[details]
Patch
Michael Saboff
Comment 9
2019-09-25 17:33:03 PDT
Created
attachment 379597
[details]
Updated ChangeLogs citing Paulo Matos <
pmatos@igalia.com
> original contribution.
Paulo Matos
Comment 10
2019-09-26 08:19:37 PDT
Thanks Michael, I was still looking at this issue and just noticed you fixed it. I had a little testsuite with the most relevant tests to run: let match1 = /([^x])\1./u.exec('\ud800\ud800\udc00'); if (match1) { print('error 1'); } let match2 = /(\u{10123})x\1/u.exec('\u{10123}x\u{10123}'); if (!match2 || match2[1] !== '\u{10123}') { print('error 2'); } let match3 = /(\ud800\udd23)x\1/u.exec('\u{10123}x\u{10123}'); if (!match3 || match3[1] !== '\u{10123}') { print('error 3'); } let match4 = /(\u{10123})x\ud800\udd23/u.exec('\u{10123}x\u{10123}'); if (!match4 || match4[1] !== '\u{10123}') { print('error 4'); } let match5 = /(\ud800)\udd23x\1/u.exec('\u{10123}x\u{10123}'); if (match5) { print('error 5'); } let match6 = /([^x])\ud800./u.exec('\ud800\ud800\udc00'); if (match6) { print('error 6'); } print('DONE'); All bots pass and the above tests pass too so I think it's fixed. I didn't recognize the problems you mentioned and debugging this for the first time turned out to be a long exercise in C++ and x86_64 assembly reading. Once again, thanks for the help on this.
Keith Miller
Comment 11
2019-10-01 10:48:35 PDT
Comment on
attachment 379597
[details]
Updated ChangeLogs citing Paulo Matos <
pmatos@igalia.com
> original contribution. r=me. I'll take your word on some of the character encodings because I don't know them very well.
Keith Miller
Comment 12
2019-10-01 10:49:27 PDT
Comment on
attachment 379597
[details]
Updated ChangeLogs citing Paulo Matos <
pmatos@igalia.com
> original contribution. Can you also add Paulos' test?
Michael Saboff
Comment 13
2019-10-01 11:01:43 PDT
(In reply to Keith Miller from
comment #12
)
> Comment on
attachment 379597
[details]
> Updated ChangeLogs citing Paulo Matos <
pmatos@igalia.com
> original > contribution. > > Can you also add Paulos' test?
His test is a collection of subtests that where failing with his patch. They are already part of our existing tests.
Michael Saboff
Comment 14
2019-10-01 11:23:56 PDT
Committed
r250568
: <
https://trac.webkit.org/changeset/250568
>
Radar WebKit Bug Importer
Comment 15
2019-10-01 11:24:24 PDT
<
rdar://problem/55883253
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug