Bug 209579 - Escaped UTF-8 not parsed correctly with icu 52
Summary: Escaped UTF-8 not parsed correctly with icu 52
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: All Linux
: P2 Minor
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-25 19:20 PDT by Mike Gorse
Modified: 2020-04-21 22:12 PDT (History)
14 users (show)

See Also:


Attachments
Patch. (1.54 KB, patch)
2020-03-25 19:24 PDT, Mike Gorse
darin: review-
darin: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Gorse 2020-03-25 19:20:21 PDT
I'm not sure if there's a minimum version of icu that we support; I noticed this while trying to prepare a security update for SLED 12 SP2, which has old versions of many things, including icu.
Anyhow, prior to icu 53, U8_COUNT_TRAIL_BYTES does not cast its parameter to unsigned before doing comparisons, and it ends up always returning 0 if a signed char is passed in.
Comment 1 Mike Gorse 2020-03-25 19:24:29 PDT
Created attachment 394569 [details]
Patch.
Comment 2 Alexey Proskuryakov 2020-03-26 10:19:25 PDT
Comment on attachment 394569 [details]
Patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=394569&action=review

> Source/JavaScriptCore/ChangeLog:3
> +        Escaped UTF-8 not parsed correctly with icu 52

That's... from 2013. It doesn't seem meaningful to support such systems in WebKit trunk.
Comment 3 Darin Adler 2020-03-26 11:31:14 PDT
Comment on attachment 394569 [details]
Patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=394569&action=review

>> Source/JavaScriptCore/ChangeLog:3
>> +        Escaped UTF-8 not parsed correctly with icu 52
> 
> That's... from 2013. It doesn't seem meaningful to support such systems in WebKit trunk.

I agree, unless there are some extenuating circumstances that require that we support this going forward.
Comment 4 Darin Adler 2020-03-26 11:31:39 PDT
Comment on attachment 394569 [details]
Patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=394569&action=review

>>> Source/JavaScriptCore/ChangeLog:3
>>> +        Escaped UTF-8 not parsed correctly with icu 52
>> 
>> That's... from 2013. It doesn't seem meaningful to support such systems in WebKit trunk.
> 
> I agree, unless there are some extenuating circumstances that require that we support this going forward.

Who has a need to build the very latest WebKit, but with very old ICU?
Comment 5 Mike Gorse 2020-03-26 13:13:56 PDT
(In reply to Darin Adler from comment #4)
> Comment on attachment 394569 [details]
> Patch.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=394569&action=review
> 
> >>> Source/JavaScriptCore/ChangeLog:3
> >>> +        Escaped UTF-8 not parsed correctly with icu 52
> >> 
> >> That's... from 2013. It doesn't seem meaningful to support such systems in WebKit trunk.
> > 
> > I agree, unless there are some extenuating circumstances that require that we support this going forward.
> 
> Who has a need to build the very latest WebKit, but with very old ICU?

We are still, for instance, supporting SLE 12, which is on GNOME 3.20 as of SP2. But, from what I gather, it is recommended to ship the newest WebKitGTK release, in order to provide security fixes, although this becomes increasingly difficult over time when we are shipping versions of several libraries that WebKit no longer supports. Anyway, which versions of libraries to support is a larger question than just icu, and this wouldn't be the only downstream patch that I'd need to carry...
Comment 6 Darin Adler 2020-03-26 13:15:21 PDT
(In reply to Mike Gorse from comment #5)
> We are still, for instance, supporting SLE 12, which is on GNOME 3.20 as of
> SP2. But, from what I gather, it is recommended to ship the newest WebKitGTK
> release, in order to provide security fixes, although this becomes
> increasingly difficult over time when we are shipping versions of several
> libraries that WebKit no longer supports. Anyway, which versions of
> libraries to support is a larger question than just icu, and this wouldn't
> be the only downstream patch that I'd need to carry...

If updating WebKit I suggest also updating ICU to something newer than 7 years old.

If that’s not practical, then I think carrying a downstream patch would be the preferred solution.
Comment 7 Milan Crha 2020-03-26 23:51:11 PDT
(In reply to Darin Adler from comment #6)
> If updating WebKit I suggest also updating ICU to something newer than 7
> years old.

I agree with this, though if WebKit itself doesn't require newer ICU, it should work properly with the older too. In other words, if WebKit expects certain functionality or behaviour of some of its dependencies, which is available only since some version, then it's WebKit, which should claim that it requires such version of the dependencies. Just like when using new API from the dependency.

The other way around is to make the software working in the older versions as well.

Looking into Mike's change, it's quite trivial. I do not know the details, but I'd even use 'unsigned char' for the 'b0' directly, to not have it defined as 'char'. There's always a problem with the highest bit when it comes to cast between signed and unsigned types.
Comment 8 Darin Adler 2020-03-27 09:45:25 PDT
(In reply to Milan Crha from comment #7)
> if WebKit itself doesn't require newer ICU, it
> should work properly with the older too

That’s something we don’t agree on. WebKit often moves dependency forward and it’s a non-goal to work around bugs in very old versions of dependent libraries.

. In other words, if WebKit expects
> certain functionality or behaviour of some of its dependencies, which is
> available only since some version, then it's WebKit, which should claim that
> it requires such version of the dependencies. Just like when using new API
> from the dependency

Yes, the WebKit package should clearly state the dependency on a newer version of ICU. Let's fix that!

> The other way around is to make the software working in the older versions
> as well.

Easier said than done.

> Looking into Mike's change, it's quite trivial. I do not know the details,
> but I'd even use 'unsigned char' for the 'b0' directly, to not have it
> defined as 'char'. There's always a problem with the highest bit when it
> comes to cast between signed and unsigned types.

I’m not sure I agree. Is this the only problem with older ICU? We don’t test with those older versions and there is no guarantee this is sufficient.
Comment 9 Milan Crha 2020-03-30 02:51:30 PDT
(In reply to Darin Adler from comment #8)
> That’s something we don’t agree on.

Partly. You kind of contradict that statement lower in the comment. See my adds:

> WebKit often moves dependency forward and it’s a non-goal to work around
> bugs in very old versions of dependent libraries.

I definitely agree with you in this, there are no resources to try to build with every single combination of the libraries out there. Neither the software should fix bugs in the libraries it uses, that's not the way it's supposed to work by all means. There are sometimes workarounds, but the rule of thumb should be: fix the library, not me, thus everyone will benefit from the fix.

> Yes, the WebKit package should clearly state the dependency on a newer
> version of ICU. Let's fix that!

This makes us agree on the subject :)

> > The other way around is to make the software working in the older versions
> > as well.
> 
> Easier said than done.

I know. I'm sorry for the language, it was supposed to be sarcastic and hopefully (at least partly) funny by pushing into the extremes, but s I re-read it it didn't make it. I'm sorry.

> I’m not sure I agree. Is this the only problem with older ICU? We don’t test
> with those older versions and there is no guarantee this is sufficient.

I cannot answer the above question. My comments about the patch were more or less generic, knowing the issues with conversion between signed and unsigned types and what issues it can cause.

As long as you bump the version of the ICU requirement it's fine to ignore the patch.
Comment 10 Michael Catanzaro 2020-04-01 07:47:56 PDT
So our dependencies policy [1] allows us to require ICU 60 (the version available in Ubuntu 18.04).

[1] https://trac.webkit.org/wiki/WebKitGTK/DependenciesPolicy

On the other hand, this is a one-line patch. Normally when we bump dependencies, we do it because there's some actual strong reason to do so (to use a new API, or due to an incompatible change in the dependency) rather than to avoid one cast. :P I would just accept the patch; it's small and noninvasive and saves us the minor inconvenience of us both having to carry the patch downstream in RHEL and SLED. (I assume I'll need this when I try to update WebKitGTK in RHEL 7; we have ICU 50 there. :)

If we really want to bump our ICU dependency to avoid a cast, then we need to declare it properly in every Options*.cmake that uses find_package(ICU): OptionsAppleWin.cmake, OptionsFTW.cmake, OptionsGTK.cmake, OptionsJSCOnly.cmake, OptionsMac.cmake, OptionsPlayStation.cmake, OptionsWPE.cmake, and OptionsWinCairo.cmake. We currently don't pass any minimum required version there.
Comment 11 Michael Catanzaro 2020-04-01 07:52:26 PDT
(In reply to Mike Gorse from comment #0)
> Anyhow, prior to icu 53, U8_COUNT_TRAIL_BYTES does not cast its parameter to
> unsigned before doing comparisons, and it ends up always returning 0 if a
> signed char is passed in.

Oh, *wow*, so this is a runtime failure, not a build failure?

I didn't look closely enough at first and assumed it was a build failure. Since this is a runtime failure, and distributors updating WebKit will not see any build error, that seems like a pretty strong argument to take the patch; otherwise, we'll just wind up with broken behavior at runtime. Is there some example HTML that demonstrates the incorrect behavior?
Comment 12 Michael Catanzaro 2020-04-01 08:00:39 PDT
My thinking is this: a build failure is inconvenient. Carrying patches in WebKit to fix downstream build failures is also inconvenient for WebKit. If it's a big patch, that's probably too much inconvenience for upstream, and a downstream patch is warranted. If it's a one-liner, it's probably simpler for everyone to take the patch upstream.

But a runtime failure, where WebKit does not behave properly at runtime, is a footgun. We can't possibly expect distributors to patch WebKit to avoid this, because they receive no build warning to notice that anything is wrong. In particular, in this case, we're calling a library function that used to fail unless passed an unsigned char... just adding a cast to unsigned char is not very onerous.
Comment 13 Carlos Alberto Lopez Perez 2020-04-01 09:00:21 PDT
(In reply to Michael Catanzaro from comment #10)
> If we really want to bump our ICU dependency to avoid a cast, then we need
> to declare it properly in every Options*.cmake that uses find_package(ICU):
> OptionsAppleWin.cmake, OptionsFTW.cmake, OptionsGTK.cmake,
> OptionsJSCOnly.cmake, OptionsMac.cmake, OptionsPlayStation.cmake,
> OptionsWPE.cmake, and OptionsWinCairo.cmake. We currently don't pass any
> minimum required version there.

See bug 209694
Comment 14 Mike Gorse 2020-04-01 11:18:10 PDT
(In reply to Michael Catanzaro from comment #11)
> (In reply to Mike Gorse from comment #0)
> > Anyhow, prior to icu 53, U8_COUNT_TRAIL_BYTES does not cast its parameter to
> > unsigned before doing comparisons, and it ends up always returning 0 if a
> > signed char is passed in.
> 
> Oh, *wow*, so this is a runtime failure, not a build failure?
> 
> I didn't look closely enough at first and assumed it was a build failure.
> Since this is a runtime failure, and distributors updating WebKit will not
> see any build error, that seems like a pretty strong argument to take the
> patch; otherwise, we'll just wind up with broken behavior at runtime. Is
> there some example HTML that demonstrates the incorrect behavior?

Yes, it is a runtime failure. I don't have a sample off-hand. I noticed it when I tried to smoke test my WebKitGTK package and found that I couldn't authenticate a gmail connection in evolution; clicking Next on the screen that asked for an email address wouldn't move to the next screen for me to enter a password. JavaScriptCore appears to encounter %e2 somewhere on the page and fail to recognize it as the beginning of a UTF-8 sequence.