Bug 180539 - ApplicationManifestParser should strip whitespace from the raw input
Summary: ApplicationManifestParser should strip whitespace from the raw input
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-12-07 11:45 PST by David Quesada
Modified: 2017-12-11 10:32 PST (History)
10 users (show)

See Also:


Attachments
Patch (4.94 KB, patch)
2017-12-07 14:04 PST, David Quesada
joepeck: review+
Details | Formatted Diff | Diff
Patch v2 (6.12 KB, patch)
2017-12-08 09:14 PST, David Quesada
joepeck: review+
Details | Formatted Diff | Diff
Patch for landing (6.41 KB, patch)
2017-12-08 11:12 PST, David Quesada
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews104 for mac-elcapitan-wk2 (2.56 MB, application/zip)
2017-12-08 12:59 PST, EWS Watchlist
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Quesada 2017-12-07 11:45:42 PST
Parsing manifests with additional whitespace seemed to work in earlier versions of my patches that used JavaScriptCore for parsing the manifest. The current method of parsing JSON doesn't like this, and many manifests from the web that were working before no longer parse.
Comment 1 Radar WebKit Bug Importer 2017-12-07 11:50:31 PST
<rdar://problem/35915075>
Comment 2 David Quesada 2017-12-07 14:04:28 PST
Created attachment 328735 [details]
Patch
Comment 3 Joseph Pecoraro 2017-12-07 16:41:49 PST
Comment on attachment 328735 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328735&action=review

r=me

> Source/WebCore/Modules/applicationmanifest/ApplicationManifestParser.cpp:60
>      RefPtr<JSON::Value> jsonValue;
> -    if (!JSON::Value::parseJSON(text, jsonValue)) {
> +    if (!JSON::Value::parseJSON(text.stripWhiteSpace(), jsonValue)) {

Does this actually change behavior? parseJSON looks to ignore leading whitespace:

> Token parseToken(const UChar* start, const UChar* end, const UChar** tokenStart, const UChar** tokenEnd)
> {
>     while (start < end && isSpaceOrNewline(*start))
>         ++start;
>     ...

It might also be good to have a link to the spec at the top of this file somewhere for quick reference. I suspect that is:
https://w3c.github.io/manifest/

This part appears to be:
https://w3c.github.io/manifest/#dfn-obtaining-the-manifest

1. Let json be the result of parsing text. If parsing throws an error: ...

And here parseJSON(text) should satisfy this without the stripWhiteSpace.
Comment 4 David Quesada 2017-12-07 17:00:57 PST
(In reply to Joseph Pecoraro from comment #3)
> Comment on attachment 328735 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=328735&action=review
> 
> r=me
> 
> > Source/WebCore/Modules/applicationmanifest/ApplicationManifestParser.cpp:60
> >      RefPtr<JSON::Value> jsonValue;
> > -    if (!JSON::Value::parseJSON(text, jsonValue)) {
> > +    if (!JSON::Value::parseJSON(text.stripWhiteSpace(), jsonValue)) {
> 
> Does this actually change behavior? parseJSON looks to ignore leading
> whitespace:
> 
> > Token parseToken(const UChar* start, const UChar* end, const UChar** tokenStart, const UChar** tokenEnd)
> > {
> >     while (start < end && isSpaceOrNewline(*start))
> >         ++start;
> >     ...
> 
> It might also be good to have a link to the spec at the top of this file
> somewhere for quick reference. I suspect that is:
> https://w3c.github.io/manifest/
> 
> This part appears to be:
> https://w3c.github.io/manifest/#dfn-obtaining-the-manifest
> 
> 1. Let json be the result of parsing text. If parsing throws an error: ...
> 
> And here parseJSON(text) should satisfy this without the stripWhiteSpace.

The issue is with trailing whitespace. JSON::Value::parseJSON() (JSONValues.cpp:531) appears to fail if there are any characters remaining after parsing the value:

    const UChar* start = characters;
    const UChar* end = start + jsonInput.length();
    const UChar* tokenEnd;
    auto result = buildValue(start, end, &tokenEnd, 0);
    if (!result || tokenEnd != end)
        return false;

where actual JSON parsing seems to handle it just fine:

    > JSON.parse("\t[1,2,3] \n")
    < [1, 2, 3]

Also the fact that some manifests that apparently have trailing whitespace (e.g. https://www.progressivewebflap.com) parsed prior to switching to JSON::Value::parseJSON(), but not after, (but did again after this patch), means this *is* a change in behavior.
Comment 5 Joseph Pecoraro 2017-12-07 17:07:24 PST
> > 1. Let json be the result of parsing text. If parsing throws an error: ...
> > 
> > And here parseJSON(text) should satisfy this without the stripWhiteSpace.
> 
> The issue is with trailing whitespace. JSON::Value::parseJSON()
> (JSONValues.cpp:531) appears to fail if there are any characters remaining
> after parsing the value:

Sounds like we should fix JSON::Value::parseJSON then.
Comment 6 David Quesada 2017-12-08 09:14:03 PST
Created attachment 328824 [details]
Patch v2

Fix JSON::Value::parseJSON() instead of stripping its input in ApplicationManifestParser.
Comment 7 Joseph Pecoraro 2017-12-08 10:57:50 PST
Comment on attachment 328824 [details]
Patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=328824&action=review

> Tools/TestWebKitAPI/Tests/WTF/JSONValue.cpp:645
> +    }

I'd suggest another section to test leading/trailing whitespace permutations:

    {
        RefPtr<JSON::Value> value;
        EXPECT_TRUE(JSON::Value::parseJSON(" 1", value));
        EXPECT_TRUE(JSON::Value::parseJSON("\t1", value));
        EXPECT_TRUE(JSON::Value::parseJSON("\n1", value));
        EXPECT_TRUE(JSON::Value::parseJSON("1 ", value));
        EXPECT_TRUE(JSON::Value::parseJSON("1\t", value));
        EXPECT_TRUE(JSON::Value::parseJSON("1\n", value));
        EXPECT_TRUE(JSON::Value::parseJSON(" 1 ", value));
        EXPECT_TRUE(JSON::Value::parseJSON(" {} ", value));
        EXPECT_TRUE(JSON::Value::parseJSON(" [] ", value));

        EXPECT_FALSE(JSON::Value::parseJSON("1 1", value));
        EXPECT_FALSE(JSON::Value::parseJSON("{} {}", value));
        EXPECT_FALSE(JSON::Value::parseJSON("[] []", value));
    }
Comment 8 David Quesada 2017-12-08 11:12:49 PST
Created attachment 328837 [details]
Patch for landing
Comment 9 EWS Watchlist 2017-12-08 12:59:40 PST Comment hidden (obsolete)
Comment 10 EWS Watchlist 2017-12-08 12:59:42 PST Comment hidden (obsolete)
Comment 11 Joseph Pecoraro 2017-12-08 13:44:38 PST
Comment on attachment 328837 [details]
Patch for landing

cq+ again, failure is unrelated.
Comment 12 WebKit Commit Bot 2017-12-08 14:05:11 PST
Comment on attachment 328837 [details]
Patch for landing

Clearing flags on attachment: 328837

Committed r225699: <https://trac.webkit.org/changeset/225699>
Comment 13 WebKit Commit Bot 2017-12-08 14:05:12 PST
All reviewed patches have been landed.  Closing bug.
Comment 14 Alexey Proskuryakov 2017-12-08 16:49:57 PST
Comment on attachment 328837 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=328837&action=review

> Source/WTF/wtf/JSONValues.cpp:545
> -    if (!result || tokenEnd != end)
> +    if (!result)
>          return false;
>  
> +    for (const UChar* valueEnd = tokenEnd; valueEnd < end; ++valueEnd) {
> +        if (!isSpaceOrNewline(*valueEnd))
> +            return false;
> +    }

Where is the ChangeLog for these changes? Also, a change like this needs a layout test validated against the spec and other browsers, not an API test.

> Source/WebCore/ChangeLog:10
> +        * Modules/applicationmanifest/ApplicationManifestParser.cpp:
> +        (WebCore::ApplicationManifestParser::parseManifest):

This change is not in the patch, and was not committed.
Comment 15 Alexey Proskuryakov 2017-12-08 16:51:00 PST
Please explain why this JSON parsing change is correct, and add a test. I don't think that we should keep this patch in the tree without that.
Comment 16 Joseph Pecoraro 2017-12-08 18:06:07 PST
(In reply to Alexey Proskuryakov from comment #15)
> Please explain why this JSON parsing change is correct, and add a test. I
> don't think that we should keep this patch in the tree without that.

This is a change to WTF::JSON::Value::parseJSON, used internally to WebKit and not directly exposed to the page (JSC::JSONParse). WTF::JSON::Value::parseJSON allowed leading whitespace but not trailing whitespace, and this change fixed that to allow trailing whitespace. This better aligns WTF::JSON's parse with the JavaScript `JSON.parse(" 1 ")` in webpages as well as JSON parsing in other environments (Ruby and Python in a quick test).

As this is not directly exposed to web pages, TestWebKitAPI unit tests were written to test this directly. Additionally a specific Web App Manifest test was added where WTF::JSON::Value is used to parse a JSON manifest specified by the page that previously would have failed.

I missed that the ChangeLogs were not updated between the first and second version of the patch in my review. Since it was addressing my review comments I overlooked ChangeLogs.

I don't think this needs to be rolled out. Was there a specific kind of test you were looking for?
Comment 17 Alexey Proskuryakov 2017-12-08 19:36:15 PST
I was assuming that we were not crazy enough to have two JSON parsers! If this is not exposed to JS, then no other test is necessary.
Comment 18 Joseph Pecoraro 2017-12-11 10:32:31 PST
(In reply to Alexey Proskuryakov from comment #17)
> I was assuming that we were not crazy enough to have two JSON parsers!

Heh. The JavaScriptCore JSC::JSONParse parses and immediately produces JSValues tracked by the GC which is hard to work with if you want to interact with values in the JSON. WTF::JSON::Value::parseJSON parses and produces WTF::JSON::Values so it is basically a dictionary type easy to query, build, and stringify.