Bug 201535 - Web Inspector: Formatter: Pretty Print HTML resources (including inline <script>/<style>)
Summary: Web Inspector: Formatter: Pretty Print HTML resources (including inline <scri...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Inspector (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Joseph Pecoraro
URL:
Keywords: InRadar
: 156082 (view as bug list)
Depends on:
Blocks: 203748 201624 203722
  Show dependency treegraph
 
Reported: 2019-09-05 23:37 PDT by Joseph Pecoraro
Modified: 2019-11-01 11:01 PDT (History)
21 users (show)

See Also:


Attachments
[PATCH] WIP - Pretty Printing (63.37 KB, patch)
2019-09-05 23:38 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[IMAGE] HTMLFormatter Tool (746.37 KB, image/png)
2019-09-05 23:40 PDT, Joseph Pecoraro
no flags Details
[IMAGE] SourceMap Tool (943.03 KB, image/png)
2019-09-06 20:54 PDT, Joseph Pecoraro
no flags Details
[PATCH] WIP - Pretty Printing and Stepping Through Formatted <script> (232.35 KB, patch)
2019-09-06 20:56 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[TEST] Simple Test Page with Inline <script> (440 bytes, text/html)
2019-09-06 20:56 PDT, Joseph Pecoraro
no flags Details
[PATCH] Proposed Fix (526.53 KB, patch)
2019-09-09 18:21 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[PATCH] Proposed Fix (640.68 KB, patch)
2019-09-09 18:58 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[PATCH] Proposed Fix (665.66 KB, patch)
2019-09-10 11:31 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[PATCH] Proposed Fix (661.83 KB, patch)
2019-09-10 11:49 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[PATCH] Proposed Fix (662.42 KB, patch)
2019-09-10 16:05 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[PATCH] Proposed Fix (428.81 KB, patch)
2019-09-11 17:00 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[PATCH] Proposed Fix (534.42 KB, patch)
2019-09-12 11:06 PDT, Joseph Pecoraro
hi: review+
hi: commit-queue-
Details | Formatted Diff | Diff
[PATCH] For Landing (537.46 KB, patch)
2019-09-13 01:33 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joseph Pecoraro 2019-09-05 23:37:17 PDT
Pretty Print HTML resources (including inline <script>/<style>)

Some pretty printing opportunities:
- Clean up element attributes to always be quoted
- Clean up indenting to always be consistent

After pretty printing it would be important to:
- Set breakpoints in inline <script>
- Step through pretty printed inline <script>
- Ensure Style sidebar links to <style> go to the right place
Comment 1 Joseph Pecoraro 2019-09-05 23:37:28 PDT
<rdar://problem/29119232>
Comment 2 Joseph Pecoraro 2019-09-05 23:38:30 PDT Comment hidden (obsolete)
Comment 3 Joseph Pecoraro 2019-09-05 23:40:33 PDT
Created attachment 378161 [details]
[IMAGE] HTMLFormatter Tool
Comment 4 Joseph Pecoraro 2019-09-06 20:54:57 PDT
Created attachment 378272 [details]
[IMAGE] SourceMap Tool
Comment 5 Joseph Pecoraro 2019-09-06 20:56:01 PDT Comment hidden (obsolete)
Comment 6 Joseph Pecoraro 2019-09-06 20:56:34 PDT
Created attachment 378274 [details]
[TEST] Simple Test Page with Inline <script>
Comment 7 Joseph Pecoraro 2019-09-06 20:58:36 PDT Comment hidden (obsolete)
Comment 8 Joseph Pecoraro 2019-09-09 18:21:33 PDT
Created attachment 378423 [details]
[PATCH] Proposed Fix
Comment 9 Joseph Pecoraro 2019-09-09 18:47:31 PDT Comment hidden (obsolete)
Comment 10 Joseph Pecoraro 2019-09-09 18:58:01 PDT
Created attachment 378429 [details]
[PATCH] Proposed Fix

Keep a self closing tag "<br />" outputting as "<br/>" instead of just "<br>". Since we want to be a whitespace only pretty printer.
Comment 11 Joseph Pecoraro 2019-09-10 11:31:37 PDT
Created attachment 378471 [details]
[PATCH] Proposed Fix

Fix broken tests, and add new tests for pause locations in inline <script>.
Comment 12 Joseph Pecoraro 2019-09-10 11:49:04 PDT
Created attachment 378472 [details]
[PATCH] Proposed Fix
Comment 13 Joseph Pecoraro 2019-09-10 16:02:18 PDT
Oops, easy test fixes. Apparently WI.Resource get scripts() is not in a sorted order!
Comment 14 Joseph Pecoraro 2019-09-10 16:05:49 PDT
Created attachment 378503 [details]
[PATCH] Proposed Fix
Comment 15 Devin Rousso 2019-09-10 19:36:07 PDT Comment hidden (obsolete)
Comment 16 Joseph Pecoraro 2019-09-11 16:56:24 PDT Comment hidden (obsolete)
Comment 17 Joseph Pecoraro 2019-09-11 17:00:22 PDT
Created attachment 378600 [details]
[PATCH] Proposed Fix
Comment 18 Joseph Pecoraro 2019-09-11 17:07:36 PDT
Comment on attachment 378600 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=378600&action=review

> LayoutTests/inspector/formatting/resources/source-map-utilities.js:92
> +                let nextOriginalChar = originalLines[line][column];
> +                let nextFormattedChar = formattedLines[formattedLine][formattedColumn];
> +                if (nextOriginalChar !== undefined)
> +                    InspectorTest.assert(nextOriginalChar === nextFormattedChar, `FAIL: Mapping appears to be to a different token. ${JSON.stringify(nextOriginalChar)} => ${JSON.stringify(nextFormattedChar)}`);

This assertion a few legit issues.

    - The closing tag for comments (and doctype, cdata, etc) need to have their position in case the inner text ended with a newline.
    - One of the attribute cases was not outputting the `namePos`.

Also now we only have to search the output for "FAIL" for the obvious bad cases (of which there should be none right now!).

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:475
> +    ErrorText: "error-text",

This new node type is for EOF remaining text. Our TreeBuilder formats passes this through as text and HTMLFormatter outputs it as text.
Comment 19 Joseph Pecoraro 2019-09-11 21:20:49 PDT
I can mark these tests as SLOW. They indeed have a lot of output and timed out on debug WK1 bots:
inspector/formatting/source-map-html-1.html
inspector/formatting/source-map-html-2.html
inspector/formatting/source-map-javascript-1.html
inspector/formatting/source-map-css-1.html
Comment 20 Joseph Pecoraro 2019-09-11 21:22:38 PDT
Not sure why the wincairo both failed to apply. This file is in the patch:
LayoutTests/inspector/formatting/resources/html-tests/not-well-formed-3.html
Comment 22 Joseph Pecoraro 2019-09-12 11:06:18 PDT
Created attachment 378658 [details]
[PATCH] Proposed Fix

Lets see if the bots do better on this patch.
Comment 23 Devin Rousso 2019-09-12 17:58:28 PDT
Comment on attachment 378658 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review

r=me, given how large this patch is, it's getting very difficult to track differences between each iteration, but the tests look great and I think I've looked at enough of the code enough times to think that this is good to go.

Please run this through ESLint once before landing, as I did notice some style issues (I pointed out the ones I saw, but there may be others).

Also, it looks like inspector/formatting/source-map-html-1.html is still too slow for some reason, so that needs to be fixed as well.

Fantastic work!

Please make some followup bugs for some of the other things we've discussed:
 - ensure that `<p><p>` are siblings, not parent-child
 - add XML/SVG support

> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:120
> +    appendNonToken(string)

Why does this need to exist?  Shouldn't every character that isn't whitespace be considered a token?

> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:143
> +        console.assert(!string.includes("\n"), "Appended a string with newlines. This breaks the source map.");

We should also assert that `!/\s$/.test(string)`, since any trailing whitespace should really be handled by `appendSpace()`.

> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:155
> +

Style: extra newline

> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178
> +        this.lastTokenWasWhitespace = false;

Shouldn't this be `true` if we just added an indent?

> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:303
> +            let indent = this._indentCache[this._indent];
> +            if (indent)

This seems wrong.  If it doesn't exist in `this._indentCache`, shouldn't we add it?  Is it even possible to hit this?

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:100
> +        for (let childNode of children) {

NIT: this could just be `child`

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:123
> +

Style: extra newline

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:144
> +        let {name, value, quote, namePos, valuePos} = attr;

Why not just destructure this in the parameters list?

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147
> +            let q;

Style: this needs a default value

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:293
> +            let closingDoctypeString = ">";

Style: `const`

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:300
> +            let closingCDataString = "]]>";

Ditto (293)

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:32
> +        console.assert(treeBuilder);

What about an assertion for `sourceText`?

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:54
> +    _isEOF()

Aside: we could totally make a base `Parser` class that holds all of these functions, as I'm sure they'd also be useful elsewhere

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:64
> +    _peekChararacter(c)

This is spelled very wrong lol.

Can we merge this with `_peekString`?  Since there's no real difference between a character and a string in JS, we could just have `_peekString` handle single character peeks.

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:69
> +    _peekRegex(regex)

This name may be a bit misleading, because it only peeks a single character, and `RegExp` can handle more than that.  Perhaps this should be `_peekCharacterRegex`?  Or even better, assert that `regex.source.startsWith("^")` and allow it to test `this._data.substring(this._pos)`.

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:90
> +            let c = str[i];

This should be moved lower (or inlined) so that it isn't evaluated if we take the `!d` branch.

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:106
> +

Style: extra whitespace

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:276
> +                if (this._peekChararacter("/")) {
> +                    if (this._peekString("/>")) {

Why are we doing double the effort and peeking for "/" twice?  Can we not just `this._peekString("/>")`?

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:308
> +        if (this._peekChararacter("/")) {
> +            if (this._peekString("/>")) {

Ditto (275)

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLTreeBuilderFormatter.js:102
> +                nodesToPop++;

Please move this above the `if` since it's called in both branches.  That way, we only need one call.

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLTreeBuilderFormatter.js:153
> +        }

Style: missing semicolon

> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLTreeBuilderFormatter.js:164
> +        }

Ditto (153)
Comment 24 Joseph Pecoraro 2019-09-13 01:11:30 PDT
Comment on attachment 378658 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:120
>> +    appendNonToken(string)
> 
> Why does this need to exist?  Shouldn't every character that isn't whitespace be considered a token?

This does not need to exist. We can aim to phase it out.

This saves a little work:

  - for output characters that immediately follow tokens such that we don't need to provide an additional position mapping which will just ignored because it is in the right position
  - avoid additional original position calculations with every appendToken

I think for now the '=' might need to carry over a position from the HTMLParser if we wanted to drop this entirely.

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:143
>> +        console.assert(!string.includes("\n"), "Appended a string with newlines. This breaks the source map.");
> 
> We should also assert that `!/\s$/.test(string)`, since any trailing whitespace should really be handled by `appendSpace()`.

Hmm, I'm not sure that matters in all cases. I can imagine a TextNode in HTML may end with whitespace. You could argue that we could strip that at any point, but I'm going to leave this as is for now.

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178
>> +        this.lastTokenWasWhitespace = false;
> 
> Shouldn't this be `true` if we just added an indent?

Sounds good. I also dropped the `_startOfLine = false`.

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:303
>> +            if (indent)
> 
> This seems wrong.  If it doesn't exist in `this._indentCache`, shouldn't we add it?  Is it even possible to hit this?

The check above this ensures something is in the cache or not already. This check avoids empty string appends (when indent is 0).

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:144
>> +        let {name, value, quote, namePos, valuePos} = attr;
> 
> Why not just destructure this in the parameters list?

Readability. Not all things that are possible should be done. It is easier to read and understand this code when the method signature shows it takes an Attr node. If the destructure were to happen in the parameter list it is not necessarily clear that those properties are the properties on an Attr node unless you already knew what the properties were on an Attr node.

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147
>> +            let q;
> 
> Style: this needs a default value

I don't see anything wrong with this. It is a valid initialization and we do this in hundreds of places in our code base. It would be considered a "dead store" in most static analyzers unless we were to rewrite the following code just to make the initial assignment do something useful. I will make the default case assign an empty string to ensure all cases provide a value.

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:293
>> +            let closingDoctypeString = ">";
> 
> Style: `const`

FWIW these comments don't appear to add any value. `let` or `const` here isn't going to change the meaning of this code, runtime behavior of the code, or make it more or less readable. I realize these values can be inlined but I pulled them out into a variable to enhance readability ("why this string, oh the variable name").

Do you see an advantage to const here other than it could be const?

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:32
>> +        console.assert(treeBuilder);
> 
> What about an assertion for `sourceText`?

Good idea!

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:64
>> +    _peekChararacter(c)
> 
> This is spelled very wrong lol.
> 
> Can we merge this with `_peekString`?  Since there's no real difference between a character and a string in JS, we could just have `_peekString` handle single character peeks.

Hah, how have we missed that until now!

I've switched to using _peekString everywhere. My original concern was performance but this performs well always using _peekString.

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:69
>> +    _peekRegex(regex)
> 
> This name may be a bit misleading, because it only peeks a single character, and `RegExp` can handle more than that.  Perhaps this should be `_peekCharacterRegex`?  Or even better, assert that `regex.source.startsWith("^")` and allow it to test `this._data.substring(this._pos)`.

I want to avoid creating substrings. I've named it _peekCharacterRegex.

>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:276
>> +                    if (this._peekString("/>")) {
> 
> Why are we doing double the effort and peeking for "/" twice?  Can we not just `this._peekString("/>")`?

Good catch.
Comment 25 Joseph Pecoraro 2019-09-13 01:17:52 PDT
Comment on attachment 378658 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review

>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178
>>> +        this.lastTokenWasWhitespace = false;
>> 
>> Shouldn't this be `true` if we just added an indent?
> 
> Sounds good. I also dropped the `_startOfLine = false`.

Hmm, changing these hurt tests in a bad way (closing </style> / </script>). I've reverted to what was in the patch.
Comment 26 Joseph Pecoraro 2019-09-13 01:33:02 PDT
Created attachment 378719 [details]
[PATCH] For Landing
Comment 27 WebKit Commit Bot 2019-09-13 03:05:46 PDT
Comment on attachment 378719 [details]
[PATCH] For Landing

Clearing flags on attachment: 378719

Committed r249831: <https://trac.webkit.org/changeset/249831>
Comment 28 Devin Rousso 2019-09-13 06:05:45 PDT
Comment on attachment 378658 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review

>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:120
>>> +    appendNonToken(string)
>> 
>> Why does this need to exist?  Shouldn't every character that isn't whitespace be considered a token?
> 
> This does not need to exist. We can aim to phase it out.
> 
> This saves a little work:
> 
>   - for output characters that immediately follow tokens such that we don't need to provide an additional position mapping which will just ignored because it is in the right position
>   - avoid additional original position calculations with every appendToken
> 
> I think for now the '=' might need to carry over a position from the HTMLParser if we wanted to drop this entirely.

Ah, I see.  The "don't add a mapping" part wasn't immediately obvious, but that does make sense.  It seems like the difference between `appendToken` and `appendNonToken` is worth a comment, or perhaps a more explicit name.

>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:143
>>> +        console.assert(!string.includes("\n"), "Appended a string with newlines. This breaks the source map.");
>> 
>> We should also assert that `!/\s$/.test(string)`, since any trailing whitespace should really be handled by `appendSpace()`.
> 
> Hmm, I'm not sure that matters in all cases. I can imagine a TextNode in HTML may end with whitespace. You could argue that we could strip that at any point, but I'm going to leave this as is for now.

Really, the only reason I suggested this is so that `lastTokenWasWhitespace` is kept as accurate as possible.  If `string` ends with a whitespace character, as it is now, `lastTokenWasWhitespace` would be `false`, which isn't really accurate.

>>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178
>>>> +        this.lastTokenWasWhitespace = false;
>>> 
>>> Shouldn't this be `true` if we just added an indent?
>> 
>> Sounds good. I also dropped the `_startOfLine = false`.
> 
> Hmm, changing these hurt tests in a bad way (closing </style> / </script>). I've reverted to what was in the patch.

We likely still need `this._startOfLine = false;` given that we would've just indented.  I think the only thing we need to do is check whether the last character added was a whitespace, and if so set `lastTokenWasWhitespace` accordingly.

>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:303
>>> +            if (indent)
>> 
>> This seems wrong.  If it doesn't exist in `this._indentCache`, shouldn't we add it?  Is it even possible to hit this?
> 
> The check above this ensures something is in the cache or not already. This check avoids empty string appends (when indent is 0).

Ah, I see.  That is very subtle :(

>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147
>>> +            let q;
>> 
>> Style: this needs a default value
> 
> I don't see anything wrong with this. It is a valid initialization and we do this in hundreds of places in our code base. It would be considered a "dead store" in most static analyzers unless we were to rewrite the following code just to make the initial assignment do something useful. I will make the default case assign an empty string to ensure all cases provide a value.

It's not that this is "wrong", just that I've thought we always preferred to have a variable have a default value, ensuring that we never tried to do anything with an uninitialized `let`/`const`.  That's why I put this as a "Style", since it's really just a stylistic difference.

>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:293
>>> +            let closingDoctypeString = ">";
>> 
>> Style: `const`
> 
> FWIW these comments don't appear to add any value. `let` or `const` here isn't going to change the meaning of this code, runtime behavior of the code, or make it more or less readable. I realize these values can be inlined but I pulled them out into a variable to enhance readability ("why this string, oh the variable name").
> 
> Do you see an advantage to const here other than it could be const?

AFAIU, our usual style was to use `const` for things that wouldn't change between invocations of the program.  Since this is effectively a constant value, it should use `const` instead of a let.  Again, this was just a "Style", so not a functionality issue.
Comment 29 Joseph Pecoraro 2019-09-13 11:24:36 PDT
> >   - for output characters that immediately follow tokens such that we don't need to provide an additional position mapping which will just ignored because it is in the right position
> >   - avoid additional original position calculations with every appendToken
> > 
> > I think for now the '=' might need to carry over a position from the HTMLParser if we wanted to drop this entirely.
> 
> Ah, I see.  The "don't add a mapping" part wasn't immediately obvious, but
> that does make sense.  It seems like the difference between `appendToken`
> and `appendNonToken` is worth a comment, or perhaps a more explicit name.

Yeah, I originally added this for when I was adding tokens that did not exist in the original source. For example converting `<input type=text>` to `<input type="text>`. But we abandoned that approach in favor of pure whitespace changes, so really we shouldn't have any such new tokens in the output now.


> >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147
> >>> +            let q;
> >> 
> >> Style: this needs a default value
> > 
> > I don't see anything wrong with this. It is a valid initialization and we do this in hundreds of places in our code base. It would be considered a "dead store" in most static analyzers unless we were to rewrite the following code just to make the initial assignment do something useful. I will make the default case assign an empty string to ensure all cases provide a value.
> 
> It's not that this is "wrong", just that I've thought we always preferred to
> have a variable have a default value, ensuring that we never tried to do
> anything with an uninitialized `let`/`const`.  That's why I put this as a
> "Style", since it's really just a stylistic difference.

That has never been a style rule that I'm aware of. I seem to think it is normally done to denote initialization is immediately following in a conditional. For example:

    let value;
    if (...)
        value = ...;
    else
        value = ...;

Fortunately we can use a ternary when the "..." sections are short enough, but if not then I think we've used the above pattern and it has always been clear to me.
Comment 30 Joseph Pecoraro 2019-09-17 16:51:10 PDT
*** Bug 156082 has been marked as a duplicate of this bug. ***