RESOLVED FIXED 201535
Web Inspector: Formatter: Pretty Print HTML resources (including inline <script>/<style>)
https://bugs.webkit.org/show_bug.cgi?id=201535
Summary Web Inspector: Formatter: Pretty Print HTML resources (including inline <scri...
Joseph Pecoraro
Reported 2019-09-05 23:37:17 PDT
Pretty Print HTML resources (including inline <script>/<style>) Some pretty printing opportunities: - Clean up element attributes to always be quoted - Clean up indenting to always be consistent After pretty printing it would be important to: - Set breakpoints in inline <script> - Step through pretty printed inline <script> - Ensure Style sidebar links to <style> go to the right place
Attachments
[PATCH] WIP - Pretty Printing (63.37 KB, patch)
2019-09-05 23:38 PDT, Joseph Pecoraro
no flags
[IMAGE] HTMLFormatter Tool (746.37 KB, image/png)
2019-09-05 23:40 PDT, Joseph Pecoraro
no flags
[IMAGE] SourceMap Tool (943.03 KB, image/png)
2019-09-06 20:54 PDT, Joseph Pecoraro
no flags
[PATCH] WIP - Pretty Printing and Stepping Through Formatted <script> (232.35 KB, patch)
2019-09-06 20:56 PDT, Joseph Pecoraro
no flags
[TEST] Simple Test Page with Inline <script> (440 bytes, text/html)
2019-09-06 20:56 PDT, Joseph Pecoraro
no flags
[PATCH] Proposed Fix (526.53 KB, patch)
2019-09-09 18:21 PDT, Joseph Pecoraro
no flags
[PATCH] Proposed Fix (640.68 KB, patch)
2019-09-09 18:58 PDT, Joseph Pecoraro
no flags
[PATCH] Proposed Fix (665.66 KB, patch)
2019-09-10 11:31 PDT, Joseph Pecoraro
no flags
[PATCH] Proposed Fix (661.83 KB, patch)
2019-09-10 11:49 PDT, Joseph Pecoraro
no flags
[PATCH] Proposed Fix (662.42 KB, patch)
2019-09-10 16:05 PDT, Joseph Pecoraro
no flags
[PATCH] Proposed Fix (428.81 KB, patch)
2019-09-11 17:00 PDT, Joseph Pecoraro
no flags
[PATCH] Proposed Fix (534.42 KB, patch)
2019-09-12 11:06 PDT, Joseph Pecoraro
hi: review+
hi: commit-queue-
[PATCH] For Landing (537.46 KB, patch)
2019-09-13 01:33 PDT, Joseph Pecoraro
no flags
Joseph Pecoraro
Comment 1 2019-09-05 23:37:28 PDT
Joseph Pecoraro
Comment 2 2019-09-05 23:38:30 PDT Comment hidden (obsolete)
Joseph Pecoraro
Comment 3 2019-09-05 23:40:33 PDT
Created attachment 378161 [details] [IMAGE] HTMLFormatter Tool
Joseph Pecoraro
Comment 4 2019-09-06 20:54:57 PDT
Created attachment 378272 [details] [IMAGE] SourceMap Tool
Joseph Pecoraro
Comment 5 2019-09-06 20:56:01 PDT Comment hidden (obsolete)
Joseph Pecoraro
Comment 6 2019-09-06 20:56:34 PDT
Created attachment 378274 [details] [TEST] Simple Test Page with Inline <script>
Joseph Pecoraro
Comment 7 2019-09-06 20:58:36 PDT Comment hidden (obsolete)
Joseph Pecoraro
Comment 8 2019-09-09 18:21:33 PDT
Created attachment 378423 [details] [PATCH] Proposed Fix
Joseph Pecoraro
Comment 9 2019-09-09 18:47:31 PDT Comment hidden (obsolete)
Joseph Pecoraro
Comment 10 2019-09-09 18:58:01 PDT
Created attachment 378429 [details] [PATCH] Proposed Fix Keep a self closing tag "<br />" outputting as "<br/>" instead of just "<br>". Since we want to be a whitespace only pretty printer.
Joseph Pecoraro
Comment 11 2019-09-10 11:31:37 PDT
Created attachment 378471 [details] [PATCH] Proposed Fix Fix broken tests, and add new tests for pause locations in inline <script>.
Joseph Pecoraro
Comment 12 2019-09-10 11:49:04 PDT
Created attachment 378472 [details] [PATCH] Proposed Fix
Joseph Pecoraro
Comment 13 2019-09-10 16:02:18 PDT
Oops, easy test fixes. Apparently WI.Resource get scripts() is not in a sorted order!
Joseph Pecoraro
Comment 14 2019-09-10 16:05:49 PDT
Created attachment 378503 [details] [PATCH] Proposed Fix
Devin Rousso
Comment 15 2019-09-10 19:36:07 PDT Comment hidden (obsolete)
Joseph Pecoraro
Comment 16 2019-09-11 16:56:24 PDT Comment hidden (obsolete)
Joseph Pecoraro
Comment 17 2019-09-11 17:00:22 PDT
Created attachment 378600 [details] [PATCH] Proposed Fix
Joseph Pecoraro
Comment 18 2019-09-11 17:07:36 PDT
Comment on attachment 378600 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=378600&action=review > LayoutTests/inspector/formatting/resources/source-map-utilities.js:92 > + let nextOriginalChar = originalLines[line][column]; > + let nextFormattedChar = formattedLines[formattedLine][formattedColumn]; > + if (nextOriginalChar !== undefined) > + InspectorTest.assert(nextOriginalChar === nextFormattedChar, `FAIL: Mapping appears to be to a different token. ${JSON.stringify(nextOriginalChar)} => ${JSON.stringify(nextFormattedChar)}`); This assertion a few legit issues. - The closing tag for comments (and doctype, cdata, etc) need to have their position in case the inner text ended with a newline. - One of the attribute cases was not outputting the `namePos`. Also now we only have to search the output for "FAIL" for the obvious bad cases (of which there should be none right now!). > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:475 > + ErrorText: "error-text", This new node type is for EOF remaining text. Our TreeBuilder formats passes this through as text and HTMLFormatter outputs it as text.
Joseph Pecoraro
Comment 19 2019-09-11 21:20:49 PDT
I can mark these tests as SLOW. They indeed have a lot of output and timed out on debug WK1 bots: inspector/formatting/source-map-html-1.html inspector/formatting/source-map-html-2.html inspector/formatting/source-map-javascript-1.html inspector/formatting/source-map-css-1.html
Joseph Pecoraro
Comment 20 2019-09-11 21:22:38 PDT
Not sure why the wincairo both failed to apply. This file is in the patch: LayoutTests/inspector/formatting/resources/html-tests/not-well-formed-3.html
Joseph Pecoraro
Comment 22 2019-09-12 11:06:18 PDT
Created attachment 378658 [details] [PATCH] Proposed Fix Lets see if the bots do better on this patch.
Devin Rousso
Comment 23 2019-09-12 17:58:28 PDT
Comment on attachment 378658 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review r=me, given how large this patch is, it's getting very difficult to track differences between each iteration, but the tests look great and I think I've looked at enough of the code enough times to think that this is good to go. Please run this through ESLint once before landing, as I did notice some style issues (I pointed out the ones I saw, but there may be others). Also, it looks like inspector/formatting/source-map-html-1.html is still too slow for some reason, so that needs to be fixed as well. Fantastic work! Please make some followup bugs for some of the other things we've discussed: - ensure that `<p><p>` are siblings, not parent-child - add XML/SVG support > Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:120 > + appendNonToken(string) Why does this need to exist? Shouldn't every character that isn't whitespace be considered a token? > Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:143 > + console.assert(!string.includes("\n"), "Appended a string with newlines. This breaks the source map."); We should also assert that `!/\s$/.test(string)`, since any trailing whitespace should really be handled by `appendSpace()`. > Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:155 > + Style: extra newline > Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178 > + this.lastTokenWasWhitespace = false; Shouldn't this be `true` if we just added an indent? > Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:303 > + let indent = this._indentCache[this._indent]; > + if (indent) This seems wrong. If it doesn't exist in `this._indentCache`, shouldn't we add it? Is it even possible to hit this? > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:100 > + for (let childNode of children) { NIT: this could just be `child` > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:123 > + Style: extra newline > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:144 > + let {name, value, quote, namePos, valuePos} = attr; Why not just destructure this in the parameters list? > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147 > + let q; Style: this needs a default value > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:293 > + let closingDoctypeString = ">"; Style: `const` > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:300 > + let closingCDataString = "]]>"; Ditto (293) > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:32 > + console.assert(treeBuilder); What about an assertion for `sourceText`? > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:54 > + _isEOF() Aside: we could totally make a base `Parser` class that holds all of these functions, as I'm sure they'd also be useful elsewhere > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:64 > + _peekChararacter(c) This is spelled very wrong lol. Can we merge this with `_peekString`? Since there's no real difference between a character and a string in JS, we could just have `_peekString` handle single character peeks. > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:69 > + _peekRegex(regex) This name may be a bit misleading, because it only peeks a single character, and `RegExp` can handle more than that. Perhaps this should be `_peekCharacterRegex`? Or even better, assert that `regex.source.startsWith("^")` and allow it to test `this._data.substring(this._pos)`. > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:90 > + let c = str[i]; This should be moved lower (or inlined) so that it isn't evaluated if we take the `!d` branch. > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:106 > + Style: extra whitespace > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:276 > + if (this._peekChararacter("/")) { > + if (this._peekString("/>")) { Why are we doing double the effort and peeking for "/" twice? Can we not just `this._peekString("/>")`? > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:308 > + if (this._peekChararacter("/")) { > + if (this._peekString("/>")) { Ditto (275) > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLTreeBuilderFormatter.js:102 > + nodesToPop++; Please move this above the `if` since it's called in both branches. That way, we only need one call. > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLTreeBuilderFormatter.js:153 > + } Style: missing semicolon > Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLTreeBuilderFormatter.js:164 > + } Ditto (153)
Joseph Pecoraro
Comment 24 2019-09-13 01:11:30 PDT
Comment on attachment 378658 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review >> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:120 >> + appendNonToken(string) > > Why does this need to exist? Shouldn't every character that isn't whitespace be considered a token? This does not need to exist. We can aim to phase it out. This saves a little work: - for output characters that immediately follow tokens such that we don't need to provide an additional position mapping which will just ignored because it is in the right position - avoid additional original position calculations with every appendToken I think for now the '=' might need to carry over a position from the HTMLParser if we wanted to drop this entirely. >> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:143 >> + console.assert(!string.includes("\n"), "Appended a string with newlines. This breaks the source map."); > > We should also assert that `!/\s$/.test(string)`, since any trailing whitespace should really be handled by `appendSpace()`. Hmm, I'm not sure that matters in all cases. I can imagine a TextNode in HTML may end with whitespace. You could argue that we could strip that at any point, but I'm going to leave this as is for now. >> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178 >> + this.lastTokenWasWhitespace = false; > > Shouldn't this be `true` if we just added an indent? Sounds good. I also dropped the `_startOfLine = false`. >> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:303 >> + if (indent) > > This seems wrong. If it doesn't exist in `this._indentCache`, shouldn't we add it? Is it even possible to hit this? The check above this ensures something is in the cache or not already. This check avoids empty string appends (when indent is 0). >> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:144 >> + let {name, value, quote, namePos, valuePos} = attr; > > Why not just destructure this in the parameters list? Readability. Not all things that are possible should be done. It is easier to read and understand this code when the method signature shows it takes an Attr node. If the destructure were to happen in the parameter list it is not necessarily clear that those properties are the properties on an Attr node unless you already knew what the properties were on an Attr node. >> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147 >> + let q; > > Style: this needs a default value I don't see anything wrong with this. It is a valid initialization and we do this in hundreds of places in our code base. It would be considered a "dead store" in most static analyzers unless we were to rewrite the following code just to make the initial assignment do something useful. I will make the default case assign an empty string to ensure all cases provide a value. >> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:293 >> + let closingDoctypeString = ">"; > > Style: `const` FWIW these comments don't appear to add any value. `let` or `const` here isn't going to change the meaning of this code, runtime behavior of the code, or make it more or less readable. I realize these values can be inlined but I pulled them out into a variable to enhance readability ("why this string, oh the variable name"). Do you see an advantage to const here other than it could be const? >> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:32 >> + console.assert(treeBuilder); > > What about an assertion for `sourceText`? Good idea! >> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:64 >> + _peekChararacter(c) > > This is spelled very wrong lol. > > Can we merge this with `_peekString`? Since there's no real difference between a character and a string in JS, we could just have `_peekString` handle single character peeks. Hah, how have we missed that until now! I've switched to using _peekString everywhere. My original concern was performance but this performs well always using _peekString. >> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:69 >> + _peekRegex(regex) > > This name may be a bit misleading, because it only peeks a single character, and `RegExp` can handle more than that. Perhaps this should be `_peekCharacterRegex`? Or even better, assert that `regex.source.startsWith("^")` and allow it to test `this._data.substring(this._pos)`. I want to avoid creating substrings. I've named it _peekCharacterRegex. >> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLParser.js:276 >> + if (this._peekString("/>")) { > > Why are we doing double the effort and peeking for "/" twice? Can we not just `this._peekString("/>")`? Good catch.
Joseph Pecoraro
Comment 25 2019-09-13 01:17:52 PDT
Comment on attachment 378658 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178 >>> + this.lastTokenWasWhitespace = false; >> >> Shouldn't this be `true` if we just added an indent? > > Sounds good. I also dropped the `_startOfLine = false`. Hmm, changing these hurt tests in a bad way (closing </style> / </script>). I've reverted to what was in the patch.
Joseph Pecoraro
Comment 26 2019-09-13 01:33:02 PDT
Created attachment 378719 [details] [PATCH] For Landing
WebKit Commit Bot
Comment 27 2019-09-13 03:05:46 PDT
Comment on attachment 378719 [details] [PATCH] For Landing Clearing flags on attachment: 378719 Committed r249831: <https://trac.webkit.org/changeset/249831>
Devin Rousso
Comment 28 2019-09-13 06:05:45 PDT
Comment on attachment 378658 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=378658&action=review >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:120 >>> + appendNonToken(string) >> >> Why does this need to exist? Shouldn't every character that isn't whitespace be considered a token? > > This does not need to exist. We can aim to phase it out. > > This saves a little work: > > - for output characters that immediately follow tokens such that we don't need to provide an additional position mapping which will just ignored because it is in the right position > - avoid additional original position calculations with every appendToken > > I think for now the '=' might need to carry over a position from the HTMLParser if we wanted to drop this entirely. Ah, I see. The "don't add a mapping" part wasn't immediately obvious, but that does make sense. It seems like the difference between `appendToken` and `appendNonToken` is worth a comment, or perhaps a more explicit name. >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:143 >>> + console.assert(!string.includes("\n"), "Appended a string with newlines. This breaks the source map."); >> >> We should also assert that `!/\s$/.test(string)`, since any trailing whitespace should really be handled by `appendSpace()`. > > Hmm, I'm not sure that matters in all cases. I can imagine a TextNode in HTML may end with whitespace. You could argue that we could strip that at any point, but I'm going to leave this as is for now. Really, the only reason I suggested this is so that `lastTokenWasWhitespace` is kept as accurate as possible. If `string` ends with a whitespace character, as it is now, `lastTokenWasWhitespace` would be `false`, which isn't really accurate. >>>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:178 >>>> + this.lastTokenWasWhitespace = false; >>> >>> Shouldn't this be `true` if we just added an indent? >> >> Sounds good. I also dropped the `_startOfLine = false`. > > Hmm, changing these hurt tests in a bad way (closing </style> / </script>). I've reverted to what was in the patch. We likely still need `this._startOfLine = false;` given that we would've just indented. I think the only thing we need to do is check whether the last character added was a whitespace, and if so set `lastTokenWasWhitespace` accordingly. >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/FormatterContentBuilder.js:303 >>> + if (indent) >> >> This seems wrong. If it doesn't exist in `this._indentCache`, shouldn't we add it? Is it even possible to hit this? > > The check above this ensures something is in the cache or not already. This check avoids empty string appends (when indent is 0). Ah, I see. That is very subtle :( >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147 >>> + let q; >> >> Style: this needs a default value > > I don't see anything wrong with this. It is a valid initialization and we do this in hundreds of places in our code base. It would be considered a "dead store" in most static analyzers unless we were to rewrite the following code just to make the initial assignment do something useful. I will make the default case assign an empty string to ensure all cases provide a value. It's not that this is "wrong", just that I've thought we always preferred to have a variable have a default value, ensuring that we never tried to do anything with an uninitialized `let`/`const`. That's why I put this as a "Style", since it's really just a stylistic difference. >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:293 >>> + let closingDoctypeString = ">"; >> >> Style: `const` > > FWIW these comments don't appear to add any value. `let` or `const` here isn't going to change the meaning of this code, runtime behavior of the code, or make it more or less readable. I realize these values can be inlined but I pulled them out into a variable to enhance readability ("why this string, oh the variable name"). > > Do you see an advantage to const here other than it could be const? AFAIU, our usual style was to use `const` for things that wouldn't change between invocations of the program. Since this is effectively a constant value, it should use `const` instead of a let. Again, this was just a "Style", so not a functionality issue.
Joseph Pecoraro
Comment 29 2019-09-13 11:24:36 PDT
> > - for output characters that immediately follow tokens such that we don't need to provide an additional position mapping which will just ignored because it is in the right position > > - avoid additional original position calculations with every appendToken > > > > I think for now the '=' might need to carry over a position from the HTMLParser if we wanted to drop this entirely. > > Ah, I see. The "don't add a mapping" part wasn't immediately obvious, but > that does make sense. It seems like the difference between `appendToken` > and `appendNonToken` is worth a comment, or perhaps a more explicit name. Yeah, I originally added this for when I was adding tokens that did not exist in the original source. For example converting `<input type=text>` to `<input type="text>`. But we abandoned that approach in favor of pure whitespace changes, so really we shouldn't have any such new tokens in the output now. > >>> Source/WebInspectorUI/UserInterface/Workers/Formatter/HTMLFormatter.js:147 > >>> + let q; > >> > >> Style: this needs a default value > > > > I don't see anything wrong with this. It is a valid initialization and we do this in hundreds of places in our code base. It would be considered a "dead store" in most static analyzers unless we were to rewrite the following code just to make the initial assignment do something useful. I will make the default case assign an empty string to ensure all cases provide a value. > > It's not that this is "wrong", just that I've thought we always preferred to > have a variable have a default value, ensuring that we never tried to do > anything with an uninitialized `let`/`const`. That's why I put this as a > "Style", since it's really just a stylistic difference. That has never been a style rule that I'm aware of. I seem to think it is normally done to denote initialization is immediately following in a conditional. For example: let value; if (...) value = ...; else value = ...; Fortunately we can use a ternary when the "..." sections are short enough, but if not then I think we've used the above pattern and it has always been clear to me.
Joseph Pecoraro
Comment 30 2019-09-17 16:51:10 PDT
*** Bug 156082 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.