Bug 208271 - Add performance probes for HTML parsing
Summary: Add performance probes for HTML parsing
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ben Nham
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-02-26 15:43 PST by Ben Nham
Modified: 2020-03-02 21:18 PST (History)
13 users (show)

See Also:


Attachments
Patch (6.04 KB, patch)
2020-02-26 15:47 PST, Ben Nham
no flags Details | Formatted Diff | Diff
Patch (7.36 KB, patch)
2020-02-28 12:56 PST, Ben Nham
no flags Details | Formatted Diff | Diff
Patch for landing (7.40 KB, patch)
2020-03-02 20:33 PST, Ben Nham
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Nham 2020-02-26 15:43:13 PST
When analyzing a document loading trace, it's useful to know what lines of HTML have been parsed. Let's add some trace points for this.
Comment 1 Radar WebKit Bug Importer 2020-02-26 15:43:45 PST
<rdar://problem/59824845>
Comment 2 Ben Nham 2020-02-26 15:47:02 PST
Created attachment 391793 [details]
Patch
Comment 3 Daniel Bates 2020-02-27 21:13:15 PST
Comment on attachment 391793 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=391793&action=review

This patch is good.

> Source/WTF/ChangeLog:8
> +        This adds some instrumentation so we can know which lines of HTML have been parsed when

This is ok as-is. No change is necessary. FYI it is an "in the know" convention that change log lines are wrapped at ~100 characters.

> Source/WebCore/ChangeLog:8
> +        This adds some instrumentation so we can know which lines of HTML have been parsed when

Ditto. Again, NO change necessary.

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:30
> +#include <wtf/SystemTracing.h>

This is ok as-is. According to WebKit code style this include should be after all other includes.

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:307
> +    bool isMainDocument = document() && document()->frame() && document()->frame()->isMainFrame();

This is ok as-is. The optimal solution removes the null check of document() because 

1. It can never be nullptr when THIS function called.

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:309
> +        TextPosition startPosition = textPosition();

This is ok as-is. The optimal solution is to use auto because

1. The right hand side (rhs)  is as descriptive as the left.
2. If the data type of the rhs were to change in the future then using auto will prevent implicit conversion, which could be a 🐜.

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:316
> +        TextPosition endPosition = textPosition();

Ditto.

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:317
> +        tracePoint(ParseHTMLEnd, endPosition.m_line.oneBasedInt(), endPosition.m_column.oneBasedInt());

This is ok as-is. A better solution would be to define a lambda or INLINED private member function that takes a TracePointCode and use this function twice because:

1. Do not need to duplicate code
2. If there is a bug in this code or future enhancement then it can be made in one place instead of 2.

One suggestion for a name, tracePointIfMainDocument()

> Tools/ChangeLog:8
> +        This adds some instrumentation so we can know which lines of HTML have been parsed when

Ok as is. NO change necessary. Line is long.
Comment 4 Daniel Bates 2020-02-27 21:14:41 PST
Comment on attachment 391793 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=391793&action=review

> Source/WTF/ChangeLog:5
> +

Ok as-is. NO change necessary. Another "in the know" convention is to put radar URL here and in every ChangeLog because:

1. Makes it convenient for Apple engineers to open radar bug
Comment 5 Ben Nham 2020-02-28 12:56:57 PST
Created attachment 392006 [details]
Patch
Comment 6 Sam Weinig 2020-02-28 17:13:13 PST
Comment on attachment 392006 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392006&action=review

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:70
> +    , m_shouldEmitTracePoints(isMainDocumentLoadingFromHTTP(document))

Why only when isMainDocumentLoadingFromHTTP?
Comment 7 Ben Nham 2020-02-29 07:53:28 PST
(In reply to Sam Weinig from comment #6)
> Comment on attachment 392006 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=392006&action=review
> 
> > Source/WebCore/html/parser/HTMLDocumentParser.cpp:70
> > +    , m_shouldEmitTracePoints(isMainDocumentLoadingFromHTTP(document))
> 
> Why only when isMainDocumentLoadingFromHTTP?

For one thing, when you only log intervals related to the main document, it's unambiguous which URL is being parsed when you see the interval in the trace. Once you start emitting parsing intervals related to iframes in the trace, the next question is probably what iframe URL is being parsed. Passing in string metadata like this isn't easy to do for a ktrace-based probe.

For another, if trace points can nest (like iframe parsing interval under a main frame parsing interval), then you have to pass something like a pointer to allow the instrumentation tool to properly nest the intervals. We don't do this properly anywhere with WTF::tracePoint right now.

I am thinking of passing HTML parsing events through os_signpost, which has a richer API for annotation which allows us to pass things like URLs, but they show up in a bit uglier way in our instrumentation tool, and it's blocked by https://bugs.webkit.org/show_bug.cgi?id=208395.
Comment 8 Ben Nham 2020-03-02 20:33:42 PST
Created attachment 392245 [details]
Patch for landing
Comment 9 WebKit Commit Bot 2020-03-02 21:18:09 PST
Comment on attachment 392245 [details]
Patch for landing

Clearing flags on attachment: 392245

Committed r257763: <https://trac.webkit.org/changeset/257763>
Comment 10 WebKit Commit Bot 2020-03-02 21:18:11 PST
All reviewed patches have been landed.  Closing bug.