Bug 117488 - Consider atomicizing inline scripts before execution
Summary: Consider atomicizing inline scripts before execution
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: BlinkMergeCandidate
Depends on:
Blocks:
 
Reported: 2013-06-10 22:08 PDT by Ryosuke Niwa
Modified: 2023-07-07 10:15 PDT (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2013-06-10 22:08:34 PDT
https://chromium.googlesource.com/chromium/blink/+/d14b5c9da043f4569a0fa2058a0c59e1e984d006
alloon-juice.com should use 85 kB less memory

Browsing around the web with STRING_STATS, I noticed we commonly end up with
duplicated scripts String. Studying the issue more carefully, I determined that
these scripts were inline scripts that appeared multiple times (e.g., in
different iframes).

This CL teaches Blink to atomize inline scripts immediately before executing
them. If an inline script appears in multiple documents, they'll end up sharing
the same underlying string storage.

An alternative is to atomize the string as we're constructing the DOM, but that
can work poorly if there is a network packet boundary in the middle of (large)
inline script blocks. Instead, we want to wait until we're done parsing the
script element. Doing the atomization immediately before execution also lets us
piggyback on the logic that checks whether the script element has a single
child Text node, which is exactly the case in which atomization is likely to be
useful.

I used ballon-juice.com as a test bed for studying this issue because it is a
fairly typical blog-based web site. This CL dropped the memory wasted by
duplicate strings from 94,809 kB to 10,280 kB, a savings of 85 kB. Here is an
example string before and after this CL:

== Before ==

2 copies (ok) of length 52790 /* <![CDATA[ */?if(!((window._atc||{}).ver)){var _atd="www.addth...

== After ==

1 copies (ok) of length 52790 /* <![CDATA[ */?if(!((window._atc||{}).ver)){var _atd="www.addth...

We should consider the same optimization for inline style elements as well if
we see them wasting memory in the same way.
Comment 1 Ahmad Saleem 2023-07-06 14:56:37 PDT
I tried:

Source/WebCore/dom/CharacterData.h:

void atomize() { m_data = AtomString(m_data); }

__________

Source/WebCore/dom/TextNodeTraversal.cpp:


String childTextContent(const ContainerNode& root)
{
    StringBuilder result;
+    for (Text* text = TextNodeTraversal::firstChild(root); text; text = TextNodeTraversal::nextSibling(*text)) {
+        text->atomize();
        result.append(text->data());
+    }
    return result.toString();
}

__________

But again, it compiles but we might need AB testing to confirm any benefit on memory and no impact on performance from this. Any Apple engineer interested?
Comment 2 Darin Adler 2023-07-07 10:00:06 PDT
This change is a peculiar way to accomplish this. The rationale is specific to script and style elements, but the change is in a general purpose function that is not specific to script and style elements. Maybe there is benefit to doing this as a side effect any time any code calls childTextContent, but it’s a strange choice to have this operation have an unrelated side effect without even changing its name.

The idea itself seems likely sound, but my first choice given this problem statement would be to atomize them during the parsing process, not just before execution. The memory cost takes effect as soon as they are parsed and is paid even if they are never executed. Atomizing each individual text node’s text is not related to concatenating them all into a single large string.
Comment 3 Darin Adler 2023-07-07 10:05:35 PDT
We already do an atomization of whitespace strings during the parsing process.
Comment 4 Chris Dumez 2023-07-07 10:08:19 PDT
(In reply to Darin Adler from comment #3)
> We already do an atomization of whitespace strings during the parsing
> process.

I don't think we do anymore? If I remember correctly, it was a speedometer progression to not atomize whitespace.
Comment 5 Darin Adler 2023-07-07 10:11:52 PDT
(In reply to Chris Dumez from comment #4)
> If I remember correctly, it was a speedometer progression to not atomize whitespace.

I didn’t know that.

If there is a significant memory savings, we could seek out an efficient way to atomize whatever we need to. For example, I can imagine a *very* efficient way to atomize the most common simple runs of whitespace that would be different from general purpose AtomString construction.

For the text in <style> and <script> elements I feel confident we could do the atomizing without affecting other parsing so likely mitigate the performance cost.
Comment 6 Darin Adler 2023-07-07 10:15:48 PDT
Despite Ryosuke’s original comment, it’s easy to add code to parsing that kicks in only when there’s a single text node child of a script or style element. I am pretty sure that waiting until execution is not necessary or even all that helpful.