Bug 117488

Summary: Consider atomicizing inline scripts before execution
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: TextAssignee: Nobody <webkit-unassigned>
Status: NEW    
Severity: Normal CC: ahmad.saleem792, ap, barraclough, benjamin, cdumez, darin, fpizlo
Priority: P2 Keywords: BlinkMergeCandidate
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   

Ryosuke Niwa
Reported 2013-06-10 22:08:34 PDT
https://chromium.googlesource.com/chromium/blink/+/d14b5c9da043f4569a0fa2058a0c59e1e984d006 alloon-juice.com should use 85 kB less memory Browsing around the web with STRING_STATS, I noticed we commonly end up with duplicated scripts String. Studying the issue more carefully, I determined that these scripts were inline scripts that appeared multiple times (e.g., in different iframes). This CL teaches Blink to atomize inline scripts immediately before executing them. If an inline script appears in multiple documents, they'll end up sharing the same underlying string storage. An alternative is to atomize the string as we're constructing the DOM, but that can work poorly if there is a network packet boundary in the middle of (large) inline script blocks. Instead, we want to wait until we're done parsing the script element. Doing the atomization immediately before execution also lets us piggyback on the logic that checks whether the script element has a single child Text node, which is exactly the case in which atomization is likely to be useful. I used ballon-juice.com as a test bed for studying this issue because it is a fairly typical blog-based web site. This CL dropped the memory wasted by duplicate strings from 94,809 kB to 10,280 kB, a savings of 85 kB. Here is an example string before and after this CL: == Before == 2 copies (ok) of length 52790 /* <![CDATA[ */?if(!((window._atc||{}).ver)){var _atd="www.addth... == After == 1 copies (ok) of length 52790 /* <![CDATA[ */?if(!((window._atc||{}).ver)){var _atd="www.addth... We should consider the same optimization for inline style elements as well if we see them wasting memory in the same way.
Attachments
Ahmad Saleem
Comment 1 2023-07-06 14:56:37 PDT
I tried: Source/WebCore/dom/CharacterData.h: void atomize() { m_data = AtomString(m_data); } __________ Source/WebCore/dom/TextNodeTraversal.cpp: String childTextContent(const ContainerNode& root) { StringBuilder result; + for (Text* text = TextNodeTraversal::firstChild(root); text; text = TextNodeTraversal::nextSibling(*text)) { + text->atomize(); result.append(text->data()); + } return result.toString(); } __________ But again, it compiles but we might need AB testing to confirm any benefit on memory and no impact on performance from this. Any Apple engineer interested?
Darin Adler
Comment 2 2023-07-07 10:00:06 PDT
This change is a peculiar way to accomplish this. The rationale is specific to script and style elements, but the change is in a general purpose function that is not specific to script and style elements. Maybe there is benefit to doing this as a side effect any time any code calls childTextContent, but it’s a strange choice to have this operation have an unrelated side effect without even changing its name. The idea itself seems likely sound, but my first choice given this problem statement would be to atomize them during the parsing process, not just before execution. The memory cost takes effect as soon as they are parsed and is paid even if they are never executed. Atomizing each individual text node’s text is not related to concatenating them all into a single large string.
Darin Adler
Comment 3 2023-07-07 10:05:35 PDT
We already do an atomization of whitespace strings during the parsing process.
Chris Dumez
Comment 4 2023-07-07 10:08:19 PDT
(In reply to Darin Adler from comment #3) > We already do an atomization of whitespace strings during the parsing > process. I don't think we do anymore? If I remember correctly, it was a speedometer progression to not atomize whitespace.
Darin Adler
Comment 5 2023-07-07 10:11:52 PDT
(In reply to Chris Dumez from comment #4) > If I remember correctly, it was a speedometer progression to not atomize whitespace. I didn’t know that. If there is a significant memory savings, we could seek out an efficient way to atomize whatever we need to. For example, I can imagine a *very* efficient way to atomize the most common simple runs of whitespace that would be different from general purpose AtomString construction. For the text in <style> and <script> elements I feel confident we could do the atomizing without affecting other parsing so likely mitigate the performance cost.
Darin Adler
Comment 6 2023-07-07 10:15:48 PDT
Despite Ryosuke’s original comment, it’s easy to add code to parsing that kicks in only when there’s a single text node child of a script or style element. I am pretty sure that waiting until execution is not necessary or even all that helpful.
Note You need to log in before you can comment on or make changes to this bug.