WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
18665
HTMLTokenizer double allocates strings for <script>
https://bugs.webkit.org/show_bug.cgi?id=18665
Summary
HTMLTokenizer double allocates strings for <script>
Mike Belshe
Reported
2008-04-21 12:56:09 PDT
In HTMLTokenizer, we call: String scriptCode(buffer, dest - buffer); // this allocates a string processToken(); // This allocates another string for the Text in the DOM So all javascript code gets allocated twice; in the case of the popular web app I was using, this totalled over 1.3MB of extra space. (The JS code measured ~650KB, but in UChars, that measures 1.3MB) I'm attaching a small patch, which passes all layout tests. The fix is to have the scriptCode share the string created via the node Text. The only question I have is that this string is created via StringImpl::createStrippingNullCharacters(). Is it possible to have embedded null characters inside script? I don't think so, but I am not certain.
Attachments
Patch to HTMLTokenizer string allocation
(559 bytes, patch)
2008-04-21 12:57 PDT
,
Mike Belshe
mbelshe
: review-
Details
Formatted Diff
Diff
Update.
(1.81 KB, patch)
2008-05-08 14:29 PDT
,
Mike Belshe
no flags
Details
Formatted Diff
Diff
Updated patch, this one with a changelog.
(2.84 KB, patch)
2008-05-12 15:52 PDT
,
Mike Belshe
koivisto
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Mike Belshe
Comment 1
2008-04-21 12:57:07 PDT
Created
attachment 20738
[details]
Patch to HTMLTokenizer string allocation
Eric Seidel (no email)
Comment 2
2008-04-21 13:14:04 PDT
I bet this could be a win on the PLT.
Eric Seidel (no email)
Comment 3
2008-04-21 15:11:23 PDT
Comment on
attachment 20738
[details]
Patch to HTMLTokenizer string allocation We don't generally use PassRefPtr<> on the stack, rather we use RefPtr<> and call .release() on the ref ptr (to make it more explicit that you're clearing it). This patch changes two behaviors. 1. The string passed off into JSC now has all nulls removed. I'm not sure what we would actually do when encoutering a null char in a script? I assume that's an error, in which case scripts will now succeed which would previously fail... 2. Under what circumstances can processToken() return 0? You've changed the behavior. Before we would have passed a non-empty string off to JSC, where as now we'll pass an empty string in that case. In general the patch looks good, and the idea is sound, but we should answer these minor details before landing this.
Mike Belshe
Comment 4
2008-04-21 15:20:06 PDT
I believe the case for processToken() returning null is when the node hasn't been fully processed yet (e.g. you've seen the start <script> tag, but no end tag) However, I admit, I am not an expert in this code, and would like someone who knows it better to help!
Mike Belshe
Comment 5
2008-04-28 07:18:34 PDT
The patch I submitted is not sufficient; turns out that there is a 64KB limit on text nodes, so if you pass <script></script> larger than that, it gets processed and breaks. The memory savings potential here is real, however.
Mike Belshe
Comment 6
2008-05-08 14:29:41 PDT
Created
attachment 21029
[details]
Update.
Mike Belshe
Comment 7
2008-05-08 14:31:46 PDT
I've updated with a new patch. Currently, the parser chops text blocks into 64KB chunks. For normal text, this works fine. However, in order to share the string which is passed to JS with the string used in the DOM, we need one contiguous string. Modified the HTMLParser such that it will not chop text blocks into 64KB chunks if it is part of a script tag. Passes all layout tests.
Maciej Stachowiak
Comment 8
2008-05-09 02:08:47 PDT
Comment on
attachment 21029
[details]
Update. Please provide a ChangeLog entry, and once that is done flag for review.
Mike Belshe
Comment 9
2008-05-12 15:52:57 PDT
Created
attachment 21094
[details]
Updated patch, this one with a changelog.
Eric Seidel (no email)
Comment 10
2008-08-01 01:49:54 PDT
Comment on
attachment 21094
[details]
Updated patch, this one with a changelog. I assume he meant to mark this for review. One style nit: + if (t->tagName == textAtom && t->text && current->localName() != scriptTag) + { { should be on the same line as the if in WebKit style. Also, generally the changelog reads: date/name line blank reviewed by line blank description and link to bug blank test cases (if applicable) blank comments on individual files. The patch looks good to me, but I think someone else who has hacked in this code recently should at least glance at it.
Eric Seidel (no email)
Comment 11
2008-08-02 00:58:51 PDT
Comment on
attachment 21094
[details]
Updated patch, this one with a changelog. Assigning this to Beth. Any reviewer is still of course welcome to review it. But given that she's hacked in this area recently, it makes sense to get her comments. Beth, if you don't feel you should review this, please feel free to request someone else, or just un-assign it with comment.
Antti Koivisto
Comment 12
2008-08-04 13:22:38 PDT
Comment on
attachment 21094
[details]
Updated patch, this one with a changelog. r=me, with the style fixes pointed out by Eric. Nice simple memory optimization.
Eric Seidel (no email)
Comment 13
2008-08-04 17:33:45 PDT
I've applied this locally, once the tests finish running it will be committed.
Eric Seidel (no email)
Comment 14
2008-08-04 17:39:48 PDT
http://trac.webkit.org/changeset/35544
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug