Bug 16076 - DOMParser().parseFromString() freezes Safari when parsing large nodes with XML entities
Summary: DOMParser().parseFromString() freezes Safari when parsing large nodes with XM...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac (Intel) OS X 10.4
: P2 Normal
Assignee: Mark Rowe (bdash)
URL:
Keywords: HasReduction, InRadar, YahooBug
Depends on:
Blocks:
 
Reported: 2007-11-20 16:31 PST by Brian Kaull
Modified: 2019-02-06 09:04 PST (History)
3 users (show)

See Also:


Attachments
Slow XML Parsing Test (2.32 KB, text/html)
2007-11-20 16:37 PST, Brian Kaull
no flags Details
Patch (5.34 KB, patch)
2007-11-20 22:47 PST, Mark Rowe (bdash)
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brian Kaull 2007-11-20 16:31:41 PST
Calling DOMParser.parseFromString can freeze safari for a long time when the xml that is trying to be parsed contains at least one node that is extremely large and has many xml-safe characters (ie.
"&", "<", ">", """, "'").  By large, I mean at least 10,000 repetitions of text:

For example
1. This is a long & boring message.
2. This is a long & boring message.
...
10000. This is a long & boring message.

(new DOMParser()).parseFromString( xml, "text/xml" );

The larger the number of repitions, the longer it takes to parse.  Right now I'm seeing numbers such as:
5,000 reps  = ~1 sec
10,000 reps = ~7 sec
20,000 reps = ~32 sec
If the text of the node doesn't have any xml safe characters, the parsing runs very quickly, generally less than a second.  I'm going to attach an example page that has several tests that parse large sample xml text.  I have run this test on Firefox 2(PC & Mac) and IE7 and the numbers for the tests there are never over 3 seconds for even the largest test, 100,000 reps.

I've been testing with Safari 3.0.4 (523.12) and WebKit r27930
Comment 1 Brian Kaull 2007-11-20 16:37:54 PST
Created attachment 17422 [details]
Slow XML Parsing Test

These tests create large amounts of xml and parse that xml using:
(new DOMParser()).parseFromString( xml, "text/xml" );

Note: If the text of the node is wrapped in CDATA the parsing times are extremely fast.

Note: If the text of the node is large and there are no xml-safe characters the parsing times are very fast.
Comment 2 Maciej Stachowiak 2007-11-20 17:43:23 PST
This test case seems to show O(N^2) behavior:

5,000 - 0.682sec
10,000 - 6.547sec
20,000 - 33.04sec


Comment 3 Maciej Stachowiak 2007-11-20 17:44:34 PST
<rdar://problem/5609579>
Comment 4 Mark Rowe (bdash) 2007-11-20 18:56:43 PST
A Shark profile shows that > 90% of the time is spent in memcpy.  This appears to be due to the naive implementation of StringImpl::append(UChar*, unsigned) that allocates and copies on every append rather than growing more gracefully.
Comment 5 Adam Roben (:aroben) 2007-11-20 20:14:50 PST
(In reply to comment #4)
> A Shark profile shows that > 90% of the time is spent in memcpy.  This appears
> to be due to the naive implementation of StringImpl::append(UChar*, unsigned)
> that allocates and copies on every append rather than growing more gracefully.

While fixing StringImpl::append would be nice, perhaps in this specific case we could use a Vector<UChar> and String::adopt(), as we do in other places in WebCore.
Comment 6 Mark Rowe (bdash) 2007-11-20 22:47:08 PST
Created attachment 17425 [details]
Patch

This patch causes the largest case in the test document to take around 0.18 seconds on my MacBook Pro, down from several minutes.
Comment 7 Mark Rowe (bdash) 2007-11-20 22:48:03 PST
I did not create a layout test because I'm unsure how this could be tested without being timing dependent. I'd rather not introduce another layout test that fails on slower machines or when run under guard malloc.
Comment 8 Maciej Stachowiak 2007-11-20 22:48:43 PST
Comment on attachment 17425 [details]
Patch

r=me
Comment 9 Alexey Proskuryakov 2007-11-20 23:11:17 PST
(In reply to comment #5)
> While fixing StringImpl::append would be nice, 

I actually think that we should eventually get rid of it - no need to make String identical to Vector in implementation.
Comment 10 Mark Rowe (bdash) 2007-11-20 23:30:45 PST
Landed in r27936.
Comment 11 Lucas Forschler 2019-02-06 09:04:15 PST
Mass moving XML DOM bugs to the "DOM" Component.