Bug 55898 - REGRESSION (HTML5 tree builder): Text selection in a large text document is extremely slow
Summary: REGRESSION (HTML5 tree builder): Text selection in a large text document is e...
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Alexey Proskuryakov
URL: http://dscoder.com/test.txt
Depends on:
Blocks: 41115
  Show dependency treegraph
Reported: 2011-03-07 13:08 PST by Alexey Proskuryakov
Modified: 2011-03-07 19:18 PST (History)
9 users (show)

See Also:

proposed fix (21.39 KB, patch)
2011-03-07 16:45 PST, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2011-03-07 13:08:45 PST
See bug URL. Selecting near the end of document is extremely slow. This regressed because a fix for bug 12833 has been reverted in order to match HTML5.

Current releases of Safari and Firefox limit the size of parser inserted text nodes. This has been forbidden in HTML5 and dropped in implementations because client code can break when trying to match text that crosses an engine node boundary, and we shouldn't surprise JS developers like that. I'm not sure if this was a good tradeoff:
1) Most client code needs to deal with text crossing node boundaries anyway (e.g. for style changes, or for line breaks, or for DOM-inserted nodes).
2) We shouldn't surprise client code developers with huge amounts of text either. Just like we bend backwards to spare them from threading or reentrancy issues, we can limit the harm from non-linear algorithms here.
3) The failure mode for an O(N^2) algorithm on an unlimited length text (freezing) is often worse than missing a match in a huge text due to hitting a boundary.

This has also been affecting clients of Objective C WebKit API (<rdar://problem/9013049> for those who can see it). There isn't much difference from JavaScript case, with one twist: when using Objective C autorelease pools, a client can easily take a huge hit in memory consumption, in addition to CPU performance.

Regardless of what we do long term, I think that we need to re-introduce the node size limit in WebKit until we can sort out internal performance, and get clients updated.
Comment 1 Adam Barth 2011-03-07 13:13:38 PST
I gave ap some pointers on IRC to where in the code needs to be changed.  Eric seems to think he did this already, but I can't find any evidence of that.
Comment 2 Boris Zbarsky 2011-03-07 13:18:48 PST
For what it's worth, the most common compat issue we had with the old behavior in Gecko was sites using XHR and then extracting the text from a node in the responseXML.  They _could_ use .textContent, but tended to use .firstChild.data, and this would break in random ways depending on the data...

We did have to fix a few of the resulting performance issues in Gecko 2, yes.  Wasn't that big a deal.
Comment 3 Eric Seidel (no email) 2011-03-07 15:51:11 PST
I'll dig around.  I swear I implemented text node splitting when doing the new parser.
Comment 4 Alexey Proskuryakov 2011-03-07 16:34:10 PST
I have a patch that I'm testing now.
Comment 5 Alexey Proskuryakov 2011-03-07 16:45:38 PST
Created attachment 84996 [details]
proposed fix

Even if we improve text selection performance, add a quirk for Mail, and decide to not split the nodes, this code will still be needed for the quirk.
Comment 6 Darin Adler 2011-03-07 18:19:10 PST
Comment on attachment 84996 [details]
proposed fix

Looks good.
Comment 7 Eric Seidel (no email) 2011-03-07 18:56:06 PST
Comment on attachment 84996 [details]
proposed fix

Comment 8 WebKit Commit Bot 2011-03-07 19:18:05 PST
Comment on attachment 84996 [details]
proposed fix

Clearing flags on attachment: 84996

Committed r80526: <http://trac.webkit.org/changeset/80526>
Comment 9 WebKit Commit Bot 2011-03-07 19:18:11 PST
All reviewed patches have been landed.  Closing bug.