See bug URL. Selecting near the end of document is extremely slow. This regressed because a fix for bug 12833 has been reverted in order to match HTML5. Current releases of Safari and Firefox limit the size of parser inserted text nodes. This has been forbidden in HTML5 and dropped in implementations because client code can break when trying to match text that crosses an engine node boundary, and we shouldn't surprise JS developers like that. I'm not sure if this was a good tradeoff: 1) Most client code needs to deal with text crossing node boundaries anyway (e.g. for style changes, or for line breaks, or for DOM-inserted nodes). 2) We shouldn't surprise client code developers with huge amounts of text either. Just like we bend backwards to spare them from threading or reentrancy issues, we can limit the harm from non-linear algorithms here. 3) The failure mode for an O(N^2) algorithm on an unlimited length text (freezing) is often worse than missing a match in a huge text due to hitting a boundary. This has also been affecting clients of Objective C WebKit API (<rdar://problem/9013049> for those who can see it). There isn't much difference from JavaScript case, with one twist: when using Objective C autorelease pools, a client can easily take a huge hit in memory consumption, in addition to CPU performance. Regardless of what we do long term, I think that we need to re-introduce the node size limit in WebKit until we can sort out internal performance, and get clients updated.
I gave ap some pointers on IRC to where in the code needs to be changed. Eric seems to think he did this already, but I can't find any evidence of that.
For what it's worth, the most common compat issue we had with the old behavior in Gecko was sites using XHR and then extracting the text from a node in the responseXML. They _could_ use .textContent, but tended to use .firstChild.data, and this would break in random ways depending on the data... We did have to fix a few of the resulting performance issues in Gecko 2, yes. Wasn't that big a deal.
I'll dig around. I swear I implemented text node splitting when doing the new parser.
I have a patch that I'm testing now.
Created attachment 84996 [details] proposed fix Even if we improve text selection performance, add a quirk for Mail, and decide to not split the nodes, this code will still be needed for the quirk.
Comment on attachment 84996 [details] proposed fix Looks good.
Comment on attachment 84996 [details] proposed fix OK.
Comment on attachment 84996 [details] proposed fix Clearing flags on attachment: 84996 Committed r80526: <http://trac.webkit.org/changeset/80526>
All reviewed patches have been landed. Closing bug.