RESOLVED FIXED 5571
REGRESSION (412.5-TOT): duplicated words/sentences at shakespeer.sourceforge.net
https://bugs.webkit.org/show_bug.cgi?id=5571
Jérome Foucher
Reported 2005-10-31 02:55:32 PST
I'm running the nightly build of WebKit version 420 from Sunday, October 30th. The page http://shakespeer.sourceforge.net/ loads very strangely: many words and sentences on the page are duplicated. For instance, [[About]] appears 9 times: [[About]] [About]] About]] bout]] out]] ut]] t]] ]] ]. This worked fine with the default version of WebKit that runs on 10.4.2 with the latest updates applied.
Attachments
Patch (3.45 KB, patch)
2005-11-06 11:00 PST, Geoffrey Garen
darin: review+
Layout test (6.69 KB, text/html)
2005-11-06 11:01 PST, Geoffrey Garen
no flags
Alexey Proskuryakov
Comment 1 2005-10-31 21:31:24 PST
Confirmed with ToT from October 30th. Setting priority to P1 (regression).
Alexey Proskuryakov
Comment 2 2005-10-31 22:11:35 PST
And yes, this is visible in the DOM tree, so it probably doesn't belong in the Layout and Rendering component (thanks to Mitz Pettel for pointing this out).
Jérome Foucher
Comment 3 2005-11-02 01:05:35 PST
Note that with version 416.12 (part of the 10.4.3 update), the bug shows up differently:
- The page http://shakespeer.sourceforge.net/ loaded normally with version 412.5.
- With version 416.12, the words and sentences are not duplicated, but the links (About, News, ...) that should appear in the left column are drawn as plain text wrapped in [[ ]].
I hope these additional comments help.
Darin Adler
Comment 4 2005-11-02 13:20:10 PST
The problem mentioned by Jérome above in 10.4.3 is bug 5597.
Darin Adler
Comment 5 2005-11-02 20:21:12 PST
With my patch for bug 5602 in place, the page doesn't have all the duplicated words, but it still doesn't look right.
mitz
Comment 6 2005-11-03 07:52:40 PST
As I said in bug 5597, it looks like TOT has a problem with characters above 255 in RegExp patterns (though one that bug 5597's test case doesn't detect). With the current patch for bug 5602 in place, the page works if you replace every such character in the regex patterns in the page's source with \u00ff.
Geoffrey Garen
Comment 7 2005-11-05 00:54:36 PST
Looks like 'data' gets corrupted somehow, probably when the expression compiles. 'data' ends up holding a value of 128 when, as far as I can tell, it should only hold enumerated values in the range 0-4. Why that ends up causing false positives is a thrilling jaunt through pcre_xclass.c that I'll leave as an exercise for the reader.
Geoffrey Garen
Comment 8 2005-11-05 11:00:25 PST
OK. I don't think data is corrupted per se. Rather, we're interpreting data's bits incorrectly. data stores a UTF-8 encoded character class, but we interpret its characters in UTF-16 mode. This has to do with the macros we override to handle UTF-16 input strings. So 128 isn't actually a corrupted opcode; it's just the misinterpreted tail end of the UTF-8 encoding of 0x0100. Say that 10 times fast. I dare you.
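The arithmetic behind "the tail end of the UTF-8 encoding of 0x0100" can be checked with a short sketch. It hand-encodes U+0100 as a two-byte UTF-8 sequence and shows that the trailing byte is 0x80, i.e. 128. The function name encode_two_byte_utf8 is hypothetical and this is not PCRE's code; the remark about UTF-16 mode is a simplified model of the mismatch described above, assuming the stored UTF-8 bytes get read back as if each were a complete character value.

```python
def encode_two_byte_utf8(cp: int) -> tuple[int, int]:
    """Hypothetical helper: encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    lead = 0xC0 | (cp >> 6)     # 110xxxxx: top 5 of the 11 payload bits
    trail = 0x80 | (cp & 0x3F)  # 10xxxxxx: low 6 payload bits
    return lead, trail


lead, trail = encode_two_byte_utf8(0x0100)
print(hex(lead), hex(trail))  # 0xc4 0x80
print(trail)                  # 128

# Cross-check against Python's own codecs.
assert bytes([lead, trail]) == "\u0100".encode("utf-8")
# In UTF-16, U+0100 is a single 16-bit unit with value 0x0100. Under the simplified
# model above, code expecting such units but handed the UTF-8 bytes sees 0xC4 and
# then 0x80 (128), which matches the stray value found in 'data'.
assert "\u0100".encode("utf-16-le") == b"\x00\x01"
```

U+0100 is the first code point that needs more than one byte beyond Latin-1, which is consistent with the report that only characters above 255 trigger the problem.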
Geoffrey Garen
Comment 9 2005-11-06 11:00:18 PST
Geoffrey Garen
Comment 10 2005-11-06 11:01:28 PST
Created attachment 4614 [details] Layout test Also renamed an existing layout test for consistency. fast/js is getting kinda crowded.