Bug 5571 - REGRESSION (412.5-TOT): duplicated words/sentences at shakespeer.sourceforge.net
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 420+
Hardware: Mac OS X 10.4
Importance: P1 Critical
Assignee: Geoffrey Garen
URL: http://shakespeer.sourceforge.net/
Keywords:
Depends on: 5597 5602
Blocks:
Reported: 2005-10-31 02:55 PST by Jérome Foucher
Modified: 2005-11-06 12:43 PST

Attachments
Patch (3.45 KB, patch)
2005-11-06 11:00 PST, Geoffrey Garen
darin: review+
Layout test (6.69 KB, text/html)
2005-11-06 11:01 PST, Geoffrey Garen

Description Jérome Foucher 2005-10-31 02:55:32 PST
I'm running the nightly build of WebKit, version 420, from Sunday, October 30th.

The page http://shakespeer.sourceforge.net/ loads very strangely. Many words/sentences on the page
are duplicated.
For instance, [[About]] appears 9 times:
[[About]]
[About]]
About]]
bout]]
out]]
ut]]
t]]
]]
]

This worked fine on the default version of WebKit that runs on 10.4.2 with the latest updates applied.
Comment 1 Alexey Proskuryakov 2005-10-31 21:31:24 PST
Confirmed with ToT from October 30th. Setting priority to P1 (regression).
Comment 2 Alexey Proskuryakov 2005-10-31 22:11:35 PST
And yes, this is visible in the DOM tree, so it probably doesn't belong in Layout and Rendering (thanks to Mitz
Pettel for pointing this out).
Comment 3 Jérome Foucher 2005-11-02 01:05:35 PST
Note that with version 416.12 (part of the 10.4.3 update), the bug shows up with different
behavior:

- The page http://shakespeer.sourceforge.net/ loaded normally with version 412.5

- With version 416.12, the words/sentences are not duplicated, but the links (About, News, ...) that should
appear in the left column are drawn as plain text bracketed with [[ ]]

Hope these additional comments help.
Comment 4 Darin Adler 2005-11-02 13:20:10 PST
The problem mentioned by Jérome above in 10.4.3 is bug 5597.
Comment 5 Darin Adler 2005-11-02 20:21:12 PST
With my patch for bug 5602 in place, the page doesn't have all the duplicated words, but it still doesn't 
look right.
Comment 6 mitz 2005-11-03 07:52:40 PST
As I said in bug 5597, it looks like TOT has a problem with characters >255 in RegExp (but one that bug 
5597's testcase doesn't detect). With the current patch for bug 5602 in place, if you replace all such 
characters in regex patterns in the page's source with \u00ff, it works.
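
A rough sketch of that substitution, written in Python purely for illustration. The function name and the sample pattern are hypothetical; the page's actual patterns are not reproduced here, and the original test was done by editing the page source by hand.

```python
def replace_wide_chars(pattern: str, replacement: str = "\u00ff") -> str:
    """Return `pattern` with every character above U+00FF replaced.

    Characters at or below U+00FF (code point 255) are left alone.
    """
    return "".join(
        ch if ord(ch) <= 0xFF else replacement
        for ch in pattern
    )


if __name__ == "__main__":
    # Hypothetical pattern containing a range above U+00FF.
    sample = "[a-z\u0100-\u017f]+"
    print(repr(replace_wide_chars(sample)))
```

If the rewritten patterns behave correctly while the originals do not, that points at the handling of characters >255 rather than at the patterns themselves.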
Comment 7 Geoffrey Garen 2005-11-05 00:54:36 PST
Looks like 'data' gets corrupted somehow -- probably when the expression compiles. 'data' ends up
holding a value of 128 when, as far as I can tell, it should only hold enumerated values in the range 0-4.
Why that ends up causing false positives is a thrilling jaunt through pcre_xclass.c that I'll leave as an 
exercise for the reader.
Comment 8 Geoffrey Garen 2005-11-05 11:00:25 PST
OK. I don't think data is corrupted per se. Rather, we're interpreting data's bits incorrectly. data stores a 
UTF-8 encoded character class, but we interpret its characters in UTF-16 mode. This has to do with the 
macros we override to handle UTF-16 input strings. So 128 isn't actually a corrupted opcode; it's just the 
misinterpreted tail end of the UTF-8 encoding of 0x0100. Say that 10 times fast. I dare you.
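
To make the arithmetic concrete, here is a small sketch in Python (chosen only for illustration; the code in question is C). The helper name utf8_bytes is hypothetical. It shows that U+0100 encodes to two UTF-8 bytes and that the second one is 128, which would explain the value seen in 'data' if each stored byte is read as a character of its own.

```python
from typing import List


def utf8_bytes(code_point: int) -> List[int]:
    """Return the UTF-8 encoding of a code point as a list of byte values."""
    return list(chr(code_point).encode("utf-8"))


encoded = utf8_bytes(0x0100)

# Read as UTF-8, the two bytes decode back to the single character U+0100.
decoded = bytes(encoded).decode("utf-8")

# Read one byte at a time, they look like two unrelated values instead.
lead_byte, tail_byte = encoded

if __name__ == "__main__":
    print(encoded)            # [196, 128], i.e. 0xC4 0x80
    print(hex(ord(decoded)))  # 0x100
    print(tail_byte)          # 128
```

The sketch only covers the encoding step; it does not model how pcre_xclass.c walks the class data.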
Comment 9 Geoffrey Garen 2005-11-06 11:00:18 PST
Created attachment 4613
Patch
Comment 10 Geoffrey Garen 2005-11-06 11:01:28 PST
Created attachment 4614
Layout test

Also renamed an existing layout test for consistency. fast/js is getting kinda
crowded.