5571 – REGRESSION (412.5-TOT): duplicated words/sentences at shakespeer.sourceforge.net

RESOLVED FIXED 5571

https://bugs.webkit.org/show_bug.cgi?id=5571

Jérome Foucher

Reported 2005-10-31 02:55:32 PST

I'm running the nightly build of WebKit, version 420, from Sunday, October 30th. The page http://shakespeer.sourceforge.net/ loads very strangely: many words/sentences on the page are duplicated. For instance, [[About]] appears 9 times:
[[About]] [About]] About]] bout]] out]] ut]] t]] ]] ]
This worked fine on the default version of WebKit that runs on 10.4.2 with the latest updates applied.

Attachments
Patch (3.45 KB, patch), 2005-11-06 11:00 PST, Geoffrey Garen, darin: review+
Layout test (6.69 KB, text/html), 2005-11-06 11:01 PST, Geoffrey Garen, no flags

Alexey Proskuryakov

Comment 1 2005-10-31 21:31:24 PST

Confirmed with ToT from October 30th. Setting priority to P1 (regression).

Alexey Proskuryakov

Comment 2 2005-10-31 22:11:35 PST

And yes, this is visible in DOM tree, so probably doesn't belong to Layout and Rendering (thanks to Mitz Pettel for pointing this out).

Jérome Foucher

Comment 3 2005-11-02 01:05:35 PST

Note that with version 416.12 (part of the 10.4.3 update), the bug shows up differently:
- With version 412.5, the page http://shakespeer.sourceforge.net/ loaded normally.
- With version 416.12, the words/sentences are not duplicated, but the links (About, News, ...) that should appear in the left column are drawn as plain text bracketed with [[ ]].
Hope these additional comments help.

Darin Adler

Comment 4 2005-11-02 13:20:10 PST

The problem mentioned by Jérome above in 10.4.3 is bug 5597.

Darin Adler

Comment 5 2005-11-02 20:21:12 PST

With my patch for bug 5602 in place, the page doesn't have all the duplicated words, but it still doesn't look right.

mitz

Comment 6 2005-11-03 07:52:40 PST

As I said in bug 5597, it looks like TOT has a problem with characters >255 in RegExp (but one that bug 5597's testcase doesn't detect). With the current patch for bug 5602 in place, if you replace all such characters in regex patterns in the page's source with \u00ff, it works.
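
To make the expected behaviour concrete, here is a minimal sketch of the kind of check involved. It uses Python's `re` module rather than JavaScriptCore's regex engine, so it does not reproduce the bug; the patterns and the helper `matches_anywhere` are hypothetical and are not taken from the page's source.

```python
import re

# Hypothetical pattern: a character class holding only code points above 255.
above_255 = re.compile("[\u0100-\u017f]")

# The same shape of pattern after substituting \u00ff, as described above.
replaced = re.compile("[\u00ff]")


def matches_anywhere(pattern, text):
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None


# A correct engine must not match plain ASCII text with either class.
ascii_matches = matches_anywhere(above_255, "[[About]]")
replaced_ascii_matches = matches_anywhere(replaced, "[[About]]")

# It must still match a character that really is inside the class.
wide_matches = matches_anywhere(above_255, "\u0101")
```

An engine that mishandles the class for code points above 255 would be expected to report spurious matches on the ASCII input.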

Geoffrey Garen

Comment 7 2005-11-05 00:54:36 PST

Looks like 'data' gets corrupted somehow -- probably when the expression compiles. 'data' ends up holding a value of 128 when, as far as I can tell, it should only hold enumerated values of the range 0-4. Why that ends up causing false positives is a thrilling jaunt through pcre_xclass.c that I'll leave as an exercise for the reader.

Geoffrey Garen

Comment 8 2005-11-05 11:00:25 PST

OK. I don't think data is corrupted per se. Rather, we're interpreting data's bits incorrectly. data stores a UTF-8 encoded character class, but we interpret its characters in UTF-16 mode. This has to do with the macros we override to handle UTF-16 input strings. So 128 isn't actually a corrupted opcode; it's just the misinterpreted tail end of the UTF-8 encoding of 0x0100. Say that 10 times fast. I dare you.
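
The arithmetic behind the 128 can be checked with a short sketch. It only shows the two encodings of U+0100 side by side; the helper names are made up for illustration, and the "misreading" step is an assumption about how a stray value could appear, not a trace of the actual PCRE code path.

```python
def utf8_bytes(code_point):
    """Return the UTF-8 encoding of one code point as a list of byte values."""
    return list(chr(code_point).encode("utf-8"))


def utf16_units(code_point):
    """Return the UTF-16 encoding of one code point as a list of 16-bit units."""
    raw = chr(code_point).encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


stored = utf8_bytes(0x0100)     # what a UTF-8 encoder writes: lead byte, tail byte
expected = utf16_units(0x0100)  # what a UTF-16 reader expects: one 16-bit unit

# Hypothetical misreading: each stored byte is taken as a complete unit,
# so the tail byte shows up as a separate value of its own.
stray_value = stored[-1]

print(stored, expected, stray_value)
```

U+0100 encodes to the bytes 0xC4 0x80 in UTF-8, and the tail byte 0x80 is 128 in decimal, which matches the value seen in 'data'.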

Geoffrey Garen

Comment 9 2005-11-06 11:00:18 PST

Created attachment 4613: Patch

Geoffrey Garen

Comment 10 2005-11-06 11:01:28 PST

Created attachment 4614: Layout test

Also renamed an existing layout test for consistency. fast/js is getting kinda crowded.

Status: RESOLVED
Resolution: FIXED
Priority: P1
Severity: Critical
Classification: Unclassified
Version: 420+
Hardware: Mac
OS: OS X 10.4
Product: WebKit
Component: JavaScriptCore
Assignee: Geoffrey Garen
Reported: 2005-10-31 02:55 PST
Modified: 2005-11-06 12:43 PST
CC List: 1 user
URL: http://shakespeer.sourceforge.net/
Depends on: 5597, 5602