WebKit Bugzilla
RESOLVED FIXED
5571
REGRESSION (412.5-TOT): duplicated words/sentences at shakespeer.sourceforge.net
https://bugs.webkit.org/show_bug.cgi?id=5571
Jérome Foucher
Reported
2005-10-31 02:55:32 PST
I'm running the nightly build of WebKit version 420, from Sunday, October 30th. The page
http://shakespeer.sourceforge.net/
loads very strangely. Many words/sentences on the page are duplicated. For instance, [[About]] appears 9 times: [[About]] [About]] About]] bout]] out]] ut]] t]] ]] ]
This worked fine on the default version of WebKit that runs on 10.4.2 with the latest updates applied.
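The duplicated text in the report follows a regular pattern: each fragment is the previous one with its first character dropped, so the nine fragments are exactly the nine suffixes of `[[About]]`. The Python sketch below reproduces the reported string. The function name `suffixes` is purely illustrative and is not taken from WebKit; the sketch only describes the symptom and says nothing about its cause.

```python
def suffixes(text: str) -> list[str]:
    """Return every non-empty suffix of text, longest first."""
    return [text[i:] for i in range(len(text))]


# String copied from the report above.
reported = "[[About]] [About]] About]] bout]] out]] ut]] t]] ]] ]"

fragments = suffixes("[[About]]")
print(len(fragments))                   # 9
print(" ".join(fragments) == reported)  # True
```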
Attachments
Patch (3.45 KB, patch), 2005-11-06 11:00 PST, Geoffrey Garen; darin: review+
Layout test (6.69 KB, text/html), 2005-11-06 11:01 PST, Geoffrey Garen; no flags
Alexey Proskuryakov
Comment 1
2005-10-31 21:31:24 PST
Confirmed with ToT from October 30th. Setting priority to P1 (regression).
Alexey Proskuryakov
Comment 2
2005-10-31 22:11:35 PST
And yes, this is visible in the DOM tree, so it probably doesn't belong in Layout and Rendering (thanks to Mitz Pettel for pointing this out).
Jérome Foucher
Comment 3
2005-11-02 01:05:35 PST
It should be noted that with version 416.12 (part of the 10.4.3 update), the bug behaves differently:
- The page
http://shakespeer.sourceforge.net/
loaded normally with version 412.5.
- With version 416.12, the words/sentences are not duplicated, but the links (About, News, ...) that should appear in the left column are drawn as plain text bracketed with [[ ]].
Hope these additional comments help.
Darin Adler
Comment 4
2005-11-02 13:20:10 PST
The problem mentioned by Jérome above in 10.4.3 is bug 5597.
Darin Adler
Comment 5
2005-11-02 20:21:12 PST
With my patch for bug 5602 in place, the page doesn't have all the duplicated words, but it still doesn't look right.
mitz
Comment 6
2005-11-03 07:52:40 PST
As I said in bug 5597, it looks like TOT has a problem with characters >255 in RegExp (but one that bug 5597's testcase doesn't detect). With the current patch for bug 5602 in place, if you replace all such characters in regex patterns in the page's source with \u00ff, it works.
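The substitution described in this comment can be sketched in Python for anyone who wants to repeat the experiment on a saved copy of a pattern. This is a minimal sketch under assumptions: the helper name `replace_wide_chars` is hypothetical, and it assumes the replacement is written as the literal six-character escape `\u00ff` rather than as the character itself.

```python
import re


def replace_wide_chars(pattern_source: str) -> str:
    """Replace every character above U+00FF with the literal text \\u00ff.

    Hypothetical helper: approximates the manual edit described in the
    comment above; it is not part of WebKit or of the page's scripts.
    """
    # A callable replacement avoids escape processing of the backslash.
    return re.sub(r"[^\u0000-\u00ff]", lambda _match: r"\u00ff", pattern_source)


# U+0100 is above 255 and gets replaced; U+00E9 is not and is left alone.
print(replace_wide_chars("[a-z\u0100]+"))
print(replace_wide_chars("caf\u00e9"))
```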
Geoffrey Garen
Comment 7
2005-11-05 00:54:36 PST
Looks like 'data' gets corrupted somehow -- probably when the expression compiles. 'data' ends up holding a value of 128 when, as far as I can tell, it should only hold enumerated values of the range 0-4. Why that ends up causing false positives is a thrilling jaunt through pcre_xclass.c that I'll leave as an exercise for the reader.
Geoffrey Garen
Comment 8
2005-11-05 11:00:25 PST
OK. I don't think data is corrupted per se. Rather, we're interpreting data's bits incorrectly. data stores a UTF-8 encoded character class, but we interpret its characters in UTF-16 mode. This has to do with the macros we override to handle UTF-16 input strings. So 128 isn't actually a corrupted opcode; it's just the misinterpreted tail end of the UTF-8 encoding of 0x0100. Say that 10 times fast. I dare you.
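The value 128 in this comment can be checked directly: the UTF-8 encoding of U+0100 is the two bytes 0xC4 0x80, and the trailing byte 0x80 is 128 in decimal. The Python sketch below shows only that arithmetic; the variable names are illustrative, and it does not model how the PCRE code in question actually reads the character class data.

```python
# UTF-8 encoding of U+0100, the character mentioned in the comment above.
utf8_bytes = "\u0100".encode("utf-8")

print([hex(b) for b in utf8_bytes])  # ['0xc4', '0x80']

# The trailing (continuation) byte, taken on its own, is 128. A reader that
# treats it as a standalone value instead of part of a two-byte UTF-8
# sequence would see 128.
trailing_byte = utf8_bytes[-1]
print(trailing_byte)  # 128
```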
Geoffrey Garen
Comment 9
2005-11-06 11:00:18 PST
Created attachment 4613: Patch
Geoffrey Garen
Comment 10
2005-11-06 11:01:28 PST
Created attachment 4614: Layout test
Also renamed an existing layout test for consistency. fast/js is getting kinda crowded.