Bug 68990 - [Qt]REGRESSION(r95912): It made sputnik tests flakey
Summary: [Qt]REGRESSION(r95912): It made sputnik tests flakey
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P1 Critical
Assignee: Zoltan Herczeg
URL:
Keywords: Qt, QtTriaged
Depends on: 69102
Blocks: 68764 68860
  Show dependency treegraph
 
Reported: 2011-09-28 03:16 PDT by Csaba Osztrogonác
Modified: 2011-10-03 00:34 PDT (History)
5 users (show)

See Also:


Attachments
minor change (1.73 KB, patch)
2011-09-30 05:51 PDT, Zoltan Herczeg
ggaren: review+
zherczeg: commit-queue?
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Csaba Osztrogonác 2011-09-28 03:16:21 PDT
r95912, r95914(buildfix), r95917(buildfix) and r96068(regression fix)
made sputnik tests flakey on Qt platform.

I think it isn't a QtWebKit bug, but a hidden JSC bug revealed by QtWebKit bots.
Comment 2 Csaba Osztrogonác 2011-09-28 03:29:30 PDT
You can reproduce timeouts with Qt 4.7.4 in release mode 32-bit (Same as our official bot on build.webkit.org) with ORWT if you run all tests. (Unfortunately NRWT runs tests in different order and the bug doesn't occur with it on the official bot.)

I tried revert r95912, r95914, r95917 and r96068 locally and then all tests pass for me.
Comment 3 Csaba Osztrogonác 2011-09-28 03:38:27 PDT
One more thing: This bug only appears on 32 bit x86 platfom and on ARM.
Comment 4 Csaba Osztrogonác 2011-09-28 04:56:10 PDT
I managed to reproduce it in a small example: 

$ Tools/Scripts/old-run-webkit-tests LayoutTests/sputnik/Conformance/15_Native_Objects/15.1_The_Global_Object/15.1.3/15.1.3.1_decodeURI --exit-after-n-failures 1 --verbose

running sputnik/Conformance/15_Native_Objects/15.1_The_Global_Object/15.1.3/15.1.3.1_decodeURI/S15.1.3.1_A1.1_T1.html -> succeeded
running sputnik/Conformance/15_Native_Objects/15.1_The_Global_Object/15.1.3/15.1.3.1_decodeURI/S15.1.3.1_A1.2_T1.html -> succeeded
running sputnik/Conformance/15_Native_Objects/15.1_The_Global_Object/15.1.3/15.1.3.1_decodeURI/S15.1.3.1_A1.2_T2.html -> succeeded
running sputnik/Conformance/15_Native_Objects/15.1_The_Global_Object/15.1.3/15.1.3.1_decodeURI/S15.1.3.1_A1.3_T1.html -> timed out
Comment 5 Csaba Osztrogonác 2011-09-29 03:02:28 PDT
Any GC expert volunteer for this bug? :)
Comment 6 Geoffrey Garen 2011-09-29 11:48:04 PDT
Is there a way to reproduce this on a non-Qt system?
Comment 7 Csaba Osztrogonác 2011-09-29 11:50:34 PDT
(In reply to comment #6)
> Is there a way to reproduce this on a non-Qt system?

I don't know. But Zoltan started to fix it, he confirmed that it is a GC related bug. I think he will provide the fix tomorrow.
Comment 8 Oliver Hunt 2011-09-29 11:52:35 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > Is there a way to reproduce this on a non-Qt system?
> 
> I don't know. But Zoltan started to fix it, he confirmed that it is a GC related bug. I think he will provide the fix tomorrow.

We might be able to fix it if we had any information about what is going wrong -- currently we can't repro, but zoltan has found the bug and hasn't commented on what that bug is so we can't help in any way :-/
Comment 9 Csaba Osztrogonác 2011-09-29 15:00:33 PDT
It seems http://trac.webkit.org/changeset/96354 fixed the bug. But let's wait for Zoltan's confirmation.
Comment 10 Zoltan Herczeg 2011-09-30 00:08:08 PDT
> We might be able to fix it if we had any information about what is going wrong -- currently we can't repro, but zoltan has found the bug and hasn't commented on what that bug is so we can't help in any way :-/

I will check the fix Ossy mentioned. Probably I still need to debug it to see that the fix hides the issue or really fix it. But it is a good lead at least.

ClassInfo.h:66:
 for (const ClassInfo* ci = this; ci; ci = ci->parentClass)

 in this case ci == ci->parentClass, so it is an infinite loop.

This happens after the calling of GC. The 'this' pointer contains JSDOMWindow, namely the JSDOMWindow of S15.1.3.1_A1.2_T1.html. The GC call and infinite loop happens during the run of S15.1.3.1_A1.2_T2.html. And the mentioned parentClass is the 3rd parent.

p structure()->classInfo()->parentClass->parentClass == 0xf12d3fb0
p structure()->classInfo()->parentClass->parentClass->parentClass == 0xf12d3fb0

and this repeats forever.
Comment 11 Zoltan Herczeg 2011-09-30 05:51:24 PDT
Created attachment 109288 [details]
minor change
Comment 12 Zoltan Herczeg 2011-09-30 05:57:35 PDT
I did the debugging. The Structure was freed, but still have references from a "should be freed" object (unused JSDOMWindow). The cell is allocated again and the new memory data cause the infinite loop (it could be a crash of course). After the signed chars changed to int both of them are correctly collected.
Comment 13 Csaba Osztrogonác 2011-09-30 05:59:50 PDT
Comment on attachment 109288 [details]
minor change

View in context: https://bugs.webkit.org/attachment.cgi?id=109288&action=review

> Source/JavaScriptCore/ChangeLog:8
> +        Changint signed char to int in r96354 solved the

typo: Changint -> Changing
Comment 14 Geoffrey Garen 2011-09-30 12:16:55 PDT
Comment on attachment 109288 [details]
minor change

r=me with typo fix.
Comment 15 Zoltan Herczeg 2011-10-03 00:34:44 PDT
Landed in http://trac.webkit.org/changeset/96483