Bug 91354 - Machine stack marker should not gather values already piled up on the stack before JSC working
Summary: Machine stack marker should not gather values already piled up on the stack b...
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-15 18:53 PDT by Hojong Han
Modified: 2012-11-28 04:09 PST (History)
4 users (show)

See Also:


Attachments
Patch (3.84 KB, patch)
2012-07-15 20:00 PDT, Hojong Han
barraclough: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hojong Han 2012-07-15 18:53:19 PDT
There are already many values on stack before going through main function of WebProcess.
Those values could cause copied blocks to be pinned while GC
and the number of these unnecessary pinned blocks could increase.
Comment 1 Hojong Han 2012-07-15 20:00:06 PDT
Created attachment 152470 [details]
Patch
Comment 2 Gavin Barraclough 2012-07-16 11:41:38 PDT
Comment on attachment 152470 [details]
Patch

I don't think this change is valid.  The StackBounds will be captured on the first call into WebCore, when the thread data is initialized.  This may occur at a deep stack depth.  Subsequent calls into WebCore may occur at a much shallower stack depth.  It certainly is the case that there may be some range of the stack that we currently mark, which one could safely avoid marking – but I don't think this patch safely determines what that line is (also, I'm not sure how one would safely define this high water mark).
Comment 3 Hojong Han 2012-08-12 23:39:58 PDT
(In reply to comment #2)
> (From update of attachment 152470 [details])
> I don't think this change is valid.  The StackBounds will be captured on the first call into WebCore, when the thread data is initialized.  This may occur at a deep stack depth.  
Setting top occurs at a deep stack depth because it's followed after initializingThread(), but not deep as much as origin which is the bottom of the stack.

> Subsequent calls into WebCore may occur at a much shallower stack depth.  
I think subsequent calls at a shallower stack depth do not matter of JSC working because in most cased it's deep enough from the perspective of JSC.

> It certainly is the case that there may be some range of the stack that we currently mark, which one could safely avoid marking – but I don't think this patch safely determines what that line is (also, I'm not sure how one would safely define this high water mark).
Do you think it's hardly defined to set high water mark?
Plz give me any feedback. thanks
Comment 4 Geoffrey Garen 2012-08-13 09:28:30 PDT
> Those values could cause copied blocks to be pinned while GC
> and the number of these unnecessary pinned blocks could increase.

Is there any evidence that this actually happens? Please be specific.
Comment 5 SangGyu Lee 2012-08-13 23:27:56 PDT
It actually happens on our linux device.

Here are the specific information.

If we run V8 benchmark (http://v8.googlecode.com/svn/data/benchmarks/v7/run.html) 
and Dromaeo DOM (http://dromaeo.com/?dom) in turns repeatedly, in 4 iterations, we always find OOM.

After investigation, we found that pinned blocks are continually increased. it is pinned by MachineStackMarker.

In MachineThreads::gatherFromCurrentThread(), [stackBegin, stackEnd] is [0xbe8833c4, 0xbe884000].

Here are logs in scanning stack range.

StackBegin ---> genericAddSpan: [0xbe8833d4] = 0xfffffffb
                genericAddSpan: [0xbe8833d8] = 0xbe8834b0
                genericAddSpan: [0xbe8833dc] = 0x44e0c478
                genericAddSpan: [0xbe8833e0] = 0x44e0c4f8
                genericAddSpan: [0xbe8833e4] = (nil)

StackPointer                ...
at WebProcess
StartUp*(a)
(0xbe883824)--> genericAddSpan: [0xbe883824] = 0x40093000
                            ...
                genericAddSpan: [0xbe883b40] = 0x53005150, genericAddPointer: 0x53005150 is pinned [ 0] *(b)
                genericAddSpan: [0xbe883bc0] = 0x6f6f723d, genericAddPointer: 0x6f6f723d is pinned [ 1] 
                genericAddSpan: [0xbe883bc4] = 0x554d0074, genericAddPointer: 0x554d0074 is pinned [ 2]
                genericAddSpan: [0xbe883be0] = 0x6e735f6e, genericAddPointer: 0x6e735f6e is pinned [ 3]
                genericAddSpan: [0xbe883c2c] = 0x6d2f746f, genericAddPointer: 0x6d2f746f is pinned [ 4]
                genericAddSpan: [0xbe883c34] = 0x6972706b, genericAddPointer: 0x6972706b is pinned [ 5] 
                genericAddSpan: [0xbe883c70] = 0x653e4955, genericAddPointer: 0x653e4955 is pinned [ 6]
                genericAddSpan: [0xbe883c7c] = 0x7461643a, genericAddPointer: 0x7461643a is pinned [ 7]
                genericAddSpan: [0xbe883c94] = 0x7461643e, genericAddPointer: 0x7461643e is pinned [ 8]
                genericAddSpan: [0xbe883cbc] = 0x52415453, genericAddPointer: 0x52415453 is pinned [ 9]
                genericAddSpan: [0xbe883d1c] = 0x6f6f723d, genericAddPointer: 0x6f6f723d is pinned [10]
                genericAddSpan: [0xbe883d64] = 0x5f00656c, genericAddPointer: 0x5f00656c is pinned [11]
                genericAddSpan: [0xbe883d94] = 0x5750444c, genericAddPointer: 0x5750444c is pinned [12]
                genericAddSpan: [0xbe883dfc] = 0x4f495443, genericAddPointer: 0x4f495443 is pinned [13]
                genericAddSpan: [0xbe883e20] = 0x4f495443, genericAddPointer: 0x4f495443 is pinned [14]
                genericAddSpan: [0xbe883e5c] = 0x555f6e65, genericAddPointer: 0x555f6e65 is pinned [15]
                genericAddSpan: [0xbe883eb0] = 0x52415445, genericAddPointer: 0x52415445 is pinned [16]
                genericAddSpan: [0xbe883f20] = 0x52555341, genericAddPointer: 0x52555341 is pinned [17]
                genericAddSpan: [0xbe883f48] = 0x555f6e65, genericAddPointer: 0x555f6e65 is pinned [18]
                genericAddSpan: [0xbe883f88] = 0x52474d46, genericAddPointer: 0x52474d46 is pinned [19]
                genericAddSpan: [0xbe883f90] = 0x59545f4b, genericAddPointer: 0x59545f4b is pinned [20]
                genericAddSpan: [0xbe883fa0] = 0x52474d46, genericAddPointer: 0x52474d46 is pinned [21]
                           ...
StackEnd   -->
(0xbe884000)

        (a) I showed stack pointer (=0xbe883824) at WebProcess StartUp.
        (b) [ #] means the number of pinned pointer.

In these slots, many slots are not multiple of 4. They seems not to be exact pointers to CopiedBlock's storage.

I think (stackPointerAtWebProcessStartUp, stackEnd) = [0xbe883824, 0xbe884000) cannot contain pointer to JavaScript CopiedBlock. Because it is before constructing and initializing JavaScript Heap and WebProcessMain's main function cannot exit before WebProcess exit.

However current implementation scan these ranges, and like above, some values in these range makes number of pinned object, and finally make OOM.

If I make change to scan only [Stackpointer at WebProcessStartUp, StackBegin]. 
( apply to WebProcess's main thread only, not apply to WebWorker thread ), OOM disappears.
Comment 6 Filip Pizlo 2012-08-14 00:16:02 PDT
(In reply to comment #5)
> It actually happens on our linux device.
> 
> Here are the specific information.
> 
> If we run V8 benchmark (http://v8.googlecode.com/svn/data/benchmarks/v7/run.html) 
> and Dromaeo DOM (http://dromaeo.com/?dom) in turns repeatedly, in 4 iterations, we always find OOM.
> 
> After investigation, we found that pinned blocks are continually increased. it is pinned by MachineStackMarker.
> 
> In MachineThreads::gatherFromCurrentThread(), [stackBegin, stackEnd] is [0xbe8833c4, 0xbe884000].
> 
> Here are logs in scanning stack range.
> 
> StackBegin ---> genericAddSpan: [0xbe8833d4] = 0xfffffffb
>                 genericAddSpan: [0xbe8833d8] = 0xbe8834b0
>                 genericAddSpan: [0xbe8833dc] = 0x44e0c478
>                 genericAddSpan: [0xbe8833e0] = 0x44e0c4f8
>                 genericAddSpan: [0xbe8833e4] = (nil)
> 
> StackPointer                ...
> at WebProcess
> StartUp*(a)
> (0xbe883824)--> genericAddSpan: [0xbe883824] = 0x40093000
>                             ...
>                 genericAddSpan: [0xbe883b40] = 0x53005150, genericAddPointer: 0x53005150 is pinned [ 0] *(b)
>                 genericAddSpan: [0xbe883bc0] = 0x6f6f723d, genericAddPointer: 0x6f6f723d is pinned [ 1] 
>                 genericAddSpan: [0xbe883bc4] = 0x554d0074, genericAddPointer: 0x554d0074 is pinned [ 2]
>                 genericAddSpan: [0xbe883be0] = 0x6e735f6e, genericAddPointer: 0x6e735f6e is pinned [ 3]
>                 genericAddSpan: [0xbe883c2c] = 0x6d2f746f, genericAddPointer: 0x6d2f746f is pinned [ 4]
>                 genericAddSpan: [0xbe883c34] = 0x6972706b, genericAddPointer: 0x6972706b is pinned [ 5] 
>                 genericAddSpan: [0xbe883c70] = 0x653e4955, genericAddPointer: 0x653e4955 is pinned [ 6]
>                 genericAddSpan: [0xbe883c7c] = 0x7461643a, genericAddPointer: 0x7461643a is pinned [ 7]
>                 genericAddSpan: [0xbe883c94] = 0x7461643e, genericAddPointer: 0x7461643e is pinned [ 8]
>                 genericAddSpan: [0xbe883cbc] = 0x52415453, genericAddPointer: 0x52415453 is pinned [ 9]
>                 genericAddSpan: [0xbe883d1c] = 0x6f6f723d, genericAddPointer: 0x6f6f723d is pinned [10]
>                 genericAddSpan: [0xbe883d64] = 0x5f00656c, genericAddPointer: 0x5f00656c is pinned [11]
>                 genericAddSpan: [0xbe883d94] = 0x5750444c, genericAddPointer: 0x5750444c is pinned [12]
>                 genericAddSpan: [0xbe883dfc] = 0x4f495443, genericAddPointer: 0x4f495443 is pinned [13]
>                 genericAddSpan: [0xbe883e20] = 0x4f495443, genericAddPointer: 0x4f495443 is pinned [14]
>                 genericAddSpan: [0xbe883e5c] = 0x555f6e65, genericAddPointer: 0x555f6e65 is pinned [15]
>                 genericAddSpan: [0xbe883eb0] = 0x52415445, genericAddPointer: 0x52415445 is pinned [16]
>                 genericAddSpan: [0xbe883f20] = 0x52555341, genericAddPointer: 0x52555341 is pinned [17]
>                 genericAddSpan: [0xbe883f48] = 0x555f6e65, genericAddPointer: 0x555f6e65 is pinned [18]
>                 genericAddSpan: [0xbe883f88] = 0x52474d46, genericAddPointer: 0x52474d46 is pinned [19]
>                 genericAddSpan: [0xbe883f90] = 0x59545f4b, genericAddPointer: 0x59545f4b is pinned [20]
>                 genericAddSpan: [0xbe883fa0] = 0x52474d46, genericAddPointer: 0x52474d46 is pinned [21]
>                            ...
> StackEnd   -->
> (0xbe884000)
> 
>         (a) I showed stack pointer (=0xbe883824) at WebProcess StartUp.
>         (b) [ #] means the number of pinned pointer.
> 
> In these slots, many slots are not multiple of 4. They seems not to be exact pointers to CopiedBlock's storage.
> 
> I think (stackPointerAtWebProcessStartUp, stackEnd) = [0xbe883824, 0xbe884000) cannot contain pointer to JavaScript CopiedBlock. Because it is before constructing and initializing JavaScript Heap and WebProcessMain's main function cannot exit before WebProcess exit.
> 
> However current implementation scan these ranges, and like above, some values in these range makes number of pinned object, and finally make OOM.
> 
> If I make change to scan only [Stackpointer at WebProcessStartUp, StackBegin]. 
> ( apply to WebProcess's main thread only, not apply to WebWorker thread ), OOM disappears.

If a particular range of stack reproducibly produces increasing numbers of pointers into the copied space, then it's highly unlikely that this is due to coincidental pointers.  I think you should figure out exactly why that region of stack tends to point into the heap.

But more importantly, I don't believe that your change is sound.  You haven't proved that a pointer can't legitimately escape into the range of stack you're talking about.  If a pointer does legitimately escape and you don't pin the relevant blocks, then bad things will happen.

You can of course do a number of other things, other than your current patch, that could overcome this problem.  Two classical approaches exist:

1) Blacklisting.  Whenever you see a value that points to a region of virtual memory in which you have not yet allocated blocks but in which you may allocate blocks in the future, then mark that block as unusable due to the risk of false-positives.  See http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcdescr.html and http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi93.ps.Z

2) Conditional pinning for the C stack.  A pointer directly into the copied space may only exist on the C stack if there is, at time of GC, already a GC object in marked space that also points to the same copied space block.  This is not true of the JS stack, so this will only work for the C stack.  Hence, if a copied block is pinned from the C stack, you can recycle it at end of GC if no object in marked space pointed to the same copied space chunk that the C stack pointer pointed to.

Please either clarify exactly why the C stack range you are worried about has increasing numbers of pointers into the copied space, or implement one of the two other approaches.
Comment 7 Hojong Han 2012-11-28 04:09:37 PST
(In reply to comment #6)
It's been a long time since relevant comments were updated but I still want to figure it out and fix this correctly.
There are many environment variables from the top of the stack. These are sequential characters and considered as pointers during GC. For example, 0x726F6F74 means ascii "root". If it were just considered as address, It would cause a copied block made later to be pinned. This is all about what happens here.
> 
> If a particular range of stack reproducibly produces increasing numbers of pointers into the copied space, then it's highly unlikely that this is due to coincidental pointers.  I think you should figure out exactly why that region of stack tends to point into the heap.
> 
> But more importantly, I don't believe that your change is sound.  You haven't proved that a pointer can't legitimately escape into the range of stack you're talking about.  If a pointer does legitimately escape and you don't pin the relevant blocks, then bad things will happen.
> 
> You can of course do a number of other things, other than your current patch, that could overcome this problem.  Two classical approaches exist:
> 
> 1) Blacklisting.  Whenever you see a value that points to a region of virtual memory in which you have not yet allocated blocks but in which you may allocate blocks in the future, then mark that block as unusable due to the risk of false-positives.  See http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcdescr.html and http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi93.ps.Z
> 
> 2) Conditional pinning for the C stack.  A pointer directly into the copied space may only exist on the C stack if there is, at time of GC, already a GC object in marked space that also points to the same copied space block.  This is not true of the JS stack, so this will only work for the C stack.  Hence, if a copied block is pinned from the C stack, you can recycle it at end of GC if no object in marked space pointed to the same copied space chunk that the C stack pointer pointed to.
> 
Do you think that your second idea is still good for this?
What about setting another origin just for machine stack marker right after WebProcess main runs?
> Please either clarify exactly why the C stack range you are worried about has increasing numbers of pointers into the copied space, or implement one of the two other approaches.