Bug 79191 - [JSC] The end atom of the marked block should be considered to decide if the cell is live
Summary: [JSC] The end atom of the marked block should be considered to decide if the ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-21 21:39 PST by Hojong Han
Modified: 2012-03-07 19:15 PST (History)
5 users (show)

See Also:


Attachments
Patch (1.47 KB, patch)
2012-02-22 01:09 PST, Hojong Han
no flags Details | Formatted Diff | Diff
Patch (1.48 KB, patch)
2012-03-07 16:30 PST, Hojong Han
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hojong Han 2012-02-21 21:39:51 PST
It's necessary to check the end atom as well as the first atom to decide whether the cell is live,
and GC logic could benefit from the assertion.
Comment 1 Hojong Han 2012-02-22 01:09:08 PST
Created attachment 128150 [details]
Patch
Comment 2 Filip Pizlo 2012-02-26 19:58:00 PST
Comment on attachment 128150 [details]
Patch

This feels strange.  Either it is possible, due to the conservative nature of the stack scans, that we will see a pointer that passes the not-cell-middle test but is nonetheless beyond m_endAtom, or it isn't.  If it is, this patch will make us crash in debug mode and do the right thing in release mode.  If it is not possible, then this patch just adds noise.

So which is it?  Can you justify why you've added code that results in assertion failures for the case that you're claiming to handle?
Comment 3 Hojong Han 2012-02-26 20:55:47 PST
(In reply to comment #2)
> (From update of attachment 128150 [details])
> This feels strange.  Either it is possible, due to the conservative nature of the stack scans, that we will see a pointer that passes the not-cell-middle test but is nonetheless beyond m_endAtom, or it isn't.  If it is, this patch will make us crash in debug mode and do the right thing in release mode.  If it is not possible, then this patch just adds noise.
> 
> So which is it?  Can you justify why you've added code that results in assertion failures for the case that you're claiming to handle?

The situation you described is the reason why I've added code. 
There were some crashes, I've analyzed, caused by a pointer beyond m_endAtom.
That unusual pointer was added to registerFileRoots while gathering conservative roots, but I've not figured out how the pointers beyond m_endAtom were used and could be existed in the register file.
I expect this patch will make crash with more information in debug mode and do the right this in release.
Comment 4 Filip Pizlo 2012-02-26 21:02:58 PST
(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 128150 [details] [details])
> > This feels strange.  Either it is possible, due to the conservative nature of the stack scans, that we will see a pointer that passes the not-cell-middle test but is nonetheless beyond m_endAtom, or it isn't.  If it is, this patch will make us crash in debug mode and do the right thing in release mode.  If it is not possible, then this patch just adds noise.
> > 
> > So which is it?  Can you justify why you've added code that results in assertion failures for the case that you're claiming to handle?
> 
> The situation you described is the reason why I've added code. 
> There were some crashes, I've analyzed, caused by a pointer beyond m_endAtom.
> That unusual pointer was added to registerFileRoots while gathering conservative roots, but I've not figured out how the pointers beyond m_endAtom were used and could be existed in the register file.
> I expect this patch will make crash with more information in debug mode and do the right this in release.

Why not just do the right thing in both debug and release?

I don't like the idea of our GC having ASSERT()'s that are motivated by us not knowing what our GC is doing.  Seems like a bad precedent.
Comment 5 Hojong Han 2012-02-26 21:09:59 PST
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (From update of attachment 128150 [details] [details] [details])
> > > This feels strange.  Either it is possible, due to the conservative nature of the stack scans, that we will see a pointer that passes the not-cell-middle test but is nonetheless beyond m_endAtom, or it isn't.  If it is, this patch will make us crash in debug mode and do the right thing in release mode.  If it is not possible, then this patch just adds noise.
> > > 
> > > So which is it?  Can you justify why you've added code that results in assertion failures for the case that you're claiming to handle?
> > 
> > The situation you described is the reason why I've added code. 
> > There were some crashes, I've analyzed, caused by a pointer beyond m_endAtom.
> > That unusual pointer was added to registerFileRoots while gathering conservative roots, but I've not figured out how the pointers beyond m_endAtom were used and could be existed in the register file.
> > I expect this patch will make crash with more information in debug mode and do the right this in release.
> 
> Why not just do the right thing in both debug and release?
> 
> I don't like the idea of our GC having ASSERT()'s that are motivated by us not knowing what our GC is doing.  Seems like a bad precedent.

I got it. 
Then, is it fine that the conditional check for m_endAtom only without ASSERT()'s ??
Comment 6 Filip Pizlo 2012-02-26 21:12:45 PST
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > (In reply to comment #2)
> > > > (From update of attachment 128150 [details] [details] [details] [details])
> > > > This feels strange.  Either it is possible, due to the conservative nature of the stack scans, that we will see a pointer that passes the not-cell-middle test but is nonetheless beyond m_endAtom, or it isn't.  If it is, this patch will make us crash in debug mode and do the right thing in release mode.  If it is not possible, then this patch just adds noise.
> > > > 
> > > > So which is it?  Can you justify why you've added code that results in assertion failures for the case that you're claiming to handle?
> > > 
> > > The situation you described is the reason why I've added code. 
> > > There were some crashes, I've analyzed, caused by a pointer beyond m_endAtom.
> > > That unusual pointer was added to registerFileRoots while gathering conservative roots, but I've not figured out how the pointers beyond m_endAtom were used and could be existed in the register file.
> > > I expect this patch will make crash with more information in debug mode and do the right this in release.
> > 
> > Why not just do the right thing in both debug and release?
> > 
> > I don't like the idea of our GC having ASSERT()'s that are motivated by us not knowing what our GC is doing.  Seems like a bad precedent.
> 
> I got it. 
> Then, is it fine that the conditional check for m_endAtom only without ASSERT()'s ??

Only if we know that we actually need that check.

Do you have some case where you believe that this check would be necessary?  Should be easy enough to try to reproduce it with an ASSERT() placed in your local copy.  If you can demonstrate that the check is necessary (and I'm willing to believe that it is if I have evidence) then certainly we should commit the check, along with a test case that demonstrates its need!

But I don't want a check that isn't known to be necessary, because its presence may confuse us into later thinking that it *is* necessary.  Makes maintenance hard.
Comment 7 Hojong Han 2012-02-26 21:22:15 PST
(In reply to comment #6)
> Only if we know that we actually need that check.
> 
> Do you have some case where you believe that this check would be necessary?  Should be easy enough to try to reproduce it with an ASSERT() placed in your local copy.  If you can demonstrate that the check is necessary (and I'm willing to believe that it is if I have evidence) then certainly we should commit the check, along with a test case that demonstrates its need!
> 
> But I don't want a check that isn't known to be necessary, because its presence may confuse us into later thinking that it *is* necessary.  Makes maintenance hard.

I'm really appreciated for your kind review and comment.
I'll find out the evidence you need and update status. Thank you.
Comment 8 Hojong Han 2012-02-28 01:04:40 PST
I've tried to find the steps to reproduce the same problem,
but it's very difficult to artificially manipulate GC and memory allocation related to the system.

Concurrently I've been analyzing the crash already occurred.
It tells me that there were many abandoned addresses in the register file.

Do you have ideas about the possibility that unnecessary addresses or values remain in the register file?

The register file grows while processing in JIT code without shrink.
I cannot find any shrink in JIT code as I've looked into it.
If it just grows, there would be many unnecessary things after one of the callframes is done in JIT code.
At last those would affect GC works.

Any comment, advice, and feedback welcomed.
Comment 9 Filip Pizlo 2012-02-28 12:51:21 PST
(In reply to comment #8)
> I've tried to find the steps to reproduce the same problem,
> but it's very difficult to artificially manipulate GC and memory allocation related to the system.
> 
> Concurrently I've been analyzing the crash already occurred.
> It tells me that there were many abandoned addresses in the register file.
> 
> Do you have ideas about the possibility that unnecessary addresses or values remain in the register file?

My immediate instinct is that this is totally unsurprising.  The register file can be quite sparse; for example a function may never touch its temporaries when it executes because it takes a path that doesn't need them.  So stale values can remain on the register file for quite some time.

> 
> The register file grows while processing in JIT code without shrink.
> I cannot find any shrink in JIT code as I've looked into it.

We rarely shrink the register file.  Though its growth should not be unbounded - it should reach steady state.  If you've found cases of unbounded stack growth, then that's a bug.

> If it just grows, there would be many unnecessary things after one of the callframes is done in JIT code.
> At last those would affect GC works.
> 
> Any comment, advice, and feedback welcomed.

The GC should be impervious to stale pointers in the register file.  That's why the stack scan has various checks.  I think the relevant question for this bug is: is the fact that we don't have a special check for a pointer being beyond m_endAtom lead to GC state being corrupted, or is it the case that either (a) the GC doesn't care or (b) the check would be redundant because our other checks (like in-middle-of-cell) already cover that case.
Comment 10 Hojong Han 2012-02-28 17:19:30 PST
Here is some information brought from the core dump.

(JSC::MarkedBlock *) 0x47210000 = {
  <WTF::DoublyLinkedListNode<JSC::MarkedBlock>> = {<No data fields>}, 
  members of JSC::MarkedBlock: 
  static atomSize = 16, 
  static atomShift = 5, 
  static blockSize = 16384, 
  static blockMask = 4294950912, 
  static atomsPerBlock = 1024, 
  static atomMask = 1023, 
  static cardShift = 8, 
  static bytesPerCard = 256, 
  static cardCount = 64, 
  static cardMask = 63, 
  static atomAlignmentMask = 15, 
  m_atomsPerCell = 7, 
  m_endAtom = 1018, 
  m_marks = {
    static wordSize = 32, 
    static words = 32, 
    static one = <optimized out>, 
    bits = {
      m_data = {33818624, 270549121, 2164392968, 135274560, 1082196484, 67637280, 541098242, 33818640, 270549121, 2164392968, 135274560, 1082196484, 67637280, 541098242, 33818640, 
        270549121, 2164392968, 135274560, 1082196484, 67637280, 541098242, 33818640, 270549121, 2164392968, 135274560, 1082196484, 67637280, 541098242, 33818640, 270549121, 
        2164392968, 135274560}
    }
  }, 
  m_state = JSC::MarkedBlock::Marked, 
  m_allocation = {
    <WTF::PageBlock> = {
      m_realBase = 0x47210000, 
      m_base = 0x47210000, 
      m_size = 16384
    }, 
    members of WTF::PageAllocationAligned: 
    m_reservation = {
      m_realBase = 0x4720e000, 
      m_base = 0x4720e000, 
      m_size = 28672
    }
  }, 
  m_heap = 0x5b4be8, 
  m_prev = 0x46508000, 
  m_next = 0x434b4000
}

[Memory dump of the register file]
0x48550700:     0x432533f0	0xfffffffb	0x4640c514	0xfffffffb
0x48550710:     0x00000000	0xfffffffb	0x4640c458	0xfffffffb
0x48550720:     0x026004d0	0xfffffffb	0x428bbeb0	0xfffffffb
0x48550730:     0x428bbeb0	0xfffffffb	0x428bbef0	0xfffffffb
0x48550740:     0x428bbeb0	0xfffffffb	0x00000001	0xffffffff
0x48550750:     0x00000001	0xffffffff	0x41baf3b0	0xfffffffb
0x48550760:     0x426301b0	0xfffffffb	0x428bbe70	0xfffffffb
0x48550770:     0x428bbe30	0xfffffffb	0x00000000	0xfffffffc
0x48550780:     0x426301b0	0xfffffffb	0x00000001	0xffffffff
0x48550790:     0x47213fb0	0xfffffffb	0x00000000	0xfffffffc
0x485507a0:     0x465a88b0	0xfffffffb	0x00000001	0xffffffff
0x485507b0:     0x43eb46b0	0xfffffffb	0x438f2af0	0xfffffffb
0x485507c0:     0x452fd89c	0xfffffffb	0x00c988c8	0xfffffffc
0x485507d0:     0x43eb4670	0xfffffffb	0x43eb4670	0xfffffffb

What I mentioned is 0x47213fb0 in the register file.
This address is fit to the 1019th atom of that MarkedBlock, 
and cannot be filtered by in-middle-of-cell check.
Comment 11 Hojong Han 2012-03-04 16:51:06 PST
(In reply to comment #9)
> (In reply to comment #8)
> The GC should be impervious to stale pointers in the register file.  That's why the stack scan has various checks.  I think the relevant question for this bug is: is the fact that we don't have a special check for a pointer being beyond m_endAtom lead to GC state being corrupted, or is it the case that either (a) the GC doesn't care or (b) the check would be redundant because our other checks (like in-middle-of-cell) already cover that case.

My point is that we don't have a special check for a pointer being beyond m_endAtom lead to GC state being corrupted. 
I wrote some information at comment #10. (I didn't attach the full dump of register file, because it's quite big.)
Could you take a look into that, and let me know what is needed or missed?
Comment 12 Geoffrey Garen 2012-03-07 11:35:56 PST
Comment on attachment 128150 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=128150&action=review

> Source/JavaScriptCore/heap/MarkedBlock.h:324
> +        if (atomNumber > m_endAtom) { // Filters pointers into invalid cells out of the range.

You need to check ">=", not ">". See the definition of m_endAtom:

        size_t m_endAtom; // This is a fuzzy end. Always test for < m_endAtom.

> Source/JavaScriptCore/heap/MarkedBlock.h:325
> +            ASSERT_NOT_REACHED();

Your data say this condition is possible and not an error, so please remove this ASSERT.
Comment 13 Hojong Han 2012-03-07 16:30:55 PST
Created attachment 130720 [details]
Patch
Comment 14 Hojong Han 2012-03-07 16:34:29 PST
(In reply to comment #12)
> (From update of attachment 128150 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=128150&action=review
> 
> > Source/JavaScriptCore/heap/MarkedBlock.h:324
> > +        if (atomNumber > m_endAtom) { // Filters pointers into invalid cells out of the range.
> 
> You need to check ">=", not ">". See the definition of m_endAtom:
> 
>         size_t m_endAtom; // This is a fuzzy end. Always test for < m_endAtom.
> 
> > Source/JavaScriptCore/heap/MarkedBlock.h:325
> > +            ASSERT_NOT_REACHED();
> 
> Your data say this condition is possible and not an error, so please remove this ASSERT.

I applied your review to the patch. Thank you very much.
Comment 15 Geoffrey Garen 2012-03-07 18:21:00 PST
Comment on attachment 130720 [details]
Patch

r=me
Comment 16 WebKit Review Bot 2012-03-07 19:15:16 PST
Comment on attachment 130720 [details]
Patch

Clearing flags on attachment: 130720

Committed r110134: <http://trac.webkit.org/changeset/110134>
Comment 17 WebKit Review Bot 2012-03-07 19:15:20 PST
All reviewed patches have been landed.  Closing bug.