Bug 12586 - PAC file: malloc deadlock sometimes causes a hang @ www.apple.com/pro/profiles/ (12586)
Summary: PAC file: malloc deadlock sometimes causes a hang @ www.apple.com/pro/profile...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: 420+
Hardware: Macintosh OS X 10.4
: P1 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2007-02-04 11:00 PST by Maciej Stachowiak
Modified: 2007-03-07 07:50 PST (History)
0 users

See Also:


Attachments
patch (8.60 KB, patch)
2007-03-06 23:09 PST, Geoffrey Garen
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Maciej Stachowiak 2007-02-04 11:00:00 PST
2006-06-06 17:10:07 Geoff Garen:
Reason for clone:
Must fix in Leopard, although tcmalloc makes this issue less likely.

<original text: begin>

6/5/06 12:46 PM Chris Petersen:
* SUMMARY
I have encountered a random hang with both 8J123 and 8J2122 when attempting to load  http://www.apple.com/pro/profiles/ when I use a PAC file

* STEPS TO REPRODUCE
1. Download the attached test.pac file
2. Go to System Prefs - Network -Built in Ethernet.  Click Proxies tab and check "Automatic Proxy Configuration".  Click Choose file button and select the test.pac file. Click Apply Now.
3. Go to http://www.apple.com/pro/profiles/ 
4. As the page is loading, Safari will hang for me ( CPU usage shows 191.5 in TOP)

This hang occurs for me on both a Dual G5 and MBP .

* RESULTS
Page should load and no hang should occur but does.

* REGRESSION
Yes in 10.4.7


* NOTES
I have attached the shark sample for the hang.


</original text: end>

2006-06-06 17:10:08 Cloned from problemID rdar://problem/4573918 by: Geoff Garen.

2006-06-06 17:10:07 Geoff Garen:
<original text: begin>

2006-06-05 12:52:17 Geoff Garen:
Am I right in thinking that szone_malloc and szone_free have deadlocked in the attached sample?

2006-06-05 13:03:00 Alice Liu:
that does make sense since the problem is not always reproducible. 

2006-06-05 13:08:43 Geoff Garen:
You can also see that KJS::PropertyMap::mark has called KWQPtrDictImpl::remove on an alternate thread, which is independently bad, because

(1) KJS::PropertyMap::mark doesn't call KWQPtrDictImpl::remove -- maybe something got inlined, though.
(2) This is likely a modification to a global data structure.

2006-06-05 13:20:00 Geoff Garen:
Alice, what I was really asking was, how can szone_malloc/free be to blame here -- wouldn't we see this problem across all threaded applications?

2006-06-05 13:52:37 Brenda Cicerone:
does not hang for me.

2006-06-05 15:08:58 Chris Petersen:
I can reproduce on 8J123 with 25 - 50% of the time with using the attached PAC file setting.
To make the steps easier try this:
1) Add  http://www.apple.com/pro/profiles/  to bookmark bar.
2) Reset Safari, quit and relaunch
3) After default page loads, click on bookmark
4) This page does take a few second to load. If hang does occur while it's loading, repeat Steps 2 -3.

Like I said, this problem occurs anywhere from 25 - 50% of time for me with these steps under 8J123 on my Dual G5.


2006-06-05 15:23:24 Timothy Hatcher:
This will be easier to reproduce on a multi-processor machine since it is a thread deadlock bug.

2006-06-05 15:57:58 Chris Petersen:
FWIW, This isn't occur on TOT on the same Dual G5.

2006-06-05 16:45:00 Chris Petersen:
Attaching samples from two separate hangs on my Dual G5.

2006-06-05 17:23:59 David Harrison:
It looks to me like several threads are hung waiting for the lock that the first thread holds and will never relinquish because it is infinite-looping inside of szone_free. 

A plausible explanation for not leaving szone_free is that the heap is corrupt.

Also, it turns out that DOMNode::mark() calls remove(), so it looks like Sample just missed including DOMNode::mark() in the backtrace.

2006-06-05 17:26:46 David Harrison:
Chris' new samples show the first thread stuck in malloc rather than free.   Does not otherwise change my comments.

2006-06-05 18:17:57 Geoff Garen:
I'm going to try some MallocScribbling on my dual g5 machine tomorrow. If only Safari could run under libgmalloc...

2006-06-06 09:59:36 Timothy Hatcher:
I have tried reproducing with MallocScribble, MallocCheckHeapStart and the GUI MallocDebug. The hang still happens in all of these cases, none caused a crash. MallocCheckHeapStart caused Safari to hang in szone_check called from malloc, lending to Harrison's trashed heap theory. I also tried libgmalloc a couple of times, but the hang never reproduced. Running MallocDebug with the "zero freed memory" option on, the hang never reproduces. Turning this option off and it will reproduce under MallocDebug, but never crashes.

2006-06-06 11:05:40 Geoff Garen:
With the steps above on a dual g5, I actually got a crash in this function:

void *KWQPtrDictImpl::find(void *key) const
{
    return (void *)CFDictionaryGetValue(d->cfdict, key);
}

#3  0x907d8bc4 in __CFDictionaryFindBuckets1a () <--- crash
#4  0x907cda10 in CFDictionaryGetValue ()
#5  0x01710738 in KWQPtrDictImpl::find (this=0x1c3e048, key=0xc8e1300) at /Volumes/Home/Users/ggaren/Labyrinth-Branch/WebCore/kwq/KWQPtrDictImpl.mm:150
#6  0x01ac0858 in QPtrDict<DOM::NodeImpl>::find (this=0x1c3e040, key=0xc8e1300) at /Volumes/Home/Users/ggaren/Labyrinth-Branch/WebCore/kwq/KWQPtrDict.h:51
#7  0x017a86c8 in KJS::DOMNode::mark (this=0xc4be070) at /Volumes/Home/Users/ggaren/Labyrinth-Branch/WebCore/khtml/ecma/kjs_dom.cpp:128
#8  0x01799788 in KJS::DOMObjectsMarker::markOnAlternateThread (this=0xc438c78) at /Volumes/Home/Users/ggaren/Labyrinth-Branch/WebCore/khtml/ecma/kjs_binding.cpp:190

CFShow(this->d->cfdict) made gdb very unhappy, which leads me to believe that the dictionary was either (1) corrupt or (2) uninitialized/prematurely destroyed.

2006-06-06 11:22:06 Geoff Garen:
Got the crash above again. Both times, I quit Safari while the profiles page was loading.

2006-06-06 11:23:17 Geoff Garen:
p this->d
$1 = (KWQPtrDictPrivate *) 0xc5f5ac0
Current language:  auto; currently objective-c++
(gdb) p *$
$2 = {
  cfdict = 0x0, 
  del = 0, 
  iterators = 0x0
}
(gdb) p this
$3 = (const KWQPtrDictImpl * const) 0x1c3e048
(gdb) p *$
$4 = {
  d = 0xc5f5ac0
}

[ up one frame ]
(gdb) p this
$5 = (QPtrDict<DOM::NodeImpl> * const) 0x1c3e040
Current language:  auto; currently c++
(gdb) p *$
$6 = {
  <QPtrCollection> = {
    _vptr$QPtrCollection = 0x1bbdb58, 
    del_item = false
  }, 
  members of QPtrDict<DOM::NodeImpl>: 
  impl = {
    d = 0xbfffc7cc
  }
}

[ up one frame ]
(gdb) p markingRoots
$7 = {
  <QPtrCollection> = {
    _vptr$QPtrCollection = 0x1bbdb58, 
    del_item = false
  }, 
  members of QPtrDict<DOM::NodeImpl>: 
  impl = {
    d = 0xc5f5ac0
  }
}
(gdb) 

2006-06-06 12:42:49 Geoff Garen:
Despite the seeming absurdity, I've confirmed that in the crashing case, the QPtrDict (along with its KWQPtrDictImpl and KWQPtrDictPrivate) has been deallocated (and has not been reallocated at the same address). The QPtrDict is a function static, so that's pretty bizarre.

2006-06-06 14:00:47 Geoff Garen:
We've seen a case of a crash here, where parent() has been deleted:

void InlineBox::remove()
{ 
    if (parent())
        parent()->removeChild(this);
}

This may be the ultimate cause of the heap corruption. Or not. Yay.

2006-06-06 16:48:29 Geoff Garen:
We have a fix. Yay.

2006-06-06 17:08:36 Geoff Garen:
Committed revision 14752.

2006-06-06 17:09:28 Geoff Garen:
'patch.txt' attached.

</original text: end>

Reason for clone:
Must fix in Leopard, although tcmalloc makes this issue less likely.

2006-06-06 17:10:42 Geoff Garen:
Things to do on TOT:
in malloc and other potentially locking functions, assert we're not marking
search for instances of malloc in ::mark
apply branch fix to tot

2006-06-20 10:17:20 Alice Liu:
Safari Leopard BRB Reviewed

2006-06-20 11:14:39 Alice Liu:
Safari WWDC BRB Reviewed

<rdar://problem/4576242>
Comment 1 Geoffrey Garen 2007-03-06 23:09:10 PST
Created attachment 13509 [details]
patch
Comment 2 Maciej Stachowiak 2007-03-07 00:50:09 PST
fastMallocLock / fastMallocUnlock could maybe use better names, like fastMallocAllowed / fastMallocForbidden or somethig - there's no actual locking going on.

+public:
+    bool m_inSubtreeMark : 1;
+

It might be worth adding a comment that now no more flags can be added to Node now without increasing the size of all DOM nodes on 32-bit systems. (There would be 16 in the bitfield plus the short above).

I also wonder if some of the current flags can be removed to give us breathing room again.

r=me
Comment 3 Maciej Stachowiak 2007-03-07 01:00:35 PST
Comment on attachment 13509 [details]
patch

r=me
Comment 4 Geoffrey Garen 2007-03-07 07:50:54 PST
Added a comment to Node.h. Changed names to "fastMallocForbid" and "fastMallocAllow." 

Committed revision 20019.