RESOLVED FIXED 18183
Crashes when saving webpage to Web Archive format .webarchive file
https://bugs.webkit.org/show_bug.cgi?id=18183
nobody
Reported 2008-03-28 12:08:49 PDT
Crashes when saving a webpage in Web Archive format (.webarchive file). First noticed in r31370; the problem remains in r31388.
Attachments
Bug 18183- crash log (63.22 KB, text/plain)
2008-03-28 18:34 PDT, nobody
no flags
Bug 18183- crash log #2 (70.65 KB, text/plain)
2008-03-28 20:09 PDT, nobody
no flags
crash log #3 (33.38 KB, text/plain)
2008-03-28 21:57 PDT, nobody
no flags
bug 18183- 10.5.2 Safe Boot crash log (34.29 KB, text/plain)
2008-03-28 22:10 PDT, nobody
no flags
Bug 18183- no unsanity.txt (58.34 KB, text/plain)
2008-03-28 23:12 PDT, nobody
no flags
Bug 18183- no default folder.txt (56.10 KB, text/plain)
2008-03-28 23:14 PDT, nobody
no flags
another crash log (50.60 KB, text/plain)
2008-03-28 23:15 PDT, nobody
no flags
Bug 18183- screen capture (404.80 KB, image/png)
2008-03-29 07:44 PDT, nobody
no flags
hosts file (modified) (490 bytes, text/plain)
2008-03-29 17:55 PDT, nobody
no flags
Proposed fix (no layout test...) (1.21 KB, patch)
2008-03-30 17:09 PDT, Brady Eidson
mitz: review+
hosts.txt (4.48 KB, text/plain)
2008-04-01 15:48 PDT, nobody
no flags
nobody
Comment 1 2008-03-28 14:28:53 PDT
The crash does not occur with every webpage, but if it does occur when saving a particular webpage, it seems to be consistently repeatable with that webpage.
Example: http://www.time.com/time/politics/article/0,8599,1725514,00.html
Saving as .html works OK; saving as .webarchive causes a crash.
Matt Lilek
Comment 2 2008-03-28 16:17:13 PDT
This doesn't crash for me. Can you please attach a crash log - http://webkit.org/quality/crashlogs.html
Brady Eidson
Comment 3 2008-03-28 17:00:20 PDT
I can't repro either... if you can, a crash log is critical to explore this...
nobody
Comment 4 2008-03-28 18:34:22 PDT
Created attachment 20178
Bug 18183- crash log
nobody
Comment 5 2008-03-28 20:09:09 PDT
Created attachment 20180
Bug 18183- crash log #2
Crash log from trying to save the webpage as a .webarchive file. Here's the webpage: http://www.time.com/time/politics/article/0,8599,1725514,00.html
Brady Eidson
Comment 6 2008-03-28 21:13:34 PDT
I see some haxies in your crashlog, and all such 3rd party extensions are unsupported. Can you try removing them and then reproducing?
nobody
Comment 7 2008-03-28 21:57:32 PDT
Created attachment 20182
crash log #3
Here is a crash log for the same error, this time under 10.5.2, same webpage.
nobody
Comment 8 2008-03-28 22:10:03 PDT
Created attachment 20183
bug 18183- 10.5.2 Safe Boot crash log
Crash when saving the same webpage, this time under 10.5.2 with Safe Boot (startup holding the "shift" key). (I don't know if this is what you need? Hope this helps... thanks)
nobody
Comment 9 2008-03-28 22:35:26 PDT
(In reply to comment #6)
> I see some haxies in your crashlog, and all such 3rd party extensions are
> unsupported. Can you try removing them and then reproducing?
Uh-oh. I did not consciously install any haxies... I don't know where they came from or how to remove them. Is this what I am trying to remove?
com.unsanity.smartcrashreports
com.unsanity.menuextraenabler 1.0.3
Yuck, I feel like my computer has been infected.
nobody
Comment 10 2008-03-28 23:12:54 PDT
Created attachment 20186
Bug 18183- no unsanity.txt
Removed Unsanity software.
nobody
Comment 11 2008-03-28 23:14:01 PDT
Created attachment 20187
Bug 18183- no default folder.txt
Removed Default Folder X.
nobody
Comment 12 2008-03-28 23:15:02 PDT
Created attachment 20188
another crash log
Another crash log. This bug is very repeatable for me.
nobody
Comment 13 2008-03-28 23:41:01 PDT
The crash occurs when saving webpages that contain items that could not be loaded. On my computer, some domains are filtered out (in my case, these domain names point to localhost rather than their correct host IP address). If the domains are unfiltered and allowed to resolve naturally, and all items are loaded, then the page can be saved to .webarchive without crashing.
I have not tested what happens if the domain name or IP address is merely blocked, or if any item fails to load for any other reason.
So far, this seems to consistently explain which pages save correctly and which pages cause a crash when saved as a .webarchive file. I hope this helps to pinpoint the nature of the bug...
nobody
Comment 14 2008-03-29 01:11:12 PDT
For example: http://www.time.com/time/politics/article/0,8599,1725514,00.html
Looking at the Activity window, we see the page loads items from other domains, such as:
ad.doubleclick.net
ad.insightexpressai.com
an.tacoda.net
ar.atwola.com
bin.clearspring.com
cdn1.sphere.com
and so on.
nobody
Comment 15 2008-03-29 07:44:25 PDT
Created attachment 20191
Bug 18183- screen capture
Screen capture of the Activity window shows items that are not loaded because of "cannot connect to host" for certain domains. This behavior is intentional and expected, given the user's custom configuration. However, this condition seems to be the cause of the crash when attempting to save the webpage as a .webarchive file.
nobody
Comment 16 2008-03-29 17:55:56 PDT
Created attachment 20202
hosts file (modified)
To block unwanted domains from loading, modify the hosts file as shown in the attachment. The hosts file is located at /private/etc/hosts on your Mac. Note the filename is "hosts" and not "hosts.txt".
Add a line to the hosts file such as "127.0.0.1 www.someunwanteddomainnamehere.com". You can add as many such lines as you wish.
This blocks the unwanted domain by resolving it to localhost instead of looking up the domain name in DNS. HTH
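The hosts-file blocking described above can be sketched as a small helper that only generates the lines to add; it does not read or modify /private/etc/hosts. The function name `hosts_block_lines` and its `target` parameter are assumptions made for illustration and are not part of the attachment.

```python
def hosts_block_lines(domains, target="127.0.0.1"):
    """Return hosts-file lines that map each domain to `target` (localhost).

    Hypothetical helper for illustration only; the output would still have
    to be appended to the hosts file by hand.
    """
    lines = []
    seen = set()
    for domain in domains:
        name = domain.strip().lower()
        if not name or name in seen:
            continue  # skip blank entries and duplicates
        seen.add(name)
        lines.append(f"{target} {name}")
    return lines
```

Calling `hosts_block_lines(["ad.doubleclick.net"])` yields a one-element list holding the line `127.0.0.1 ad.doubleclick.net`, the same shape as the example line quoted in the comment.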
Matt Lilek
Comment 17 2008-03-29 18:30:18 PDT
Modifying my hosts file still doesn't reproduce this crash - I wonder if this is Tiger-only?
nobody
Comment 18 2008-03-30 00:30:57 PDT
(In reply to comment #17)
> Modifying my hosts file still doesn't reproduce this crash - I wonder if this
> is Tiger-only?
I wondered about it, but Leopard crashes the same as Tiger (for me).
Leopard crash log: http://bugs.webkit.org/attachment.cgi?id=20183 (posted previously)
nobody
Comment 19 2008-03-30 10:05:52 PDT
(In reply to comment #13)
> I have not tested what happens if the domain name or IP address is merely
> blocked, or if any item fails to load for any other reason.
I tried saving some web pages that contained items that were not loaded (because they were firewalled normally), and those pages saved OK. Thus, I posted the hosts file info in case that might be helpful, thinking that the hosts blocking technique might be related to the crashing.
Matt Lilek
Comment 20 2008-03-30 10:37:08 PDT
What other web pages does this crash on for you? Does it happen on something as simple as Google?
Matt Lilek
Comment 21 2008-03-30 10:39:28 PDT
(In reply to comment #20)
> What other web pages does this crash on for you? Does it happen on something
> as simple as Google?
Also, when you say some things were "firewalled normally", what exactly do you mean? Are you going through a proxy that blocks certain things (like ads), or how extensive is your hosts file? Does this still happen when you have a straight-through, unfiltered connection to these sites?
nobody
Comment 22 2008-03-30 12:29:32 PDT
(In reply to comment #20)
> What other web pages does this crash on for you? Does it happen on something
> as simple as Google?
No, Google saves OK. The Acid 3 page saves OK, also. Drudge Report (a simple page) with no blocking saves OK, but Drudge Report with the adgardener.com domains blocked in hosts crashes. The www.time.com page above behaves the same way.
So far, every page without hosts blocking saves OK. Every page that has crashed had items from domains that were blocked in hosts.
I found a page that has hosts blocking but saves OK, though: http://www.cnn.com/2008/US/03/30/dith.pran.obit.ap/index.html with servedby.advertising.com and view.atmdt.com blocked by hosts saves OK. These blocked items are shown in Activity folded under outline triangles; I don't know if that means anything or makes a difference.
nobody
Comment 23 2008-03-30 12:45:00 PDT
(In reply to comment #21)
If some item(s) is not loaded (can't connect, or whatever) due to "natural causes", the page saves OK.
If some item(s) is not loaded due to hosts blocking, saving the page to .webarchive format crashes (almost always).
If some item(s) is not loaded due to firewall blocking (tried Little Snitch to block specific domains), the page saves OK.
My hosts file lists approx. 50 domains to block. But the results were the same when I tested it with just one domain, in an attempt to find a case that you and others might be able to repeat.
Brady Eidson
Comment 24 2008-03-30 17:06:58 PDT
Sorry I haven't responded to this one in a few days. While I still can't reproduce under any circumstances, I finally got a chance to look at the code - and it's a simple null dereference.
Brady Eidson
Comment 25 2008-03-30 17:09:42 PDT
Created attachment 20227
Proposed fix (no layout test...)
Attached the obvious fix - but since I can't repro, I don't know how to make a layout test for this...
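The actual patch lives in attachment 20227 and is not reproduced here. As a general illustration of the kind of guard that avoids a null dereference, here is a minimal sketch written in Python rather than WebKit's C++. It assumes, without confirmation from this bug, that a subresource which failed to load ends up with no data; every name in it (`Resource`, `archive_subresources`) is hypothetical and not a WebKit type or function.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a page subresource; not a WebKit type."""
    url: str
    data: Optional[bytes]  # None models a resource that never loaded


def archive_subresources(resources: List[Optional[Resource]]) -> List[Resource]:
    """Copy only the resources that actually have data, skipping missing ones."""
    archived = []
    for resource in resources:
        if resource is None or resource.data is None:
            # Guard: without this check, bytes(resource.data) below would fail
            # for a resource that was never loaded.
            continue
        archived.append(Resource(resource.url, bytes(resource.data)))
    return archived
```

The design choice in this sketch is to skip the missing item and keep archiving the rest of the page, rather than aborting the whole save.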
nobody
Comment 26 2008-03-31 07:20:52 PDT
(In reply to comment #25)
> Attached the obvious fix - but since I can't repro, I don't know how to make a
> layout test for this...
Thanks! Is it possible to download a build containing this fix? I would be happy to give it a try.
Matt Lilek
Comment 27 2008-03-31 07:25:56 PDT
(In reply to comment #26)
> (In reply to comment #25)
> > Attached the obvious fix - but since I can't repro, I don't know how to make a
> > layout test for this...
>
> thanks!
>
> is it possible to download a build containing this fix? i would be happy to
> give it a try.
Not yet - the patch should be reviewed and landed today and will hopefully appear in a nightly soon.
mitz
Comment 28 2008-03-31 10:02:41 PDT
Comment on attachment 20227
Proposed fix (no layout test...)
r=me
Brady Eidson
Comment 29 2008-03-31 10:07:20 PDT
Landed in r31467
nobody
Comment 30 2008-04-01 15:44:33 PDT
Tested r31535. The crashing problem is gone (as expected). Thanks!
But blocked items are not appearing in the Activity window (a small percentage of blocked items are shown, however). Also, the Status bar is underreporting errors, because blocked items that do not appear in the Activity window are not reported as errors.
A few examples:
On www.drudgereport.com: everything blocked other than drudgereport.com and d.yimg.com. harvest.adgardener.com items appear in Activity correctly ("can't connect to host") and are reported correctly as errors in the Status bar. Other blocked domains do not appear in Activity and are not counted as errors.
On www.cnn.com: mostly everything blocked; www.cnn.com, i.cdn.turner.com, and i.l.cnn.net not blocked. metrics.cnn.com is blocked; it is the only blocked domain that appears in the Activity window and is counted as an error.
On http://www.time.com/time/politics/article/0,8599,1725514,00.html : everything blocked except www.time.com and img.timeinc.net. No blocked items appear in the Activity window, and no errors are reported.
nobody
Comment 31 2008-04-01 15:48:53 PDT
Created attachment 20274
hosts.txt
Here is a sample hosts file containing the domains blocked for the examples described in comment #30 above.
Brady Eidson
Comment 32 2008-04-01 17:14:14 PDT
You are now describing a completely different issue. If you could, please write up a new bug with the new issue. Also, please make sure you compare the behavior of shipping Safari 3.1 to the latest nightly to see if they differ. Thanks!
nobody
Comment 33 2008-04-01 17:20:47 PDT
(In reply to comment #32)
> If you could, please
> write up a new bug with the new issue.
OK, will do... Thanks!
(In reply to comment #32)
> Also, please make sure you compare the behavior of shipping Safari 3.1 to the
> latest nightly to see if they differ.
Yes, Safari 3.1 reports the Activity ("can't connect to host") correctly.
nobody
Comment 34 2008-04-01 18:11:35 PDT
> write up a new bug with the new issue.
Added as bug #18267.