Bug 18183 - Crashes when saving webpage to Web Archive format .webarchive file
Summary: Crashes when saving webpage to Web Archive format .webarchive file
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.4
Importance: P1 Blocker
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-28 12:08 PDT by nobody
Modified: 2008-04-01 18:11 PDT

See Also:


Attachments
Bug 18183- crash log (63.22 KB, text/plain)
2008-03-28 18:34 PDT, nobody, no flags
Bug 18183- crash log #2 (70.65 KB, text/plain)
2008-03-28 20:09 PDT, nobody, no flags
crash log #3 (33.38 KB, text/plain)
2008-03-28 21:57 PDT, nobody, no flags
bug 18183- 10.5.2 Safe Boot crash log (34.29 KB, text/plain)
2008-03-28 22:10 PDT, nobody, no flags
Bug 18183- no unsanity.txt (58.34 KB, text/plain)
2008-03-28 23:12 PDT, nobody, no flags
Bug 18183- no default folder.txt (56.10 KB, text/plain)
2008-03-28 23:14 PDT, nobody, no flags
another crash log (50.60 KB, text/plain)
2008-03-28 23:15 PDT, nobody, no flags
Bug 18183- screen capture (404.80 KB, image/png)
2008-03-29 07:44 PDT, nobody, no flags
hosts file (modified) (490 bytes, text/plain)
2008-03-29 17:55 PDT, nobody, no flags
Proposed fix (no layout test...) (1.21 KB, patch)
2008-03-30 17:09 PDT, Brady Eidson, mitz: review+
hosts.txt (4.48 KB, text/plain)
2008-04-01 15:48 PDT, nobody, no flags

Description nobody 2008-03-28 12:08:49 PDT
Crashes when saving webpage to Web Archive format .webarchive file

first noticed in r31370

problem remains in r31388
Comment 1 nobody 2008-03-28 14:28:53 PDT
crash does not occur when saving every webpage.

but if crash does occur when saving a particular webpage, it seems to be consistently repeatable with that webpage.

example:
http://www.time.com/time/politics/article/0,8599,1725514,00.html

.html saves OK
save as .webarchive causes crash
Comment 2 Matt Lilek 2008-03-28 16:17:13 PDT
This doesn't crash for me.  Can you please attach a crash log - http://webkit.org/quality/crashlogs.html
Comment 3 Brady Eidson 2008-03-28 17:00:20 PDT
I can't repro either...  if you can, a crash log is critical to explore this...
Comment 4 nobody 2008-03-28 18:34:22 PDT
Created attachment 20178
Bug 18183- crash log

Bug 18183- crash log
Comment 5 nobody 2008-03-28 20:09:09 PDT
Created attachment 20180
Bug 18183- crash log #2

crash log from trying to save webpage as .webarchive file

here's the webpage:
http://www.time.com/time/politics/article/0,8599,1725514,00.html
Comment 6 Brady Eidson 2008-03-28 21:13:34 PDT
I see some haxies in your crashlog, and all such 3rd party extensions are unsupported.  Can you try removing them and then reproducing?
Comment 7 nobody 2008-03-28 21:57:32 PDT
Created attachment 20182
crash log #3

here is crash log for same error, this time under 10.5.2, same webpage
Comment 8 nobody 2008-03-28 22:10:03 PDT
Created attachment 20183
bug 18183- 10.5.2 Safe Boot crash log

crash when saving same webpage, this time under 10.5.2 with Safe Boot (startup holding "shift" key)

(I don't know if this is what you need? hope this helps... thanks)
Comment 9 nobody 2008-03-28 22:35:26 PDT
(In reply to comment #6)
> I see some haxies in your crashlog, and all such 3rd party extensions are
> unsupported.  Can you try removing them and then reproducing?
> 

uh-oh. I did not consciously install any haxies... I don't know where they came from or how to remove them.

Is this what I am trying to remove?:
com.unsanity.smartcrashreports
com.unsanity.menuextraenabler 1.0.3

yuck I feel like my computer has been infected

Comment 10 nobody 2008-03-28 23:12:54 PDT
Created attachment 20186
Bug 18183- no unsanity.txt

removed unsanity software
Comment 11 nobody 2008-03-28 23:14:01 PDT
Created attachment 20187
Bug 18183- no default folder.txt

removed Default Folder X
Comment 12 nobody 2008-03-28 23:15:02 PDT
Created attachment 20188
another crash log

another crash log

this bug is very repeatable for me
Comment 13 nobody 2008-03-28 23:41:01 PDT
The crash occurs when saving webpages that contain items that could not be loaded.

On my computer, some domains are filtered out.

 (in my case, these domain names point to localhost rather than their correct host IP address).

If the domains are unfiltered, allowed to resolve naturally without blocking the IP, and all items are loaded, then the page can be saved to .webarchive without crashing.

I have not tested what happens if the domain name or IP address is merely blocked, or if any item fails to load for any other reason.

So far, this consistently explains which pages save correctly and which pages crash when saved as a .webarchive file.

I hope this helps to pinpoint the nature of the bug...



Comment 14 nobody 2008-03-29 01:11:12 PDT
for example:

http://www.time.com/time/politics/article/0,8599,1725514,00.html

looking at Activity window, we see the page loads items from other domains, such as:

ad.doubleclick.net
ad.insightexpressai.com
an.tacoda.net
ar.atwola.com
bin.clearspring.com
cdn1.sphere.com

and so on

Comment 15 nobody 2008-03-29 07:44:25 PDT
Created attachment 20191
Bug 18183- screen capture

screen capture of Activity window shows items that are not loaded because "cannot connect to host" for certain domains.

this behavior is intentional, and expected to occur based on the custom configuration by user.

however, this condition seems to be the cause of crashing when attempting to save the webpage as .webarchive file.
Comment 16 nobody 2008-03-29 17:55:56 PDT
Created attachment 20202
hosts file (modified)

to block unwanted domains from loading, modify the hosts file as shown in the attachment

the hosts file is located at /private/etc/hosts on your Mac

note the filename is "hosts" and not "hosts.txt"

add a line to the hosts file such as "127.0.0.1 www.someunwanteddomainnamehere.com" 

you can add as many such lines as you wish

this blocks the unwanted domain by resolving to localhost instead of looking up the domain name in DNS.


HTH
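
To see at a glance which domains a hosts file blocks this way, something like the following Python sketch may help. It is only an illustration: the function name blocked_hosts and the sample text are made up here, and it treats any name mapped to a loopback address (other than localhost itself) as "blocked".

```python
import ipaddress


def blocked_hosts(hosts_text):
    """Return hostnames that a hosts-format text maps to a loopback address."""
    blocked = []
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            # Not a parseable IP address; skip the line.
            continue
        if is_loopback:
            # "localhost" is the normal loopback entry, not a blocked domain.
            blocked.extend(name for name in names if name != "localhost")
    return blocked


# Illustrative entries only; the real file lives at /private/etc/hosts.
sample = """\
# sample hosts entries
127.0.0.1 localhost
127.0.0.1 www.someunwanteddomainnamehere.com
127.0.0.1 ad.doubleclick.net  # ad server
"""
print(blocked_hosts(sample))
```

To run it against the real file, read /private/etc/hosts with open(...).read() and pass the text to blocked_hosts.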
Comment 17 Matt Lilek 2008-03-29 18:30:18 PDT
Modifying my hosts file still doesn't reproduce this crash - I wonder if this is Tiger-only?
Comment 18 nobody 2008-03-30 00:30:57 PDT
(In reply to comment #17)
> Modifying my hosts file still doesn't reproduce this crash - I wonder if this
> is Tiger-only?

I wondered about it, but Leopard crashes the same as Tiger  (for me)
Leopard crash log: http://bugs.webkit.org/attachment.cgi?id=20183 (posted previously)
Comment 19 nobody 2008-03-30 10:05:52 PDT
(In reply to comment #13)
> I have not tested what happens if the domain name or IP address is merely
> blocked, or if any item fails to load for any other reason.

I tried saving some web pages that contained items that were not loaded (because they were firewalled normally), and those pages saved OK.

Thus, I posted the hosts file info in case that might be helpful, thinking that the hosts blocking technique might be related to the crashing.

Comment 20 Matt Lilek 2008-03-30 10:37:08 PDT
What other web pages does this crash on for you?  Does it happen on something as simple as Google?
Comment 21 Matt Lilek 2008-03-30 10:39:28 PDT
(In reply to comment #20)
> What other web pages does this crash on for you?  Does it happen on something
> as simple as Google?
> 

Also, when you say some things were "firewalled normally" what exactly do you mean?  Are you going thru a proxy that blocks certain things (like ads) or how extensive is your hosts file?  Does this still happen when you have a straight thru, unfiltered connection to these sites?
Comment 22 nobody 2008-03-30 12:29:32 PDT
(In reply to comment #20)
> What other web pages does this crash on for you?  Does it happen on something
> as simple as Google?

no, Google saves OK. The Acid 3 page saves OK, also.

Drudge Report (a simple page) with no blocking saves OK.
but Drudge Report with the adgardener.com domains blocked in hosts crashes.

The www.time.com page above behaves the same way.

So far, every page without hosts blocking saves OK.
Every page that has crashed had items from domains that were blocked in hosts.

I found a page that has hosts blocking but saves OK, though:
http://www.cnn.com/2008/US/03/30/dith.pran.obit.ap/index.html with servedby.advertising.com and view.atmdt.com blocked by hosts saves OK. These blocked items are shown in Activity folded under outline triangles; I don't know if that means anything or makes a difference.
Comment 23 nobody 2008-03-30 12:45:00 PDT
(In reply to comment #21)

If some items are not loaded (can't connect, or whatever) due to "natural causes", the page saves OK.

If some items are not loaded due to hosts blocking, saving the page to .webarchive format crashes (almost always).

If some items are not loaded due to firewall blocking (tried Little Snitch to block specific domains), the page saves OK.

My hosts file lists approx 50 domains to block. But the results were the same when I tested it with just one domain, in the attempt to test the case which I hoped you and others might be able to repeat.
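
For anyone trying to reproduce the hosts-blocking case, a quick way to confirm that a given domain really is being redirected to localhost (rather than blocked by a firewall) might look like the Python sketch below. The function name resolves_to_loopback is made up for this example, and the resolver is passed in as a parameter only so the check can be exercised without touching the network.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname, resolve=socket.gethostbyname):
    """Return True if hostname resolves to a loopback address such as 127.0.0.1."""
    try:
        address = resolve(hostname)
    except OSError:
        # Resolution failed outright (socket.gaierror is an OSError subclass);
        # that is a different condition from being redirected to localhost.
        return False
    return ipaddress.ip_address(address).is_loopback
```

Calling resolves_to_loopback("ad.doubleclick.net") on a machine with that domain listed in the hosts file should return True; on an unmodified machine it should return False.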



Comment 24 Brady Eidson 2008-03-30 17:06:58 PDT
Sorry I haven't responded to this one in a few days.

While I still can't reproduce under any circumstances, I finally got a chance to look at the code - and it's a simple null dereference.
Comment 25 Brady Eidson 2008-03-30 17:09:42 PDT
Created attachment 20227
Proposed fix (no layout test...)

Attached the obvious fix - but since I can't repro, I don't know how to make a layout test for this...
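
The patch itself is not quoted in this thread. Purely as an illustration, here is a hypothetical Python analogue of the failure mode described in comments 13 and 24: code that assumes every subresource has data, next to code that checks for missing data first. The names Resource, archive_unguarded, and archive_guarded are invented for this sketch and are not WebKit APIs, and skipping the failed resource is an assumption about what a guard might do, not a description of the actual fix.

```python
class Resource:
    """Hypothetical subresource record; data is None when the load failed."""

    def __init__(self, url, data):
        self.url = url
        self.data = data


def archive_unguarded(resources):
    # Assumes every resource loaded; len(None) raises TypeError,
    # which stands in for the null dereference here.
    return [(r.url, len(r.data)) for r in resources]


def archive_guarded(resources):
    # Checks for missing data before using it and skips failed loads.
    return [(r.url, len(r.data)) for r in resources if r.data is not None]


page = [
    Resource("http://example.invalid/page.html", b"<p/>"),
    Resource("http://blocked.invalid/ad.js", None),  # failed to load
]
print(archive_guarded(page))
```

With the sample page above, archive_guarded returns only the entry that loaded, while archive_unguarded raises an exception on the failed one.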
Comment 26 nobody 2008-03-31 07:20:52 PDT
(In reply to comment #25)
> Attached the obvious fix - but since I can't repro, I don't know how to make a
> layout test for this...

thanks!

is it possible to download a build containing this fix?  i would be happy to give it a try.
Comment 27 Matt Lilek 2008-03-31 07:25:56 PDT
(In reply to comment #26)
> (In reply to comment #25)
> > Attached the obvious fix - but since I can't repro, I don't know how to make a
> > layout test for this...
> 
> thanks!
> 
> is it possible to download a build containing this fix?  i would be happy to
> give it a try.
> 

Not yet - the patch should be reviewed and landed today and will hopefully appear in a nightly soon.
Comment 28 mitz 2008-03-31 10:02:41 PDT
Comment on attachment 20227
Proposed fix (no layout test...)

r=me
Comment 29 Brady Eidson 2008-03-31 10:07:20 PDT
Landed in r31467
Comment 30 nobody 2008-04-01 15:44:33 PDT
tested r31535 

the crashing problem is gone (as expected). thanks!


but blocked items are not appearing in the Activity window. (a small percentage of blocked items are shown, however).

also, the Status bar is underreporting errors, because blocked items that do not appear in the Activity window are not reported as errors.


a few examples:

on www.drudgereport.com:
everything blocked other than drudgereport.com and d.yimg.com
harvest.adgardener.com items appear in Activity correctly ("can't connect to host") and are reported correctly as errors in the Status Bar.
other blocked domains do not appear in Activity and are not counted as errors.

on www.cnn.com:
mostly everything blocked; www.cnn.com, i.cdn.turner.com, i.l.cnn.net not blocked.
metrics.cnn.com is blocked; it is the only blocked domain that appears in the Activity window and is counted as an error.

http://www.time.com/time/politics/article/0,8599,1725514,00.html :
everything blocked except www.time.com and img.timeinc.net
no blocked items appear in Activity window, no errors






Comment 31 nobody 2008-04-01 15:48:53 PDT
Created attachment 20274
hosts.txt

here is sample hosts file 

contains domains blocked for examples described in #30 above.
Comment 32 Brady Eidson 2008-04-01 17:14:14 PDT
You are now describing a completely different issue.  If you could, please write up a new bug with the new issue.

Also, please make sure you compare the behavior of shipping Safari 3.1 to the latest nightly to see if they differ.

Thanks!
Comment 33 nobody 2008-04-01 17:20:47 PDT
(In reply to comment #32)
>  If you could, please
> write up a new bug with the new issue.

OK, will do...  

Thanks!


(In reply to comment #32)
> Also, please make sure you compare the behavior of shipping Safari 3.1 to the
> latest nightly to see if they differ.

Yes, Safari 3.1 reports the Activity ("can't connect to host") correctly.
Comment 34 nobody 2008-04-01 18:11:35 PDT
> write up a new bug with the new issue.

added as bug #18267