Bug 30303 - <img src=""> requests main document resource unnecessarily
: <img src=""> requests main document resource unnecessarily
Status: RESOLVED FIXED
: WebKit
Page Loading
: 528+ (Nightly build)
: All All
: P3 Minor
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2009-10-12 04:06 PST by
Modified: 2011-12-14 05:03 PST (History)


Attachments
patch (30.33 KB, patch)
2010-06-30 10:19 PST, Nate Chapin
ap: review-
Review Patch | Details | Formatted Diff | Diff
patch #2 (30.56 KB, patch)
2011-08-30 14:20 PST, Nate Chapin
no flags Review Patch | Details | Formatted Diff | Diff
Patch for landing (31.01 KB, patch)
2011-08-31 11:10 PST, Nate Chapin
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2009-10-12 04:06:27 PST
When Safari isnt able to load an image at first page load (reason can be that no filename is provided in HTML <img src=""> or a wrong filename) it automatically reloads the whole page, not the images only, to get the images in a second try. For a normal static HTML page no problem.
But lets see what happens when a form with a POST- variable is used:

Safari - or Webkit, I dont know - reloads the page but <b>without</b> a value for the POST variable. This can generate a second order in an online-shop. There is no alert of Safari asking if a reload shall be made.
And, instead of displaying the new version of the page Safari/Webkit combines the first loaded HTML (from its disc cache) with the graphics from the second try.

Its not really a bug but its not nice either. I would suggest no automatic reload when POST or GET variables are used as a first fix?

You can see Safaris/Webkits reload when looking at the browser window. First time the image is missing, after unsuccessful auto reload the missing graphic symbol is shown.
------- Comment #1 From 2009-10-12 11:20:08 PST -------
Safari doesn't ever reload pages automatically, certainly not when images are missing. If you see weird behavior on some or all sites, please describe what you are doing and seeing step by step, and re-open the bug. If you have a test case, that would be even more helpful.
------- Comment #2 From 2009-10-13 04:58:26 PST -------
With the help of PHP-Aquarium yesterday I was able to isolate and investigate Safaris behaviour. Unfortunately today after downloading the latest nightly r49434 and restart I can not reproduce this bahaviour any more. Not in Safari 4.0.3 and not in the nightly. So for now this bug can be closed.
------- Comment #3 From 2009-10-13 05:10:04 PST -------
Now I can repdroduce it but have to program something to let you see/experience the same thing.
------- Comment #4 From 2009-10-13 08:06:49 PST -------
Place this php file on a webserver:

<?php
if(isset($_GET['pfad'])) $pfad = $_GET['pfad']; else $pfad = '';
$datei = 'autoreload.txt';
if(!file_exists($datei))
 {
  $dh = fopen($datei,'w');
  flock($dh,LOCK_EX);
  fwrite($dh,'');
  fclose($dh);
 }
$dh = fopen($datei,'r+');
flock($dh,LOCK_EX);
$liste = fread($dh,filesize($datei));
$term = explode("\n",$liste);
if($liste == '' OR count($term) > 20) $liste = time();
else $liste .= "\n".time();
rewind($dh);
ftruncate($dh,0);
fwrite($dh,$liste);
fclose($dh);
$liste = explode("\n",$liste);
$anz = count($liste);
echo '<HTML>
<HEAD>
</HEAD>
<BODY>
 <P>Hier ist die Grafik ohne Pfadangabe <img src="'.$pfad.'">.</P>
 <P>Hier kommen die Aufrufzeitpunkte:<br>'."\n";
for($i = 0; $i < $anz; $i++) echo date('j.n.Y g:i:s',$liste[$i])."<br>\n";
echo '</P>
</BODY>
</HTML>';
?>

Opening this file with Safari with pfad=xyz adds one access time per call.
Opening this file with Safari with pfad= adds two access times per call.

The only thing we have to investigate now is, what is necessary to bring Safari in such a behaviour.
------- Comment #5 From 2009-10-13 08:28:02 PST -------
When in "autoreload mode" turning off JavaScript makes no difference. I know that Javascript is not used, just to isolate more what has an effect on this strange behaviour.
Java already was off.
Yesterday "autoreload mode" was seen before and today now after "Security Update 2009-005 (PowerPC)" (1.0).
------- Comment #6 From 2009-10-13 08:40:31 PST -------
I can not bring Firefox 3.5.1 in this mode.
------- Comment #7 From 2009-10-13 08:59:36 PST -------
Thank you for the nice test case, I see the issue now.

Here is what happens: <img src=""> has a relative source which resolves to main document's URL for loading. This is in accordance with normal relative URL resolution rules, and this is the same in Firefox, for example. You can check images[0].src from JavaScript or from Web Inspector/Firebug to see this.

So, a document whose URL is "http://127.0.0.1:8000/test.php?pfad=" contains an image with an identical URL, "http://127.0.0.1:8000/test.php?pfad=". Naturally, the browser tries to load it, and then image decoding fails, because it's actually HTML, and not any image type.

I do not think that Safari/WebKit behavior is buggy in any way - it just does what the author asked for. But Firefox seems to have an optimization that prevents this attempt to load main resource as an image, and it probably wouldn't hurt to add such an optimization to WebKit. We should check how this works for other types of resources, such as CSS stylesheets.
------- Comment #8 From 2009-10-13 09:38:31 PST -------
Wow! What an easy explanation! I wouldnt have thought that src= nothing means src=self.
I have tried it with iframe, embed and object yesterday to understand if it is specific to img or not. They didnt work.

Ok, its not a bug but maybe not good?
------- Comment #9 From 2009-11-26 20:56:11 PST -------
*** Bug 31922 has been marked as a duplicate of this bug. ***
------- Comment #10 From 2009-11-26 20:57:21 PST -------
Per the duplicate, this also affects <link> and <script>.
------- Comment #11 From 2009-11-26 22:00:05 PST -------
Despite obeying normative URL rules, I still think this is a bug. I can't think of a reason why anyone would write <img src="">, <script src="">, or <link href=""> except by accident (most likely, the failing of a dynamically-created page). I'd submit that any empty string for an external resource is an invalid URI and therefore should be ignored. There are enough alternatives that if the author intends to load the current page, it can still easy be done, such as <img src="/"> or <img src="#">.

<iframe src=""> ignores the empty source in WebKit currently, so it seems that this behavior isn't evenly applied even within WebKit. Opera doesn't have this problem with <img>, <script>, or <link>. Firefox behaves the same as WebKit. IE only has this problem for <img> but not for <script> or <link>.

This issue causes a lot of problems for developers, so it would be good to address it.
------- Comment #12 From 2009-11-27 00:11:42 PST -------
I agree that it would likely be good to change this behavior (and we track this as an open bug). But you are saying that this causes actual problems - could you please give some examples?

Thank you for the extensive research of other browsers' behavior, it's very useful.
------- Comment #13 From 2009-11-27 11:49:22 PST -------
Absolutely. I can give a few problem scenarios.

The first is very easy. If you have one of these offending examples on the page, it doubles your traffic (two triples your traffic, etc.). This isn't a big deal for small sites, but for large sites that have millions of page views per day, this becomes a serious capacity issue. Standard server-side filters are well-equipped to deal with bots and DOS attacks, but not for this type of traffic which, for all intents and purposes, comes across as normal traffic and so is very hard to filter out.

The second problem is related to the first and has to do with measurement. Page views are taken as one measurement of a site's traffic (together with unique users). In reporting how popular a site is, companies report both page views and unique users. Even if unique users remains the same and page views increase, the site is said to be gaining popularity because each unique user is coming back more frequently. If you have one of these patterns on your page, you artificially inflate the page view number, which can really cause a lot of trouble for companies from a reporting point of view.

The third problem is user state corruption. This happens because even though the response is thrown away for <img src="">, the cookies that come along with it are honored. I personally ran into this issue with a signed-out state of a page. If we detected that the user is signed out, we'd put them through a setup step, which alters the cookies via JavaScript. However, a <img src=""> ended up injected into the page, which created another / request, which brought along the default signed-out cookies. The user's changes were lost.
------- Comment #14 From 2010-03-09 17:18:38 PST -------
HTML5 has now been updated explicitly stating that empty URLs should not be fetched in this case:

http://html5.org/tools/web-apps-tracker?from=4833&to=4834

Can this be fixed now?
------- Comment #15 From 2010-06-30 10:19:13 PST -------
Created an attachment (id=60129) [details]
patch

This patch changes the results of a few existing layout tests, because they assumed empty urls were resolved and loaded.

Included in this change, I converted the following fast/tokenizer/ tests to dumpAsText() tests: 002.html, external-script-document-write_2.html, script_extra_close.html.  The substantive change in their expected results is the removal of a SyntaxError console message, which used to occur because the tests have a <script src=""> tag and the current document can't be parsed as valid JS.
------- Comment #16 From 2010-06-30 11:52:58 PST -------
(From update of attachment 60129 [details])
Could you please investigate and list what exactly changes with this patch?

From the tests, I can see that <script src="">,<img src="">, <link rel="stylesheet" href=""> are affected. What about other types of links, appcache manifest, iframes, applets, embeds, objects, audio, video, video poster, processing instructions?

The new tests don't check how load fails. Will there be an error event dispatched?

r-, because we clearly need onerror testing, and further review is complicated by lack of information about expected behavior changes. One useful thing to investigate would be checking mailing list discussions that led to this spec change, and related Mozilla bugs, if any.
------- Comment #17 From 2010-06-30 13:45:48 PST -------
(In reply to comment #16)
> (From update of attachment 60129 [details] [details])
> Could you please investigate and list what exactly changes with this patch?
> 
> From the tests, I can see that <script src="">,<img src="">, <link rel="stylesheet" href=""> are affected. What about other types of links, appcache manifest, iframes, applets, embeds, objects, audio, video, video poster, processing instructions?
> 
> The new tests don't check how load fails. Will there be an error event dispatched?
> 
> r-, because we clearly need onerror testing, and further review is complicated by lack of information about expected behavior changes. One useful thing to investigate would be checking mailing list discussions that led to this spec change, and related Mozilla bugs, if any.

Good points.  I'll do some more research and see if I can get this straightened out.
------- Comment #18 From 2010-07-12 11:36:21 PST -------
FWIW, we have bug 42001 tracking media elements specifically.
------- Comment #19 From 2010-07-12 14:18:42 PST -------
(From update of attachment 60129 [details])
This patch changes loading behavior without changing the corresponding DOM reflected attributes. Both should change.

Bug 42087 has my cut at a similar change, and my attempt does deal with the DOM reflected attributes, but I believe it has other defects.
------- Comment #20 From 2010-07-12 14:19:13 PST -------
(From update of attachment 60129 [details])
This patch doesn't include a fix for fast/images/load-img-with-empty-src.html, which is a test that will be affected by this change.
------- Comment #21 From 2011-06-20 02:11:45 PST -------
Hey guys,

as of today June 20th, 2011 this bug is still hanging around.

Are there any news on this? Nate, Darin?

If I understand well, there are still 2 things missing in the proposed patch:
- Changing the DOM reflected attributes (btw, what does "reflected attributes" means exactly?) that have their loading behavior changed.
- A fix for the fast/images/load-img-with-empty-src.html test which is affected by the patch.

Thanks in advance for the reply!

Please note: I don't know the Webkit code at all, but I'm eager to help if needed.

(In reply to comment #20)
> (From update of attachment 60129 [details] [details])
> This patch doesn't include a fix for fast/images/load-img-with-empty-src.html, which is a test that will be affected by this change.
------- Comment #22 From 2011-08-30 14:20:17 PST -------
Created an attachment (id=105692) [details]
patch #2

Sorry for letting this sit for so long.

I condensed my test cases into a single layout test that tries empty urls with a whole bunch of different tags (I compiled my list from the whatwg thread that started with http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024357.html).  The test also enforces the specced error event behaviors for those tags.
------- Comment #23 From 2011-08-30 15:00:29 PST -------
(From update of attachment 105692 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=105692&action=review

> Source/WebCore/dom/ScriptElement.cpp:259
> +    if (!stripLeadingAndTrailingHTMLSpaces(sourceUrl).isEmpty()) {
> +        ResourceRequest request(m_element->document()->completeURL(sourceUrl));

The empty check strips leading and trailing space. But the code that actually passes the URL to completeURL does not. That’s definitely a bug, although clearly not the bug you are fixing here. That bug needs to be fixed too.

> Source/WebCore/loader/ImageLoader.cpp:172
> +        if (stripLeadingAndTrailingHTMLSpaces(attr).isEmpty()) {
> +            m_element->dispatchEvent(Event::create(eventNames().errorEvent, false, false));
> +            return;
> +        }
> +
>          ResourceRequest request = ResourceRequest(document->completeURL(sourceURI(attr)));

It’s not good style that the emptiness check uses one strip of the leading and trailing HTML spaces, and the sourceURI function does a second strip. The code should be organized so that the work is done only once.
------- Comment #24 From 2011-08-30 15:01:32 PST -------
(From update of attachment 105692 [details])
Would be much better if the test cases also covered cases other than src="", such as src=" " or other types of whitespace.
------- Comment #25 From 2011-08-30 16:31:37 PST -------
(From update of attachment 105692 [details])
Rejecting attachment 105692 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2

Last 500 characters of output:
Image.image.incomplete.empty.html = TEXT

Regressions: Unexpected image mismatch : (5)
  fast/text/atsui-multiple-renderers.html = IMAGE
  fast/text/international/danda-space.html = IMAGE
  fast/text/international/thai-baht-space.html = IMAGE
  fast/text/international/thai-line-breaks.html = IMAGE
  platform/chromium-linux/fast/text/international/complex-joining-using-gpos.html = IMAGE

Regressions: Unexpected image and text mismatch : (1)
  svg/custom/svg-fonts-word-spacing.html = IMAGE+TEXT



Full output: http://queues.webkit.org/results/9572368
------- Comment #26 From 2011-08-30 19:24:43 PST -------
(From update of attachment 105692 [details])
Attachment 105692 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9572418

New failing tests:
canvas/philip/tests/2d.drawImage.image.incomplete.empty.html
------- Comment #27 From 2011-08-31 11:10:18 PST -------
Created an attachment (id=105797) [details]
Patch for landing
------- Comment #28 From 2011-08-31 12:00:17 PST -------
(From update of attachment 105797 [details])
Clearing flags on attachment: 105797

Committed r94213: <http://trac.webkit.org/changeset/94213>
------- Comment #29 From 2011-08-31 12:00:24 PST -------
All reviewed patches have been landed.  Closing bug.
------- Comment #30 From 2011-12-14 05:03:56 PST -------
Some cases missed in this change:
https://bugs.webkit.org/show_bug.cgi?id=74500: URLs like "#" should not be loaded as well.
https://bugs.webkit.org/show_bug.cgi?id=69211: url() in CSS.