Bug 30303

Summary: <img src=""> requests main document resource unnecessarily
Product: WebKit Reporter: Rüdiger Cordes <rc>
Component: Page LoadingAssignee: Nate Chapin <japhet>
Status: RESOLVED FIXED    
Severity: Minor CC: ap, dglazkov, kangax, mathias, paulirish, remy, vsevik, wanliyou, webkit, webkit.review.bot
Priority: P3    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
patch
ap: review-
patch #2
none
Patch for landing none

Description Rüdiger Cordes 2009-10-12 04:06:27 PDT
When Safari isnt able to load an image at first page load (reason can be that no filename is provided in HTML <img src=""> or a wrong filename) it automatically reloads the whole page, not the images only, to get the images in a second try. For a normal static HTML page no problem.
But lets see what happens when a form with a POST- variable is used:

Safari - or Webkit, I dont know - reloads the page but <b>without</b> a value for the POST variable. This can generate a second order in an online-shop. There is no alert of Safari asking if a reload shall be made.
And, instead of displaying the new version of the page Safari/Webkit combines the first loaded HTML (from its disc cache) with the graphics from the second try.

Its not really a bug but its not nice either. I would suggest no automatic reload when POST or GET variables are used as a first fix?

You can see Safaris/Webkits reload when looking at the browser window. First time the image is missing, after unsuccessful auto reload the missing graphic symbol is shown.
Comment 1 Alexey Proskuryakov 2009-10-12 11:20:08 PDT
Safari doesn't ever reload pages automatically, certainly not when images are missing. If you see weird behavior on some or all sites, please describe what you are doing and seeing step by step, and re-open the bug. If you have a test case, that would be even more helpful.
Comment 2 Rüdiger Cordes 2009-10-13 04:58:26 PDT
With the help of PHP-Aquarium yesterday I was able to isolate and investigate Safaris behaviour. Unfortunately today after downloading the latest nightly r49434 and restart I can not reproduce this bahaviour any more. Not in Safari 4.0.3 and not in the nightly. So for now this bug can be closed.
Comment 3 Rüdiger Cordes 2009-10-13 05:10:04 PDT
Now I can repdroduce it but have to program something to let you see/experience the same thing.
Comment 4 Rüdiger Cordes 2009-10-13 08:06:49 PDT
Place this php file on a webserver:

<?php
if(isset($_GET['pfad'])) $pfad = $_GET['pfad']; else $pfad = '';
$datei = 'autoreload.txt';
if(!file_exists($datei))
 {
  $dh = fopen($datei,'w');
  flock($dh,LOCK_EX);
  fwrite($dh,'');
  fclose($dh);
 }
$dh = fopen($datei,'r+');
flock($dh,LOCK_EX);
$liste = fread($dh,filesize($datei));
$term = explode("\n",$liste);
if($liste == '' OR count($term) > 20) $liste = time();
else $liste .= "\n".time();
rewind($dh);
ftruncate($dh,0);
fwrite($dh,$liste);
fclose($dh);
$liste = explode("\n",$liste);
$anz = count($liste);
echo '<HTML>
<HEAD>
</HEAD>
<BODY>
 <P>Hier ist die Grafik ohne Pfadangabe <img src="'.$pfad.'">.</P>
 <P>Hier kommen die Aufrufzeitpunkte:<br>'."\n";
for($i = 0; $i < $anz; $i++) echo date('j.n.Y g:i:s',$liste[$i])."<br>\n";
echo '</P>
</BODY>
</HTML>';
?>

Opening this file with Safari with pfad=xyz adds one access time per call.
Opening this file with Safari with pfad= adds two access times per call.

The only thing we have to investigate now is, what is necessary to bring Safari in such a behaviour.
Comment 5 Rüdiger Cordes 2009-10-13 08:28:02 PDT
When in "autoreload mode" turning off JavaScript makes no difference. I know that Javascript is not used, just to isolate more what has an effect on this strange behaviour.
Java already was off.
Yesterday "autoreload mode" was seen before and today now after "Security Update 2009-005 (PowerPC)" (1.0).
Comment 6 Rüdiger Cordes 2009-10-13 08:40:31 PDT
I can not bring Firefox 3.5.1 in this mode.
Comment 7 Alexey Proskuryakov 2009-10-13 08:59:36 PDT
Thank you for the nice test case, I see the issue now.

Here is what happens: <img src=""> has a relative source which resolves to main document's URL for loading. This is in accordance with normal relative URL resolution rules, and this is the same in Firefox, for example. You can check images[0].src from JavaScript or from Web Inspector/Firebug to see this.

So, a document whose URL is "http://127.0.0.1:8000/test.php?pfad=" contains an image with an identical URL, "http://127.0.0.1:8000/test.php?pfad=". Naturally, the browser tries to load it, and then image decoding fails, because it's actually HTML, and not any image type.

I do not think that Safari/WebKit behavior is buggy in any way - it just does what the author asked for. But Firefox seems to have an optimization that prevents this attempt to load main resource as an image, and it probably wouldn't hurt to add such an optimization to WebKit. We should check how this works for other types of resources, such as CSS stylesheets.
Comment 8 Rüdiger Cordes 2009-10-13 09:38:31 PDT
Wow! What an easy explanation! I wouldnt have thought that src= nothing means src=self.
I have tried it with iframe, embed and object yesterday to understand if it is specific to img or not. They didnt work.

Ok, its not a bug but maybe not good?
Comment 9 Alexey Proskuryakov 2009-11-26 20:56:11 PST
*** Bug 31922 has been marked as a duplicate of this bug. ***
Comment 10 Alexey Proskuryakov 2009-11-26 20:57:21 PST
Per the duplicate, this also affects <link> and <script>.
Comment 11 Nicholas C. Zakas 2009-11-26 22:00:05 PST
Despite obeying normative URL rules, I still think this is a bug. I can't think of a reason why anyone would write <img src="">, <script src="">, or <link href=""> except by accident (most likely, the failing of a dynamically-created page). I'd submit that any empty string for an external resource is an invalid URI and therefore should be ignored. There are enough alternatives that if the author intends to load the current page, it can still easy be done, such as <img src="/"> or <img src="#">.

<iframe src=""> ignores the empty source in WebKit currently, so it seems that this behavior isn't evenly applied even within WebKit. Opera doesn't have this problem with <img>, <script>, or <link>. Firefox behaves the same as WebKit. IE only has this problem for <img> but not for <script> or <link>.

This issue causes a lot of problems for developers, so it would be good to address it.
Comment 12 Alexey Proskuryakov 2009-11-27 00:11:42 PST
I agree that it would likely be good to change this behavior (and we track this as an open bug). But you are saying that this causes actual problems - could you please give some examples?

Thank you for the extensive research of other browsers' behavior, it's very useful.
Comment 13 Nicholas C. Zakas 2009-11-27 11:49:22 PST
Absolutely. I can give a few problem scenarios.

The first is very easy. If you have one of these offending examples on the page, it doubles your traffic (two triples your traffic, etc.). This isn't a big deal for small sites, but for large sites that have millions of page views per day, this becomes a serious capacity issue. Standard server-side filters are well-equipped to deal with bots and DOS attacks, but not for this type of traffic which, for all intents and purposes, comes across as normal traffic and so is very hard to filter out.

The second problem is related to the first and has to do with measurement. Page views are taken as one measurement of a site's traffic (together with unique users). In reporting how popular a site is, companies report both page views and unique users. Even if unique users remains the same and page views increase, the site is said to be gaining popularity because each unique user is coming back more frequently. If you have one of these patterns on your page, you artificially inflate the page view number, which can really cause a lot of trouble for companies from a reporting point of view.

The third problem is user state corruption. This happens because even though the response is thrown away for <img src="">, the cookies that come along with it are honored. I personally ran into this issue with a signed-out state of a page. If we detected that the user is signed out, we'd put them through a setup step, which alters the cookies via JavaScript. However, a <img src=""> ended up injected into the page, which created another / request, which brought along the default signed-out cookies. The user's changes were lost.
Comment 14 Nicholas C. Zakas 2010-03-09 17:18:38 PST
HTML5 has now been updated explicitly stating that empty URLs should not be fetched in this case:

http://html5.org/tools/web-apps-tracker?from=4833&to=4834

Can this be fixed now?
Comment 15 Nate Chapin 2010-06-30 10:19:13 PDT
Created attachment 60129 [details]
patch

This patch changes the results of a few existing layout tests, because they assumed empty urls were resolved and loaded.

Included in this change, I converted the following fast/tokenizer/ tests to dumpAsText() tests: 002.html, external-script-document-write_2.html, script_extra_close.html.  The substantive change in their expected results is the removal of a SyntaxError console message, which used to occur because the tests have a <script src=""> tag and the current document can't be parsed as valid JS.
Comment 16 Alexey Proskuryakov 2010-06-30 11:52:58 PDT
Comment on attachment 60129 [details]
patch

Could you please investigate and list what exactly changes with this patch?

From the tests, I can see that <script src="">,<img src="">, <link rel="stylesheet" href=""> are affected. What about other types of links, appcache manifest, iframes, applets, embeds, objects, audio, video, video poster, processing instructions?

The new tests don't check how load fails. Will there be an error event dispatched?

r-, because we clearly need onerror testing, and further review is complicated by lack of information about expected behavior changes. One useful thing to investigate would be checking mailing list discussions that led to this spec change, and related Mozilla bugs, if any.
Comment 17 Nate Chapin 2010-06-30 13:45:48 PDT
(In reply to comment #16)
> (From update of attachment 60129 [details])
> Could you please investigate and list what exactly changes with this patch?
> 
> From the tests, I can see that <script src="">,<img src="">, <link rel="stylesheet" href=""> are affected. What about other types of links, appcache manifest, iframes, applets, embeds, objects, audio, video, video poster, processing instructions?
> 
> The new tests don't check how load fails. Will there be an error event dispatched?
> 
> r-, because we clearly need onerror testing, and further review is complicated by lack of information about expected behavior changes. One useful thing to investigate would be checking mailing list discussions that led to this spec change, and related Mozilla bugs, if any.

Good points.  I'll do some more research and see if I can get this straightened out.
Comment 18 Alexey Proskuryakov 2010-07-12 11:36:21 PDT
FWIW, we have bug 42001 tracking media elements specifically.
Comment 19 Darin Adler 2010-07-12 14:18:42 PDT
Comment on attachment 60129 [details]
patch

This patch changes loading behavior without changing the corresponding DOM reflected attributes. Both should change.

Bug 42087 has my cut at a similar change, and my attempt does deal with the DOM reflected attributes, but I believe it has other defects.
Comment 20 Darin Adler 2010-07-12 14:19:13 PDT
Comment on attachment 60129 [details]
patch

This patch doesn't include a fix for fast/images/load-img-with-empty-src.html, which is a test that will be affected by this change.
Comment 21 Rémy Coutable 2011-06-20 02:11:45 PDT
Hey guys,

as of today June 20th, 2011 this bug is still hanging around.

Are there any news on this? Nate, Darin?

If I understand well, there are still 2 things missing in the proposed patch:
- Changing the DOM reflected attributes (btw, what does "reflected attributes" means exactly?) that have their loading behavior changed.
- A fix for the fast/images/load-img-with-empty-src.html test which is affected by the patch.

Thanks in advance for the reply!

Please note: I don't know the Webkit code at all, but I'm eager to help if needed.

(In reply to comment #20)
> (From update of attachment 60129 [details])
> This patch doesn't include a fix for fast/images/load-img-with-empty-src.html, which is a test that will be affected by this change.
Comment 22 Nate Chapin 2011-08-30 14:20:17 PDT
Created attachment 105692 [details]
patch #2

Sorry for letting this sit for so long.

I condensed my test cases into a single layout test that tries empty urls with a whole bunch of different tags (I compiled my list from the whatwg thread that started with http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024357.html).  The test also enforces the specced error event behaviors for those tags.
Comment 23 Darin Adler 2011-08-30 15:00:29 PDT
Comment on attachment 105692 [details]
patch #2

View in context: https://bugs.webkit.org/attachment.cgi?id=105692&action=review

> Source/WebCore/dom/ScriptElement.cpp:259
> +    if (!stripLeadingAndTrailingHTMLSpaces(sourceUrl).isEmpty()) {
> +        ResourceRequest request(m_element->document()->completeURL(sourceUrl));

The empty check strips leading and trailing space. But the code that actually passes the URL to completeURL does not. That’s definitely a bug, although clearly not the bug you are fixing here. That bug needs to be fixed too.

> Source/WebCore/loader/ImageLoader.cpp:172
> +        if (stripLeadingAndTrailingHTMLSpaces(attr).isEmpty()) {
> +            m_element->dispatchEvent(Event::create(eventNames().errorEvent, false, false));
> +            return;
> +        }
> +
>          ResourceRequest request = ResourceRequest(document->completeURL(sourceURI(attr)));

It’s not good style that the emptiness check uses one strip of the leading and trailing HTML spaces, and the sourceURI function does a second strip. The code should be organized so that the work is done only once.
Comment 24 Darin Adler 2011-08-30 15:01:32 PDT
Comment on attachment 105692 [details]
patch #2

Would be much better if the test cases also covered cases other than src="", such as src=" " or other types of whitespace.
Comment 25 WebKit Review Bot 2011-08-30 16:31:37 PDT
Comment on attachment 105692 [details]
patch #2

Rejecting attachment 105692 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2

Last 500 characters of output:
Image.image.incomplete.empty.html = TEXT

Regressions: Unexpected image mismatch : (5)
  fast/text/atsui-multiple-renderers.html = IMAGE
  fast/text/international/danda-space.html = IMAGE
  fast/text/international/thai-baht-space.html = IMAGE
  fast/text/international/thai-line-breaks.html = IMAGE
  platform/chromium-linux/fast/text/international/complex-joining-using-gpos.html = IMAGE

Regressions: Unexpected image and text mismatch : (1)
  svg/custom/svg-fonts-word-spacing.html = IMAGE+TEXT



Full output: http://queues.webkit.org/results/9572368
Comment 26 WebKit Review Bot 2011-08-30 19:24:43 PDT
Comment on attachment 105692 [details]
patch #2

Attachment 105692 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9572418

New failing tests:
canvas/philip/tests/2d.drawImage.image.incomplete.empty.html
Comment 27 Nate Chapin 2011-08-31 11:10:18 PDT
Created attachment 105797 [details]
Patch for landing
Comment 28 WebKit Review Bot 2011-08-31 12:00:17 PDT
Comment on attachment 105797 [details]
Patch for landing

Clearing flags on attachment: 105797

Committed r94213: <http://trac.webkit.org/changeset/94213>
Comment 29 WebKit Review Bot 2011-08-31 12:00:24 PDT
All reviewed patches have been landed.  Closing bug.
Comment 30 Vsevolod Vlasov 2011-12-14 05:03:56 PST
Some cases missed in this change:
https://bugs.webkit.org/show_bug.cgi?id=74500: URLs like "#" should not be loaded as well.
https://bugs.webkit.org/show_bug.cgi?id=69211: url() in CSS.