Summary: | <img src=""> requests main document resource unnecessarily | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Rüdiger Cordes <rc> | ||||||||
Component: | Page Loading | Assignee: | Nate Chapin <japhet> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Minor | CC: | ap, dglazkov, kangax, mathias, paulirish, remy, vsevik, wanliyou, webkit, webkit.review.bot | ||||||||
Priority: | P3 | ||||||||||
Version: | 528+ (Nightly build) | ||||||||||
Hardware: | All | ||||||||||
OS: | All | ||||||||||
Attachments: |
|
Description
Rüdiger Cordes
2009-10-12 04:06:27 PDT
Safari doesn't ever reload pages automatically, certainly not when images are missing. If you see weird behavior on some or all sites, please describe what you are doing and seeing step by step, and re-open the bug. If you have a test case, that would be even more helpful. With the help of PHP-Aquarium yesterday I was able to isolate and investigate Safaris behaviour. Unfortunately today after downloading the latest nightly r49434 and restart I can not reproduce this bahaviour any more. Not in Safari 4.0.3 and not in the nightly. So for now this bug can be closed. Now I can repdroduce it but have to program something to let you see/experience the same thing. Place this php file on a webserver: <?php if(isset($_GET['pfad'])) $pfad = $_GET['pfad']; else $pfad = ''; $datei = 'autoreload.txt'; if(!file_exists($datei)) { $dh = fopen($datei,'w'); flock($dh,LOCK_EX); fwrite($dh,''); fclose($dh); } $dh = fopen($datei,'r+'); flock($dh,LOCK_EX); $liste = fread($dh,filesize($datei)); $term = explode("\n",$liste); if($liste == '' OR count($term) > 20) $liste = time(); else $liste .= "\n".time(); rewind($dh); ftruncate($dh,0); fwrite($dh,$liste); fclose($dh); $liste = explode("\n",$liste); $anz = count($liste); echo '<HTML> <HEAD> </HEAD> <BODY> <P>Hier ist die Grafik ohne Pfadangabe <img src="'.$pfad.'">.</P> <P>Hier kommen die Aufrufzeitpunkte:<br>'."\n"; for($i = 0; $i < $anz; $i++) echo date('j.n.Y g:i:s',$liste[$i])."<br>\n"; echo '</P> </BODY> </HTML>'; ?> Opening this file with Safari with pfad=xyz adds one access time per call. Opening this file with Safari with pfad= adds two access times per call. The only thing we have to investigate now is, what is necessary to bring Safari in such a behaviour. When in "autoreload mode" turning off JavaScript makes no difference. I know that Javascript is not used, just to isolate more what has an effect on this strange behaviour. Java already was off. Yesterday "autoreload mode" was seen before and today now after "Security Update 2009-005 (PowerPC)" (1.0). I can not bring Firefox 3.5.1 in this mode. Thank you for the nice test case, I see the issue now. Here is what happens: <img src=""> has a relative source which resolves to main document's URL for loading. This is in accordance with normal relative URL resolution rules, and this is the same in Firefox, for example. You can check images[0].src from JavaScript or from Web Inspector/Firebug to see this. So, a document whose URL is "http://127.0.0.1:8000/test.php?pfad=" contains an image with an identical URL, "http://127.0.0.1:8000/test.php?pfad=". Naturally, the browser tries to load it, and then image decoding fails, because it's actually HTML, and not any image type. I do not think that Safari/WebKit behavior is buggy in any way - it just does what the author asked for. But Firefox seems to have an optimization that prevents this attempt to load main resource as an image, and it probably wouldn't hurt to add such an optimization to WebKit. We should check how this works for other types of resources, such as CSS stylesheets. Wow! What an easy explanation! I wouldnt have thought that src= nothing means src=self. I have tried it with iframe, embed and object yesterday to understand if it is specific to img or not. They didnt work. Ok, its not a bug but maybe not good? *** Bug 31922 has been marked as a duplicate of this bug. *** Per the duplicate, this also affects <link> and <script>. Despite obeying normative URL rules, I still think this is a bug. I can't think of a reason why anyone would write <img src="">, <script src="">, or <link href=""> except by accident (most likely, the failing of a dynamically-created page). I'd submit that any empty string for an external resource is an invalid URI and therefore should be ignored. There are enough alternatives that if the author intends to load the current page, it can still easy be done, such as <img src="/"> or <img src="#">. <iframe src=""> ignores the empty source in WebKit currently, so it seems that this behavior isn't evenly applied even within WebKit. Opera doesn't have this problem with <img>, <script>, or <link>. Firefox behaves the same as WebKit. IE only has this problem for <img> but not for <script> or <link>. This issue causes a lot of problems for developers, so it would be good to address it. I agree that it would likely be good to change this behavior (and we track this as an open bug). But you are saying that this causes actual problems - could you please give some examples? Thank you for the extensive research of other browsers' behavior, it's very useful. Absolutely. I can give a few problem scenarios. The first is very easy. If you have one of these offending examples on the page, it doubles your traffic (two triples your traffic, etc.). This isn't a big deal for small sites, but for large sites that have millions of page views per day, this becomes a serious capacity issue. Standard server-side filters are well-equipped to deal with bots and DOS attacks, but not for this type of traffic which, for all intents and purposes, comes across as normal traffic and so is very hard to filter out. The second problem is related to the first and has to do with measurement. Page views are taken as one measurement of a site's traffic (together with unique users). In reporting how popular a site is, companies report both page views and unique users. Even if unique users remains the same and page views increase, the site is said to be gaining popularity because each unique user is coming back more frequently. If you have one of these patterns on your page, you artificially inflate the page view number, which can really cause a lot of trouble for companies from a reporting point of view. The third problem is user state corruption. This happens because even though the response is thrown away for <img src="">, the cookies that come along with it are honored. I personally ran into this issue with a signed-out state of a page. If we detected that the user is signed out, we'd put them through a setup step, which alters the cookies via JavaScript. However, a <img src=""> ended up injected into the page, which created another / request, which brought along the default signed-out cookies. The user's changes were lost. HTML5 has now been updated explicitly stating that empty URLs should not be fetched in this case: http://html5.org/tools/web-apps-tracker?from=4833&to=4834 Can this be fixed now? Created attachment 60129 [details]
patch
This patch changes the results of a few existing layout tests, because they assumed empty urls were resolved and loaded.
Included in this change, I converted the following fast/tokenizer/ tests to dumpAsText() tests: 002.html, external-script-document-write_2.html, script_extra_close.html. The substantive change in their expected results is the removal of a SyntaxError console message, which used to occur because the tests have a <script src=""> tag and the current document can't be parsed as valid JS.
Comment on attachment 60129 [details]
patch
Could you please investigate and list what exactly changes with this patch?
From the tests, I can see that <script src="">,<img src="">, <link rel="stylesheet" href=""> are affected. What about other types of links, appcache manifest, iframes, applets, embeds, objects, audio, video, video poster, processing instructions?
The new tests don't check how load fails. Will there be an error event dispatched?
r-, because we clearly need onerror testing, and further review is complicated by lack of information about expected behavior changes. One useful thing to investigate would be checking mailing list discussions that led to this spec change, and related Mozilla bugs, if any.
(In reply to comment #16) > (From update of attachment 60129 [details]) > Could you please investigate and list what exactly changes with this patch? > > From the tests, I can see that <script src="">,<img src="">, <link rel="stylesheet" href=""> are affected. What about other types of links, appcache manifest, iframes, applets, embeds, objects, audio, video, video poster, processing instructions? > > The new tests don't check how load fails. Will there be an error event dispatched? > > r-, because we clearly need onerror testing, and further review is complicated by lack of information about expected behavior changes. One useful thing to investigate would be checking mailing list discussions that led to this spec change, and related Mozilla bugs, if any. Good points. I'll do some more research and see if I can get this straightened out. FWIW, we have bug 42001 tracking media elements specifically. Comment on attachment 60129 [details] patch This patch changes loading behavior without changing the corresponding DOM reflected attributes. Both should change. Bug 42087 has my cut at a similar change, and my attempt does deal with the DOM reflected attributes, but I believe it has other defects. Comment on attachment 60129 [details]
patch
This patch doesn't include a fix for fast/images/load-img-with-empty-src.html, which is a test that will be affected by this change.
Hey guys, as of today June 20th, 2011 this bug is still hanging around. Are there any news on this? Nate, Darin? If I understand well, there are still 2 things missing in the proposed patch: - Changing the DOM reflected attributes (btw, what does "reflected attributes" means exactly?) that have their loading behavior changed. - A fix for the fast/images/load-img-with-empty-src.html test which is affected by the patch. Thanks in advance for the reply! Please note: I don't know the Webkit code at all, but I'm eager to help if needed. (In reply to comment #20) > (From update of attachment 60129 [details]) > This patch doesn't include a fix for fast/images/load-img-with-empty-src.html, which is a test that will be affected by this change. Created attachment 105692 [details] patch #2 Sorry for letting this sit for so long. I condensed my test cases into a single layout test that tries empty urls with a whole bunch of different tags (I compiled my list from the whatwg thread that started with http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024357.html). The test also enforces the specced error event behaviors for those tags. Comment on attachment 105692 [details] patch #2 View in context: https://bugs.webkit.org/attachment.cgi?id=105692&action=review > Source/WebCore/dom/ScriptElement.cpp:259 > + if (!stripLeadingAndTrailingHTMLSpaces(sourceUrl).isEmpty()) { > + ResourceRequest request(m_element->document()->completeURL(sourceUrl)); The empty check strips leading and trailing space. But the code that actually passes the URL to completeURL does not. That’s definitely a bug, although clearly not the bug you are fixing here. That bug needs to be fixed too. > Source/WebCore/loader/ImageLoader.cpp:172 > + if (stripLeadingAndTrailingHTMLSpaces(attr).isEmpty()) { > + m_element->dispatchEvent(Event::create(eventNames().errorEvent, false, false)); > + return; > + } > + > ResourceRequest request = ResourceRequest(document->completeURL(sourceURI(attr))); It’s not good style that the emptiness check uses one strip of the leading and trailing HTML spaces, and the sourceURI function does a second strip. The code should be organized so that the work is done only once. Comment on attachment 105692 [details]
patch #2
Would be much better if the test cases also covered cases other than src="", such as src=" " or other types of whitespace.
Comment on attachment 105692 [details] patch #2 Rejecting attachment 105692 [details] from commit-queue. Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2 Last 500 characters of output: Image.image.incomplete.empty.html = TEXT Regressions: Unexpected image mismatch : (5) fast/text/atsui-multiple-renderers.html = IMAGE fast/text/international/danda-space.html = IMAGE fast/text/international/thai-baht-space.html = IMAGE fast/text/international/thai-line-breaks.html = IMAGE platform/chromium-linux/fast/text/international/complex-joining-using-gpos.html = IMAGE Regressions: Unexpected image and text mismatch : (1) svg/custom/svg-fonts-word-spacing.html = IMAGE+TEXT Full output: http://queues.webkit.org/results/9572368 Comment on attachment 105692 [details] patch #2 Attachment 105692 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9572418 New failing tests: canvas/philip/tests/2d.drawImage.image.incomplete.empty.html Created attachment 105797 [details]
Patch for landing
Comment on attachment 105797 [details] Patch for landing Clearing flags on attachment: 105797 Committed r94213: <http://trac.webkit.org/changeset/94213> All reviewed patches have been landed. Closing bug. Some cases missed in this change: https://bugs.webkit.org/show_bug.cgi?id=74500: URLs like "#" should not be loaded as well. https://bugs.webkit.org/show_bug.cgi?id=69211: url() in CSS. |