Bug 32423

Summary: Incorrect cache behavior with dynamic scripts after page-refresh
Product: WebKit Reporter: Kyle Simpson <getify>
Component: History Assignee: Nobody <webkit-unassigned>
Status: UNCONFIRMED    
Severity: Normal CC: ap, jjjaquanrice, john.david.dalton, rvargas, tonyg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows XP   
URL: http://test.getify.com/webkit-cache-bug/test-1.html
Attachments:
Description Flags
revised test case. none

Kyle Simpson
Reported 2009-12-11 04:31:22 PST
I originally filed this bug with the Chromium bug tracker, believing it affected only Chrome. However, it has come to my attention that the behavior in question affects both Chrome and Safari, and the Chrome developers believe it's actually inside the Webkit code, so they suggested I post here instead. For reference, that Chromium bug report is here: http://code.google.com/p/chromium/issues/detail?id=23162

The way I originally described and presented the bug turned out to be pretty confusing. My bad. So here I'm going to present it in a different, hopefully more concrete/clear way, which is why it sounds different from how I originally described it on the Chromium site. Please bear with me, as I know this is a very long bug report with lots of detail.

--------------

Let me first describe a scenario and my observations/conclusions about it. Then, down below, I'll link to and explain some proof test cases so you can confirm the behavior/bug I describe for yourself.

Page View 1 (initial page view in browser)
a) Download "script1.php?3333" with type "text/javascript"
b) Download "script2.php?5555" with type "script/preload"
c) After 1b completes, re-attach "script2.php?5555" to the page (via script tag) with type "text/javascript"

Page View 2 (regular refresh view of the same page in browser)
a) Download "script1.php?7777" with type "text/javascript"
b) Download "script2.php?9999" with type "script/preload"
c) After 2b completes, re-attach "script2.php?9999" to the page (via script tag) with type "text/javascript"

----------

Notice how 1b and 1c have the SAME URL as each other, meaning the browser should be able to pull from the cache of 1b for 1c. Also notice how 2b and 2c have the SAME URL (same as each other, but obviously different from 1b and 1c), meaning the browser should be able to pull from the cache of 2b for 2c.

----------

What actually happens:
* browser does a full request for 1b, but pulls from cache for 1c. -- Good!
* browser does a full request for 2b, but cannot pull from cache for 2c, so must do a full download again. -- Bad!

----------

2b and 2c have the same URL as each other, so why is the cache INVALID for 2c? Moreover, why is the cache VALID for 1c but INVALID for 2c? The difference between 1 and 2 is really why I'm claiming a bug. In my mind, there shouldn't be a difference in how the browser treats 1 and 2, because both start with freshly generated dynamic URLs (to keep the test clean). 1b should result in a valid cache for 1c, and it does! But by the same logic, 2b should result in a valid cache for 2c, and it doesn't. :(

***I understand*** that my .php scripts are only sending out a valid "Expires" header, but not sending out an Etag or last-modified header. But that still doesn't sensibly explain to me why the behavior is different between 1 and 2, since the same headers get sent in both cases.

----------

Other things to note about the quirky/buggy behavior:
1. If 1b, 1c, 2b, and 2c all happen AFTER their respective page loads, then BOTH 1c and 2c find valid cache hits.
2. If there are no ?xxxx type parameters added to the above URLs (but you clear the cache after the initial page load, before hitting "refresh"), then BOTH 1c and 2c find valid cache hits.
3. If the URLs are ".js" instead of ".php", then BOTH 1c and 2c find valid cache hits.
4. If 1a and 2a are not part of the scenario (meaning, we only try 1b, 1c, 2b, and 2c), then BOTH 1c and 2c always find valid cache hits. Only when another regular script download is introduced to the scenario does this quirk present itself.

----------

Test cases:
http://test.getify.com/webkit-cache-bug/test-1.html
http://test.getify.com/webkit-cache-bug/test-1-alt.html
http://test.getify.com/webkit-cache-bug/test-2.html
http://test.getify.com/webkit-cache-bug/test-3.html
http://test.getify.com/webkit-cache-bug/test-4.html
http://test.getify.com/webkit-cache-bug/test-5.html

----------

Only "test-1" shows the bug I'm asserting; the rest of the tests show variations which make the bug *not* appear.

In all of these tests, essentially the same basic logic occurs, which maps closely to the scenarios and assertions described above. Each test starts out by loading the "script1" URL with a valid script type of "text/javascript", and "script2" with a fake script type of "script/preload". Then, once the first download of "script2" finishes, the test tries to load the exact same "script2" URL again (so it should pull from cache), but this time using the valid "text/javascript" script type.

The "log" output that you'll see in each test demonstrates the elapsed time between when a script URL is added and when it finishes loading. The .php scripts are set to intentionally create a 3-4 second delay, to make it very obvious in the elapsed time log output whether the browser is requesting from the server or pulling from cache. But even in the .js test (#4), which has no artificial delays, you'll see a 300-400ms time for the full load, and a 30-50ms time for the pull from cache.

Basically, the key thing to look for in the log output, to demonstrate the bug I'm asserting, is the two elapsed times for the "script2" URLs. You *should* see the first "script2" attempt be long (since it pulls from the server), and the second "script2" attempt be short (since it should pull from cache). Also, remember, the bug assertion I make is that the behavior is different between an initial page view and when you refresh the page.

----------

Specific notes/steps for each test case:

* test-1 : base test case. When you first load this test, you'll see that "script2" has first a long elapsed time, and then on the second attempt a short elapsed time. This proves it pulled from cache the second time. Good! But if you then refresh the page (F5, refresh button, etc.), you'll see that both attempts for "script2" have long elapsed times, indicating the cache is not used for the second attempt like it should be, and like it was in the initial page view. Bad!

* test-1-alt : variation on test-1, the ONLY difference being that the scripts don't auto-load during page load, but are loaded on demand when you click the button. In this test, you'll note that the "script2" elapsed times are correct for both an initial page load and after you refresh the page. Good!

* test-2 : before running this test, clear your cache. This test doesn't do the re-attempt of "script2" at all, so in the logs you'll only see one listing for "script2". Inspect your cache entries after running this test, and you'll prove that "script2" really IS in the cache (even though "script/preload" was used as the type). Good! Then clear your cache again, and refresh the page. Then reinspect the cache entries, and you'll see "script2" there again. So it gets into the cache just fine, both on initial page view and on refresh. Good!

* test-3 : before running this test, clear your cache. This test is the same as test-1 except that the _=0.54353453 "cache busting" params are NOT generated for the "script1" and "script2" URLs. Run the test, and notice the "script2" elapsed times are correct. Good! Then clear your cache again, and refresh the page. Notice the "script2" elapsed times are still correct. Good!

* test-4 : this test is the same as test-1 except that the URLs are ".js" instead of ".php". Run the test, and you'll notice the "script2" elapsed times are correct. Good! Then refresh the page. Notice the "script2" elapsed times are still correct. Good!

* test-5 : this test only loads "script2" (both attempts), but not "script1". You'll note that the elapsed times are correct for "script2" on both initial page view *and* after a refresh. Good!
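For reference, the header situation mentioned in the description (a valid "Expires" header, but no Etag or last-modified header) can be sketched roughly as below. This is only an illustration in JavaScript, not the actual .php scripts behind the test cases; the function names and the one-hour lifetime used in the usage note are assumptions made up for the sketch.

```javascript
// Hypothetical helper: headers for a dynamically generated script that is
// cacheable only through an Expires header (no validators are sent).
function buildScriptHeaders(now, lifetimeSeconds) {
  const expires = new Date(now.getTime() + lifetimeSeconds * 1000);
  return {
    "Content-Type": "text/javascript",
    // toUTCString() yields the HTTP date format, e.g. "Thu, 01 Jan 1970 01:00:00 GMT".
    "Expires": expires.toUTCString()
    // Deliberately no "ETag" and no "Last-Modified", as in the report.
  };
}

// A cache may reuse the response without contacting the server while it is fresh.
function isFresh(headers, now) {
  return now.getTime() < Date.parse(headers["Expires"]);
}

// Without a validator, a forced revalidation cannot be answered with a cheap
// "not modified" response; the whole body has to be downloaded again.
function hasValidator(headers) {
  return "ETag" in headers || "Last-Modified" in headers;
}
```

With `buildScriptHeaders(new Date(), 3600)`, `isFresh` stays true for an hour while `hasValidator` is false. That would be consistent with a fast cache hit when the cache is consulted normally and a full multi-second download whenever the browser insists on revalidating.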
Attachments
revised test case. (48 bytes, text/plain)
2009-12-14 23:30 PST, John-David Dalton
no flags
Alexey Proskuryakov
Comment 1 2009-12-14 10:33:39 PST
See also: bug 30862. I'm having a lot of difficulty trying to understand this bug or the attached test cases. Clearly, there are some amusing differences in behavior depending on how exactly script elements are inserted, but is there any practical problem that needs to be fixed?
Kyle Simpson
Comment 2 2009-12-14 17:01:24 PST
@Alexey -- I appreciate you taking a look. I'm sorry that it is confusing, but it should be obvious from the amount of time I took to explain and the several test cases that this is far more than an "amusing" observation of quirks. The behavior I've identified represents a real problem that I'm very interested in doing whatever it takes to help address.

The bug I describe has been independently confirmed by a number of other developers, and as I mentioned, was also confirmed (with even an attempt at an explanation) by the Chromium developers over on the other bug thread.

I would be *more* than happy/willing to discuss this bug to any level of detail necessary to help explain it fully and help find the cause and solution. I can do so via messages here in this thread, IM, twitter, separate email, even phone. Please just let me know what channel will best help explain and clear up confusion over the bug and test cases.

---------------

I didn't default to cluttering up this thread with a detailed explanation of exactly WHY I would be doing something so "crazy" as trying to load the same script URL multiple times in the same page (moreover, loading it with different mime-types each time), but it is in fact a big problem I've run into, and I currently have no workarounds or solutions other than hoping Webkit will address the bug.

The behavior/bug was discovered when developing and testing my project LABjs (http://labjs.com), which is a dynamic script loader (for improving page load performance) developed in cooperation with Steve Souders. LABjs uses a variety of different "tricks" (depending on browser) to be able to "load" scripts in parallel, but prevent their automatic execution, so that they can be executed later in a specific order.

Specifically, one of these tricks, a key piece of functionality in LABjs, is to load a script URL with a fake mime-type like "script/preload", which will cause the browser to fetch the script (into the cache) but will NOT execute it. Then later, when LABjs determines it's time to execute the script, it adds another script DOM element with the EXACT same URL, but with a proper mime-type "text/javascript". This should pull the script nearly immediately from cache (from the first attempt) and of course execute it basically right away.

---------------

The test cases I presented don't use LABjs (although the LABjs test suite will also show the same symptoms). They just represent the simple parts of the logic necessary to demonstrate the bug, extracted and presented (hopefully more clearly) to help demonstrate that it's not a code logic error but a behavior of the browser.

The "test-1" test case behaves PROPERLY on first page load, but starts misbehaving after a page refresh (of various kinds). The same is true of LABjs. Because the exact same code logic behaves differently between first pageview and refreshed pageview, it should show the behavior lies within the browser internals.

So what happens is that on *first pageview*, a particular script URL is only loaded ONCE from the server, even though two adds/attempts are made. This is because the script is correctly cached after the first request, as desired, and thus the second attempt later in that page's lifetime pulls from cache rather than from the server.

However, when you then refresh that page, you see clearly that the same two attempts to load a script URL result in TWO full loads from the server, because the script isn't properly cached with the first request like it is on first pageview.

The other tests show slight variations which, for whatever reason, cause the bug to NOT appear. My hope in presenting these other negative test cases is that seeing what is NOT buggy along with what IS buggy will help identify exactly where/why the bug is.

---------------

To recap, the very real "problem" this bug presents is costly, unnecessary extra loads of scripts into a page (upon a page refresh), which not only creates more server/network traffic load, but can also significantly slow down the page view performance itself in the browser.

Clearly, the point of LABjs being to improve page load performance, it's a very bad thing that this bug actually causes LABjs to make page performance much worse (only for Webkit browsers) than if it hadn't been used at all! So, I'm very interested in helping get this addressed.
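The preload-then-execute trick described in this comment can be sketched roughly as follows. This is not LABjs's actual code; `addScript` and `preloadThenExecute` are hypothetical names, and `doc` is assumed to be the browser's `document` (or any object offering `createElement` and `head.appendChild`). It also assumes the 2009-era behavior described here, where a script element with an unrecognized type is fetched and fires `onload` without being executed; comment 9 in this thread notes that Webkit later stopped fetching such scripts.

```javascript
// Attach a script element of the given type and call onLoad when it finishes.
function addScript(doc, url, type, onLoad) {
  const el = doc.createElement("script");
  el.type = type;
  el.onload = onLoad;
  el.src = url;
  doc.head.appendChild(el);
  return el;
}

// Step b: request the URL with a fake type, so it is fetched but not executed.
// Step c: once that finishes, attach the SAME URL again with a real type,
// which should be served from cache and executed right away.
function preloadThenExecute(doc, url, onExecuted) {
  addScript(doc, url, "script/preload", function () {
    addScript(doc, url, "text/javascript", onExecuted);
  });
}
```

In a page this would be called as `preloadThenExecute(document, "script2.php?5555", callback)`. The bug report is that, after a refresh, the second attach goes back to the server instead of the cache.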
John-David Dalton
Comment 3 2009-12-14 23:30:22 PST
Created attachment 44849 [details] revised test case.
John-David Dalton
Comment 4 2009-12-14 23:31:42 PST
Ok so basically the first time the page loads in Chrome 3+, or Safari 4.0.4 you get (use "revised test case" url):

// First attempt (everything works great)
added script/preload (script #2)
added text/javascript (script #1)
loaded script/preload in 3265ms (script #2) <- should now be cached
added text/javascript (script #2) <-- inserted as text/javascript
loaded text/javascript in 14ms (script #2) <-- loads instantly from cache
loaded text/javascript in 4275ms (script #1)

// Refresh the page
added script/preload (script #2)
added text/javascript (script #1)
loaded script/preload in 3265ms (script #2) <- should now be cached
added text/javascript (script #2) <-- inserted as text/javascript
loaded text/javascript in 4275ms (script #1)
loaded text/javascript in 3210ms (script #2) <-- should've loaded from cache

// Note: you have to kill the tab or browser session for the "First attempt" to work again.
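One way to read log output of this shape mechanically is sketched below. The helper names and the 1000ms cutoff are assumptions for illustration: the reported .php scripts take 3-4 seconds from the server and tens of milliseconds from cache, so any threshold between those ranges would separate them.

```javascript
// Matches lines such as "loaded text/javascript in 14ms (script #2)".
const LOAD_LINE = /loaded (\S+) in (\d+)ms \(script #(\d+)\)/;

// Returns { type, ms, script } for a "loaded" line, or null for any other line.
function parseLoadLine(line) {
  const m = LOAD_LINE.exec(line);
  if (!m) return null;
  return { type: m[1], ms: Number(m[2]), script: Number(m[3]) };
}

// Guess where a response came from, based only on elapsed time (assumed cutoff).
function likelySource(ms, thresholdMs = 1000) {
  return ms < thresholdMs ? "cache" : "server";
}

// Annotate every "loaded" line in a multi-line log with its likely source.
function summarize(logText) {
  return logText
    .split("\n")
    .map(parseLoadLine)
    .filter(function (entry) { return entry !== null; })
    .map(function (entry) {
      return Object.assign({}, entry, { source: likelySource(entry.ms) });
    });
}
```

Applied to the two logs in this comment, the second "script #2" load would be classified as "cache" on the first attempt (14ms) and as "server" after the refresh (3210ms), which is the difference being reported.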
Alexey Proskuryakov
Comment 5 2010-03-09 15:04:48 PST
Duplicate of bug 30862?
Kyle Simpson
Comment 6 2010-03-09 15:31:07 PST
Actually, I believe this to be the opposite behavior to 30862. My bug is demonstrating a scenario when the cache SHOULD be used for two loads of the exact same URL resource in the same page-view (one before page load, one on-demand, later), and it is failing to do so (meaning it re-requests the resource a second time incorrectly, even though the first load did put the resource into the cache). 30862 seems to be about resources staying cached when they shouldn't be, which is the opposite to this bug.
rvargas
Comment 7 2011-02-03 18:32:20 PST
@Alexey: Note that http://code.google.com/p/chromium/issues/detail?id=23162#c6 has a description of where exactly this bug resides (although with an old version of the code).
Tony Gentilcore
Comment 8 2012-05-09 17:07:24 PDT
@getify: I'm working on related bug 84614 and suspect it's a duplicate. However, your test case doesn't work any more for me. Would you mind double checking it? I'm very interested to get to the bottom of this.
Kyle Simpson
Comment 9 2012-05-09 18:41:09 PDT
(In reply to comment #8)

Tony-

The previous test cases no longer work because Webkit (and thus Chrome) has changed since this bug was initially submitted. Those test cases were based on the then-behavior that script elements with fake/unrecognized mime-types would be requested by the browser into the cache, but not executed. Webkit no longer fetches such scripts, so the test cases all fail to execute fully.

---------------------

I have created a new test case which I believe still illustrates the same problem, which is that the cache is not used during a page reload even though the caching headers ostensibly suggest the item should be fetched from cache.

Steps to reproduce:

1. With a clean cache, go to: http://test.getify.com/webkit-bug-32423/

2. Wait for the "initial" 2 scripts to load/finish. Then click the button to re-request them, and wait for the "on-demand" script requests (same URLs!) to finish (should be nearly immediate, because they come from cache, as desired).

3. Normal-click the refresh button. Note that the scripts again load nearly immediately, from cache, as desired!

4. Finally, normal-click the refresh button a second time (third total page load), and notice that the scripts no longer come from cache, but get re-requested, and thus take several seconds each. Further subsequent loads continue to re-request the scripts from the server (not the cache) every time.
Tony Gentilcore
Comment 10 2012-05-10 10:15:53 PDT
(In reply to comment #9) This still repros at ToT so not a dupe. However, #3 seems like the bug, not #4. When the user presses the browser's refresh button, we are supposed to revalidate all the subresources with the server. The desired heuristic is to revalidate anything which is started prior to the load event. Resources lazily loaded after the page loads shouldn't continue to be revalidated.
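The heuristic stated in this comment can be written down as a small decision function. This is only a sketch of the rule as worded here, not Webkit's actual loader code; `shouldRevalidate` and the shapes of `request` and `page` are hypothetical.

```javascript
// request: { startedAt: number }  time the subresource request was started
// page:    { isReload: boolean, loadEventAt: number | null }
//          loadEventAt is null while the load event has not fired yet.
function shouldRevalidate(request, page) {
  // Only a user-initiated refresh forces revalidation in this sketch.
  if (!page.isReload) return false;

  // Anything started prior to the load event is revalidated with the server.
  const startedBeforeLoad =
    page.loadEventAt === null || request.startedAt < page.loadEventAt;

  // Resources lazily loaded after the load event may use the cache as usual.
  return startedBeforeLoad;
}
```

Under this rule, the scripts requested during page load on a refresh would be revalidated, while the later on-demand requests for the same URLs (after the load event) would be allowed to come from cache.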
Kyle Simpson
Comment 11 2012-05-10 12:13:56 PDT
(In reply to comment #10)
> This still repros at ToT so not a dupe. However, #3 seems like the bug, not #4.

I designed the test so that the first request for script1 and script2 comes during/before load. When you subsequently request them a second time, by clicking the button, you're pulling from the cache that's already primed from during load, not dynamically loading them after page load.

This is why, I think, #3 is desired and not a bug, because when you refresh the page, and it again goes to load the scripts during page load, it's reloading resources that were previously loaded during a page load. Or am I misunderstanding?

If I'm correct in that assertion, that's why #4 is the bug, because the assertion should continue to hold true on subsequent page loads where they are always first requested during a page load. Or no?

> When the user presses the browser's refresh button, we are supposed to revalidate all the subresources with the server. The desired heuristic is to revalidate anything which is started prior to the load event. Resources lazily loaded after the page loads shouldn't continue to be revalidated.

I understand that browsers desire that heuristic, but as Bug #30862's discussion thread (and other related ones) shows, I think developers have a more complex heuristic they desire: when they **shift+reload** the page, they want any hard resource that might have changed (like script files they are developing) to be revalidated, regardless of when it got loaded onto the page (race conditions!), whereas for some soft resources (like perhaps XHR requests), I can see why those maybe shouldn't be re-validated (just like you don't necessarily want to re-submit form posts, etc).

But anyway, that's a rabbit trail best left for discussion in that other bug.