Bug 8475

Summary: document.lastModified gives no output if the response doesn't have a Last-Modified header
Product: WebKit Reporter: Rüdiger Cordes <rc>
Component: WebKit Misc.Assignee: Adam Barth <abarth>
Status: RESOLVED FIXED    
Severity: Major CC: abarth, ap, darin, eric, ian, myrdred, webkit.review.bot
Priority: P2    
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
Attachments:
Description Flags
test case
none
Patch to fix last modified when header is missing.
none
Fix for lastModified when there is no header from server
ap: review-
Incorporates the style and other feedback from the review.
none
Patch
none
Patch
none
Patch for landing none

Description Rüdiger Cordes 2006-04-19 03:17:58 PDT
document.lastModified gives no output at all

<html>
<body>
<script language="JavaScript">
 document.write("-" + document.lastModified + "-");
</script>
</body>
</html>
Comment 1 Alexey Proskuryakov 2006-04-19 09:17:08 PDT
Created attachment 7832 [details]
test case
Comment 2 Alexey Proskuryakov 2006-04-19 09:22:56 PDT
document.lastModified works fine in Safari when the server provides "Last-Modified" HTTP header, see <http://nypop.com/~ap/webkit/last-modified.html>. Although Firefox syntesizes the time if this header is missing (maybe it uses "Date" header?), it's not obvious to me if this is correct behavior.
Comment 3 Karl Schramm 2006-05-17 23:22:34 PDT
This seems to be a duplicate of bug <A HREF="http://bugzilla.opendarwin.org/show_bug.cgi?id=4363">4363</A>
Comment 4 Alexey Proskuryakov 2006-05-18 04:47:39 PDT
Bug 4363 is related, but different - it is only about preserving the exact value of the Last-Modified header, while this bug asks to synthesize Last-Modified if it's not present.

I'm confirming this bug as a difference from Firefox, but additional research is nedded in order to decide if we want to fix it.
Comment 5 Jonathan Haas 2008-08-06 11:40:46 PDT
A really easy way to repro this bug is to stick in a file:

<html><head><script>alert(document.lastModified);</script></head><body/></html>

Open it from a file:/// URL, you'll get a blank alert. Put it on a Web server, you'll get the actual last modified date.

document.lastModified is not part of the HTML4 specification, but it did make it into HTML5. HTML5 is very specific about how the value should be returned (as a precisely formatted string, which seems odd to me... why not use a DOMTimestamp or whatever it's called?) and mandates that document.lastModified returns the "source file's last modification", which seems pretty clear to me that it should return the correct date for local files.
Comment 6 Alexey Proskuryakov 2008-08-10 07:46:34 PDT
Ok. But please note that HTTP responses without a Last-Modified header are different - HTML5 says that the attribute should return "01/01/1970 00:00:00", and that's not what Firefox did when I tested.

It is likely that fixes for HTTP and for local files will be separate.
Comment 7 Jon@Chromium 2009-01-30 15:41:21 PST
Created attachment 27200 [details]
Patch to fix last modified when header is missing.

I recognize that IE and FireFox both return the current time when the last modified is empty.  However, the spec says to return the Unix Epoc.
Comment 8 Jon@Chromium 2009-01-30 16:16:10 PST
Created attachment 27201 [details]
Fix for lastModified when there is no header from server

Dimitry noticed that unixEpoc should have been static.
Comment 9 Jonathan Haas 2009-01-30 16:25:45 PST
Firefox and IE don't return the current date and time for file:// URLs, they return the file's last modified datetime from the OS. This complies with the HTML5 spec at 2.1.2:

"The Document's source file's last modification date and time must be derived from relevant features of the networking protocols used, e.g. from the value of the HTTP Last-Modified header of the document, or from metadata in the file system for local files."

Returning the epoch is not in compliance when there exists metadata in the filesystem to indicate the actual last modified datetime.
Comment 10 Alexey Proskuryakov 2009-01-31 00:13:46 PST
(In reply to comment #7)
> I recognize that IE and FireFox both return the current time when the last
> modified is empty.  However, the spec says to return the Unix Epoc.

Ian, could you confirm if this is intentional, or a bug in HTML5?
Comment 11 Alexey Proskuryakov 2009-01-31 00:21:34 PST
Comment on attachment 27201 [details]
Fix for lastModified when there is no header from server

> +    static String unixEpoc = "01/01/1970 00:00:00";

A static String will be destroyed at process exit time, affecting performance. You can use a DEFINE_STATIC_LOCAL macro to create the variable in a way that won't affect performance.

> +    if (!header || !header.length()) {
> +        return unixEpoc;
> +    }
> +    return header;

We don't put braces around single line statements. Also, the check can be just header.isEmpty().

This patch needs to include an automated test, and a ChangeLog modification.

r- due to the reasons given above, and because it's not clear if the specified behavior is intentional.
Comment 12 Ian 'Hixie' Hickson 2009-01-31 01:01:44 PST
The idea is for the spec to exactly match IE here. If that's not what the spec says, please e-mail the list with the test case showing the difference. Thanks!
Comment 13 Darin Adler 2009-01-31 07:29:45 PST
Note that "epoch" is how the word is spelled, not "epoc".
Comment 14 Jon@Chromium 2009-02-03 13:42:23 PST
Created attachment 27287 [details]
Incorporates the style and other feedback from the review.

I have incorporated the feedback from the review.  I still need to figure out how to create a test that returns a custom header.

I understand that this behavior is in conflict with what IE and FF do when the web server does not return a value for the Last-Modified header.  However, this is what the specification requires.  Although returning the current time may seem like the right thing to do, and 1970 is very unlikely, the fact is that scripts which are returning content should return a valid date and 1970 is more likely to catch someone's eye and lead to a fix for their incomplete headers.  This is certainly better than the current behavior of an empty String.

This code has nothing to do with the last modified date of local files returned with file://.  I don't know where that URL is handled so I don't know where it is generating the Document.  It is true that when the browser is reading a file it should properly set the last-modified date from the file system information.  When that is fixed this fix will not longer return 1970.

I appreciate your time in reviewing this and helping me match the WebKit style.
Comment 15 Darin Adler 2009-02-03 14:08:52 PST
(In reply to comment #14)
> Although returning the current time may
> seem like the right thing to do, and 1970 is very unlikely, the fact is that
> scripts which are returning content should return a valid date and 1970 is more
> likely to catch someone's eye and lead to a fix for their incomplete headers.

Sure, but helping someone notice they're doing something wrong and getting them to fix their website is a secondary goal. Ensuring compatibility with websites that have been tested only with the two top market share web browsers is a primary goal.
Comment 16 Rüdiger Cordes 2009-02-03 14:29:03 PST
Maybe its not a bug?

When the header of the webserver contains no entry for last.Modified its the pure truth when JavaScript has no value.

Rüdiger

Why are providers not giving this information?
Comment 17 Jon@Chromium 2009-02-03 15:03:05 PST
Rüdiger, scripts are not providing it when they are generating their own headers.  It is probably just an oversight on the author's part.  I don't think an empty string is ever intentional.  The header is required by HTTP so it is just bad form to not have it at all.

Darin, I will change the fix to match IE and FireFox.  It would be nice to give this feedback to the HTML5 team.  It seems to me that the current time is a better answer given the number of files online that were generated in 1970.  :)

Can someone point me to a test that mocks a custom header so I can create a test?  Are their mock time objects since the new behavior will be showing the current time without one?
Comment 18 Rüdiger Cordes 2009-02-03 15:15:36 PST
Within JavaScript you probably want to know if the given time is from the server in the header and true or if its just a mock of the browser. What is easier: checking if its empty, if its 1970 or if its the actual time?
Comment 19 Rüdiger Cordes 2009-02-03 15:19:18 PST
The last modified time for a static HTML file is easy. What is the time for a PHP file?
Comment 20 Rüdiger Cordes 2009-02-03 15:21:50 PST
Maybe when the time of a PHP file   I S   the actual time you will not be able to decide if it is a mock of the browser or the true value?
Comment 21 Jonathan Haas 2009-02-03 15:38:51 PST
> Note that "epoch" is how the word is spelled, not "epoc".

Epoch fail!

> This code has nothing to do with the last modified date 
> of local files returned with file://

In my view, the problem with file:// URLs *is* the bug. In comparison, the problem when an http:// URL does not return a Last-Modified header is trivial. I mean, sure, WebKit doesn't comply with the spec in that case. Nor does it comply with IE's nor Firefox's breed of noncompliance. And sure, this fix is necessary to bring WebKit closer to full spec compliance. But really, how often does this happen? What modern web server doesn't return a Last-Modified header as a matter of course? 

The case of opening a file:// URL on the other hand is common, and WebKit fails to comply with the spec every single time a file:// URL is opened. Both Firefox and IE do comply with this very common scenario.

I think a good fix would be to have an m_lastModified member of Document itself, and have the loader be responsible for filling this value. The HTTP loader can fill the value with the contents of a Last-Modified header, or the Unix epoch if the header is missing, while the file loader can do whatever OS-specific stuff is needed to extract the file's last modified date and fill in the document's field appropriately. It's not good for Document to refer to HTTP headers anyway; Document is supposed to be scheme-agnostic.
Comment 22 Rüdiger Cordes 2009-02-03 16:01:43 PST
That sounds great! ;-)
Comment 23 Alexey Proskuryakov 2009-02-04 00:12:56 PST
To provide a custom header, you can use PHP, see e.g. <http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/misc/resources/404image.php>. The simplest example is just "<? header("Date: Thu, 01 Jun 2006 06:09:43 GMT"); ?><script>alert(document.lastModified)</script>".

This is really about compatibility with other browsers, not deciding what an ideal behavior would be. As Ian asked to, please tell the WHATWG mailing list if the spec doesn't match IE and Firefox (and which versions).

While I agree that we should fix document.lastModified for local files, too, but it's not clear whether this can even be done without modifications to underlying libraries such as CFNetwork on the Mac. It will be more practical to track fixing local files as a separate bug.
Comment 24 Alexey Proskuryakov 2009-02-04 00:20:35 PST
Comment on attachment 27287 [details]
Incorporates the style and other feedback from the review.

r-, because we want to match other browsers, not the draft spec.

You have tabs in ChangeLog, please use spaces.

+    DEFINE_STATIC_LOCAL(AtomicString,unixEpoch,("01/01/1970 00:00:00"));

Not that this matters any more if we are not going to use the UNIX epoch, but our coding style asks for spaces after commas.

+    // See http://www.w3.org/TR/html5/dom.html#lastmodified
+    // See http://crbug.com/2400

I'm not sure if these comments add much - once we settle on a certain behavior, and the spec is fixed, it will hopefully be pretty clear why we chose it, even without bug references. And there's always svn blame.
Comment 25 Ian 'Hixie' Hickson 2009-02-12 19:32:27 PST
What was the conclusion for HTML5?
Comment 26 Alexey Proskuryakov 2009-02-12 23:40:25 PST
Still needs testing in IE/Firefox. One thing I'm unsure of is whether the local date, or Date header is used to synthesize Last-Modified.
Comment 27 Ian 'Hixie' Hickson 2009-04-25 23:40:47 PDT
I did some testing and changed the spec to return the current time instead of 1970.
Comment 28 Rüdiger Cordes 2009-04-26 03:51:24 PDT
At the moment even with generating a header in PHP the Javascript alert gives no last.Modified time.

<?php
header("Date: Thu, 01 Jun 2006 06:09:43 GMT");
?>
<script>alert(document.lastModified)</script>
Comment 29 Rüdiger Cordes 2009-04-26 03:59:13 PDT
Firefox 3 gives a time but its not the time specified in the php command.
Its the time of requesting the file.

So seems that the biggest web hosting provider in Germany transmitts no last.Modified time at all?
Or its deleted while transfered over DSL?
Comment 30 Rüdiger Cordes 2009-04-26 04:05:55 PDT
Whereas
http://www.topster.de/http-header
reports:

HTTP/1.1 302 Found
Date: Sun, 26 Apr 2009 11:01:23 GMT
Server: Apache
Location: https://macshot.de/php-aquarium/aquarium/code.php
Connection: close
Content-Type: text/html; charset=iso-8859-1
Comment 31 Rüdiger Cordes 2009-04-26 04:46:40 PDT
The same provider and a different domain WITH last.Modified:
(file not mentioned: http://britenweb.de/menue.php)

HTTP/1.1 200 OK
Date: Sun, 26 Apr 2009 11:42:03 GMT
Server: Apache
X-Powered-By: PHP/4.4.9
Last-Modified: Sun, 05 Oct 2008 16:56:23 GMT
Connection: close
Content-Type: text/html
Comment 32 Adam Barth 2010-09-14 01:11:07 PDT
Ok.  It's time for this bug to come to Jesus.
Comment 33 Adam Barth 2010-09-14 01:12:10 PDT
Sayth the spec:

The Document's source file's last modification date and time must be derived from relevant features of the networking protocols used, e.g. from the value of the HTTP Last-Modified header of the document, or from metadata in the file system for local files. If the last modification date and time are not known, the attribute must return the current date and time in the above format.

http://www.whatwg.org/specs/web-apps/current-work/#dom-document-lastmodified
Comment 34 Adam Barth 2010-09-14 03:16:09 PDT
Created attachment 67532 [details]
Patch
Comment 35 Adam Barth 2010-09-14 03:16:45 PDT
Yes, ladies and gentlemen.  It appears there were previously zero tests for this feature.
Comment 36 Adam Barth 2010-09-14 03:18:31 PDT
Created attachment 67533 [details]
Patch
Comment 37 Darin Adler 2010-09-14 09:30:28 PDT
Comment on attachment 67533 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=67533&action=prettypatch

> WebCore/dom/Document.cpp:3623
> +    // FIXME: If this document came from the file system, HTML5 tells us to
> +    // read the last modification date from the file system.
I’m not sure HTML5 is the best colloquial name for the HTML specification.

> WebCore/dom/Document.cpp:3628
> +    return String::format("%02d/%02d/%04d %02d:%02d:%02d",
> +                          date.month()+1, date.monthDay(), date.fullYear(),
> +                          date.hour(), date.minute(), date.second());
We normally do not line subsequent lines up with parentheses like this.

We normally put spaces around operators like the "+" here in "+1".
Comment 38 Adam Barth 2010-09-14 10:30:49 PDT
Comment on attachment 67533 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=67533&action=prettypatch

> WebCore/dom/Document.cpp:3623
> +    // read the last modification date from the file system.
Ok.  What name would you prefer?  My thought is that this comment won't be addressed for a long time, so I wanted to be specific in case future versions of HTML do something different.

> WebCore/dom/Document.cpp:3628
> +                          date.hour(), date.minute(), date.second());
Okiedokes.
Comment 39 Adam Barth 2010-09-14 11:52:06 PDT
Created attachment 67582 [details]
Patch for landing
Comment 40 Adam Barth 2010-09-14 16:29:50 PDT
Comment on attachment 67582 [details]
Patch for landing

Clearing flags on attachment: 67582

Committed r67513: <http://trac.webkit.org/changeset/67513>
Comment 41 Adam Barth 2010-09-14 16:29:57 PDT
All reviewed patches have been landed.  Closing bug.
Comment 42 WebKit Review Bot 2010-09-14 17:00:50 PDT
http://trac.webkit.org/changeset/67513 might have broken Qt Linux Release
The following changes are on the blame list:
http://trac.webkit.org/changeset/67513
http://trac.webkit.org/changeset/67514
Comment 43 Darin Adler 2010-09-14 21:46:27 PDT
Looks like the regression test added in this patch is failing on one GTK and one Qt bot.
Comment 44 Adam Barth 2010-09-14 21:58:20 PDT
(In reply to comment #43)
> Looks like the regression test added in this patch is failing on one GTK and one Qt bot.

It might time zone thing.  I'll try to fix it.