Bug 47035 - Application cache selection algorithm should only be invoked after navigation
Summary: Application cache selection algorithm should only be invoked after navigation
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-01 17:57 PDT by Alexey Proskuryakov
Modified: 2010-10-05 10:08 PDT (History)
4 users (show)

See Also:


Attachments
test case (http/tests/appcache) (736 bytes, text/html)
2010-10-01 17:58 PDT, Alexey Proskuryakov
no flags Details
proposed fix (8.47 KB, patch)
2010-10-01 22:58 PDT, Alexey Proskuryakov
abarth: review-
Details | Formatted Diff | Diff
proposed fix (22.98 KB, patch)
2010-10-04 11:56 PDT, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff
with Qt build fix (22.97 KB, patch)
2010-10-04 12:14 PDT, Alexey Proskuryakov
abarth: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2010-10-01 17:57:08 PDT
We don't implement this part of HTML5:

------------------
A start tag whose tag name is "html"
Create an element for the token in the HTML namespace. Append it to the Document object. Put this element in the stack of open elements.

If the Document is being loaded as part of navigation of a browsing context, then: if the newly created element has a manifest attribute whose value is not the empty string, then resolve the value of that attribute to an absolute URL, relative to the newly created element, and if that is successful, run the application cache selection algorithm with the resulting absolute URL with any <fragment> component removed; otherwise, if there is no such attribute, or its value is the empty string, or resolving its value fails, run the application cache selection algorithm with no manifest. The algorithm must be passed the Document object.
------------------

Instead, we always run the application cache selection algorithm, even if not loaded as part of navigation of a browsing context. That causes assertion failures and generally incorrect behavior.
Comment 1 Alexey Proskuryakov 2010-10-01 17:58:26 PDT
Created attachment 69551 [details]
test case (http/tests/appcache)

Firefox 3.6.10 has really weird behavior on this test - it passes at first, but fails when reloading the page.
Comment 2 Alexey Proskuryakov 2010-10-01 18:00:03 PDT
An assertion fails if you e.g. just document.write("foobar") in a document associated with an appcache.
Comment 3 Alexey Proskuryakov 2010-10-01 22:58:57 PDT
Created attachment 69565 [details]
proposed fix
Comment 4 Adam Barth 2010-10-02 15:32:41 PDT
Comment on attachment 69565 [details]
proposed fix

View in context: https://bugs.webkit.org/attachment.cgi?id=69565&action=review

Please add a test case for an HTMLHtmlElement created by createElement being added to the document while the parser is running.

> WebCore/dom/DocumentParser.h:92
> +    void setWasLoadedAsPartOfNavigation() { m_wasLoadedAsPartOfNavigation = true; }
> +    bool wasLoadedAsPartOfNavigation() const { return m_wasLoadedAsPartOfNavigation; }

Can we make this an argument to the constructor?  This API is much wider than we need.  Also, the getter can be protected if we the parser itself make the check.  Adding it to the constructor will also let us scope the bit to the HTMLDocumentParser, where it seems to belong according to the spec.

> WebCore/html/HTMLHtmlElement.cpp:65
> -    if (!document()->parsing())
> +    if (!document()->parser() || !document()->parser()->wasLoadedAsPartOfNavigation())
>          return;

This code looks wrong.  It only supposed to trigger for HTMLHtmlElement created by the parser.  It's entirely possible to createElement('html') and insert it into the document while the document has a parser that was loaded as part of navigation.

Instead of putting this code in insertedIntoDocument, we should put the code in a new method of HTMLHtmlElement that the parser calls explicitly in the proper state.

> WebCore/loader/FrameLoader.cpp:614
> +    m_frame->document()->parser()->setWasLoadedAsPartOfNavigation();

This works should be done by the DocumentWriter.  It's responsible for interacting with the parser.
Comment 5 Alexey Proskuryakov 2010-10-02 20:08:49 PDT
> Please add a test case for an HTMLHtmlElement created by createElement being added to the document while the parser is running.

Will do.

> Can we make this an argument to the constructor?  This API is much wider than we need.  Also, the getter can be protected if we the parser itself make the check.  Adding it to the constructor will also let us scope the bit to the HTMLDocumentParser, where it seems to belong according to the spec.

XHTML is not different at all. Image documents etc also need this function to run to completion, for "select cache without manifest URL" branch to be taken. See section 6.5.3 "Page load processing model for XML files" and below for non-HTML content.

In fact, this is why I made it a separate function, not a constructor argument.

> This code looks wrong.  It only supposed to trigger for HTMLHtmlElement created by the parser.  It's entirely possible to createElement('html') and insert it into the document while the document has a parser that was loaded as part of navigation.
> 
> Instead of putting this code in insertedIntoDocument, we should put the code in a new method of HTMLHtmlElement that the parser calls explicitly in the proper state.

Well, that would mean moving all of insertedIntoDocument() into this special function, for every document kind... Appcache cache selection is the only reason for its existence.

Perhaps that's what has to be be done indeed.

> > WebCore/loader/FrameLoader.cpp:614
> > +    m_frame->document()->parser()->setWasLoadedAsPartOfNavigation();
> 
> This works should be done by the DocumentWriter.  It's responsible for interacting with the parser.

Why is the parser exposed on Document if we can't talk to it? Sure, I could pass the arguments all the way through DocumentWriter constructor, and all Document derived class constructors, but I doubt that we gain more than we lose that way.
Comment 6 Adam Barth 2010-10-02 22:25:24 PDT
> > This code looks wrong.  It only supposed to trigger for HTMLHtmlElement created by the parser.  It's entirely possible to createElement('html') and insert it into the document while the document has a parser that was loaded as part of navigation.
> > 
> > Instead of putting this code in insertedIntoDocument, we should put the code in a new method of HTMLHtmlElement that the parser calls explicitly in the proper state.
> 
> Well, that would mean moving all of insertedIntoDocument() into this special function, for every document kind... Appcache cache selection is the only reason for its existence.
> 
> Perhaps that's what has to be be done indeed.

Is there another way to get the right behavior?  Whether to trigger this code depends on who's inserting the element into the document.

> > > WebCore/loader/FrameLoader.cpp:614
> > > +    m_frame->document()->parser()->setWasLoadedAsPartOfNavigation();
> > 
> > This works should be done by the DocumentWriter.  It's responsible for interacting with the parser.
> 
> Why is the parser exposed on Document if we can't talk to it? Sure, I could pass the arguments all the way through DocumentWriter constructor, and all Document derived class constructors, but I doubt that we gain more than we lose that way.

The abstraction boundaries in FrameLoader are very poor.  I have a long-term plan for cleaning them up, but that first requires moving all the code to the right place and then limiting the visibility of things you're not supposed to see.

This code shouldn't go in FrameLoader because FrameLoader shouldn't interact with the parser at all.  If you look at this diagram:

https://docs1.google.com/drawings/edit?id=1ko0LFteYpoXdmfYO1rYme6t-QXLPQdI1Z_ysejpOVYk&hl=en

You'll see that it's DocumentLoader's job to interact with the parser.  FrameLoader's job is to shuffle DocumentLoaders around for the provisional/committed lifecycle.  DocumentLoader's job is to actually shove bytes into the document.  I've managed to package most of FrameLoader's interaction with the parser up into DocumentWriter.  The next step is to move DocumentWrite from FrameLoader to DocumentLoader (and probably give it a better name).

I know all this stuff isn't super visible in the code at the moment, but hopefully it's on the road to improvement.
Comment 7 Alexey Proskuryakov 2010-10-04 11:56:38 PDT
Created attachment 69658 [details]
proposed fix
Comment 8 Early Warning System Bot 2010-10-04 12:11:38 PDT
Attachment 69658 [details] did not build on qt:
Build output: http://queues.webkit.org/results/4252002
Comment 9 Alexey Proskuryakov 2010-10-04 12:14:42 PDT
Created attachment 69663 [details]
with Qt build fix
Comment 10 Adam Barth 2010-10-04 12:24:02 PDT
Comment on attachment 69663 [details]
with Qt build fix

View in context: https://bugs.webkit.org/attachment.cgi?id=69663&action=review

I'm not super happy with how the bit ends up getting set on the parser, but I'm not sure I have a better idea for how to do it.  Thanks for iterating on the patch.

> WebCore/html/ImageDocument.cpp:200
>      RefPtr<Element> rootElement = Document::createElement(htmlTag, false);
>      appendChild(rootElement, ec);
> +#if ENABLE(OFFLINE_WEB_APPLICATIONS)
> +    static_cast<HTMLHtmlElement*>(rootElement.get())->insertedByParser();
> +#endif

I should clean up all this stuff to go through the HTMLDocumentParser.

> WebCore/html/parser/HTMLConstructionSite.h:-57
> -    void insertHTMLHtmlElement(AtomicHTMLToken&);

I take it this was unused.

> WebCore/loader/DocumentWriter.cpp:254
> +void DocumentWriter::setDocumentWasLoadedAsPartOfNavigation()
> +{
> +    m_frame->document()->parser()->setDocumentWasLoadedAsPartOfNavigation();
> +}

This does kind of look silly.  :)
Comment 11 Alexey Proskuryakov 2010-10-04 12:49:51 PDT
Committed <http://trac.webkit.org/changeset/69026>.
Comment 12 WebKit Review Bot 2010-10-04 13:16:43 PDT
http://trac.webkit.org/changeset/69026 might have broken Qt Windows 32-bit Release
Comment 13 Alexey Proskuryakov 2010-10-05 10:08:40 PDT
For the record, loaded as part of navigations means being loaded by the "Navigating across documents" algorithm: <http://www.whatwg.org/specs/web-apps/current-work/complete/history.html#navigate>