Bug 57794

Summary: REGRESSION (r64712): Microsoft Outlook 2011: original message contents not included when replying to an email.
Product: WebKit Reporter: Andy Estes <aestes>
Component: DOMAssignee: Andy Estes <aestes>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aestes, eric, hsivonen
Priority: P1 Keywords: InRadar, NeedsReduction, Regression
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.6   
Bug Depends on:    
Bug Blocks: 41115    
Attachments:
Description Flags
Patch darin: review+

Description Andy Estes 2011-04-04 14:56:13 PDT
When Microsoft Outlook 2011 is running with a version of WebKit >= r64712, reply messages do not include the original message content.

I have a copy of Outlook 2011 installed and am working on creating a reduction.
Comment 1 Andy Estes 2011-04-04 14:56:24 PDT
<rdar://problem/9065834>
Comment 2 Andy Estes 2011-04-04 15:05:24 PDT
Caused by http://trac.webkit.org/changeset/64712
Comment 3 Andy Estes 2011-04-04 16:00:22 PDT
This is fixed by flipping the parser quirks bit for Outlook.
Comment 4 Adam Barth 2011-04-04 16:14:54 PDT
How does Firefox 4 behave?
Comment 5 Andy Estes 2011-04-04 16:15:44 PDT
(In reply to comment #4)
> How does Firefox 4 behave?

This is a application that embeds WebKit.framework.
Comment 6 Adam Barth 2011-04-04 16:18:46 PDT
Oh, I assumed you meant OWA.
Comment 7 Andy Estes 2011-04-04 20:08:00 PDT
(In reply to comment #3)
> This is fixed by flipping the parser quirks bit for Outlook.

s/is fixed/is *not* fixed/g

:(
Comment 8 Andy Estes 2011-04-05 02:14:37 PDT
I'm not quite sure what to make of this yet. I sent myself an HTML email with this content:

This text is <b>bold</b>

When I select this message and click reply, I see the roughly the following sequence of ObjC DOM API calls:

[DOMHTMLElement setInnerHTML:@"This text is <b>bold</b>"]
...
[DOMHTMLElement innerHMTL]
...
[WebFrame loadHTMLString:@"<stringThatIncludesMessageHeadersAndContent>"]

<stringThatIncludesMessageHeadersAndContent> contains "This text is <b>bold</b>", prefixed with message headers (From, Date, To, Subject). Basically, it's the content you'd expect to see below your cursor when replying to an email.

When the legacy parser is used, [DOMHTMLElement innerHTML] returns the string that was set in the previous call to [DOMHTMLElement setInnerHTML]. When the new parser is enabled, [DOMHTMLElement innerHTML] returns an empty string.

Presumably Outlook does the setInnerHTML/innerHTML combination so that it can do some DOM manipulation on the message contents, and I'm guessing it expects a certain quirk of the old parser to do this manipulation correctly. Maybe it expects a particular DOM layout that is no longer guaranteed due to creation of implicit tags like HEAD.

I'll set some more breakpoints in HTMLElement. Maybe I can catch it removing a node or something.
Comment 9 Andy Estes 2011-04-05 02:16:45 PDT
I should clarify that <stringThatIncludesMessageHeadersAndContent> is also only valid when the legacy parser is used. With the new parser, it contains the message headers but not the body. It appears to be constructed based in part on the previous call to [DOMHTMLElement innerHTML].
Comment 10 Adam Barth 2011-04-05 10:03:49 PDT
> [DOMHTMLElement setInnerHTML:@"This text is <b>bold</b>"]
> ...
> [DOMHTMLElement innerHMTL]

Are these called on the same element?  What is the DOM below that element?
Comment 11 Andy Estes 2011-04-05 17:36:33 PDT
(In reply to comment #10)
> > [DOMHTMLElement setInnerHTML:@"This text is <b>bold</b>"]
> > ...
> > [DOMHTMLElement innerHMTL]
> 
> Are these called on the same element?  What is the DOM below that element?

They are the same element. If I walk up to the document element and call innerHTML, here is what I see:

<head></head><body></body><body>this text is <b>bold</b></body>

No doubt it's that extra BODY that is causing the problem. They probably later ask for document.body, which has no children, when they really want the second body element.
Comment 12 Adam Barth 2011-04-05 17:39:51 PDT
Interesting!  I don't think you're supposed to be able to end with with two body elements...
Comment 13 Andy Estes 2011-04-05 17:42:29 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > > [DOMHTMLElement setInnerHTML:@"This text is <b>bold</b>"]
> > > ...
> > > [DOMHTMLElement innerHMTL]
> > 
> > Are these called on the same element?  What is the DOM below that element?
> 
> They are the same element. If I walk up to the document element and call innerHTML, here is what I see:
> 
> <head></head><body></body><body>this text is <b>bold</b></body>
> 

The legacy parser gives us:

<body>this text is <b>bold</b></body>

The problem could possibly be that they assume document.firstChild.firstChild == document.body, in which case the implicit HEAD is a problem. It seems more likely to be related to the duplicate BODY element though.
Comment 14 Andy Estes 2011-04-05 19:02:12 PDT
(In reply to comment #12)
> Interesting!  I don't think you're supposed to be able to end with with two body elements...

It's pretty simple to end up with two BODY elements:

<script>
document.firstChild.innerHTML = "";
</script>

will give you:

<head></head><body></body><body></body>

This is because the fragment will start in BeforeHeadMode so will get implicit HEAD and BODY tags which get moved to the main document. Then when regular parsing resumes a second implicit BODY is added. Firefox 4 does this too.
Comment 15 Adam Barth 2011-04-05 22:02:55 PDT
> It's pretty simple to end up with two BODY elements:

Crazy!  I'm not sure if that's an intentional behavior.
Comment 16 Andy Estes 2011-04-05 23:21:28 PDT
I agree, it doesn't seem right to behave like this. I'll file a spec bug.
Comment 17 Henri Sivonen 2011-04-06 00:51:01 PDT
(In reply to comment #15)
> > It's pretty simple to end up with two BODY elements:
> 
> Crazy!  I'm not sure if that's an intentional behavior.

The behavior is entirely logical and makes sense, because
 1) DOM operations (including the innerHTML setter) don't care about the parser that's parsing the network stream.
AND
 2) The parser doesn't read back from the DOM.

If you don't like the outcome, don't write code that sets innerHTML of the document element before the network stream parser has finished.
Comment 18 Andy Estes 2011-04-06 01:03:33 PDT
(In reply to comment #15)
> > It's pretty simple to end up with two BODY elements:
> 
> Crazy!  I'm not sure if that's an intentional behavior.

It also turns out that this behavior is unrelated to the Outlook bug. I found that Outlook is explicitly creating a BODY node that parents the message fragment, so this bug is all about the new parser creating a duplicate implicit BODY. The parser needs a new quirk to accommodate Outlook that doesn't create these implicit nodes for empty documents.

Apologies for the red herring.
Comment 19 Andy Estes 2011-04-06 01:06:58 PDT
Even better, we might be able to do this with an injected script a la <https://bugs.webkit.org/show_bug.cgi?id=45693>.
Comment 20 Adam Barth 2011-04-06 01:07:41 PDT
I was going to suggest that, but we got a mid-air collision.  :)
Comment 21 Andy Estes 2011-04-06 04:21:35 PDT
<http://trac.webkit.org/changeset/71339> made it such that we can't inject scripts into the initial document, which prevents me from removing the implicit BODY from the initial empty document before Outlook adds its own BODY.
Comment 22 Eric Seidel (no email) 2011-04-06 05:24:13 PDT
I suspect we could double-quirk it for mail. :)  i.e. turn back on the always-inject feature only for outlook and then write such an early-inject script for outlook. :Donno.  The inject-based quirks are just so clean to write!
Comment 23 Andy Estes 2011-04-06 20:11:57 PDT
Created attachment 88560 [details]
Patch
Comment 24 Darin Adler 2011-04-06 20:24:08 PDT
Comment on attachment 88560 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=88560&action=review

> Source/WebCore/loader/FrameLoader.cpp:-796
> +    m_frame->injectUserScripts(InjectAtDocumentEnd);
> +
>      if (m_stateMachine.creatingInitialEmptyDocument())
>          return;
>  
> -    m_frame->injectUserScripts(InjectAtDocumentEnd);

No explanation of this change. This is why per-file/per-function comments in ChangeLog are better. You could have said why you changed this.

> Source/WebCore/page/Frame.cpp:522
> +    if (loader()->stateMachine()->creatingInitialEmptyDocument()
> +        && !settings()->injectUserScriptsInInitialEmptyDocument())

I suggest combining into one line.

> Source/WebCore/page/Settings.cpp:175
> +    , m_injectUserScriptsInInitialEmptyDocument(false)

SInce this is a verb phrase, I think you should add the word “should” to its name.

> Source/WebCore/page/Settings.h:390
> +        void setInjectUserScriptsInInitialEmptyDocument(bool flag) { m_injectUserScriptsInInitialEmptyDocument = flag; }
> +        bool injectUserScriptsInInitialEmptyDocument() { return m_injectUserScriptsInInitialEmptyDocument; }

Ditto.

> Source/WebCore/page/Settings.h:493
> +        bool m_injectUserScriptsInInitialEmptyDocument : 1;

Ditto.
Comment 25 Eric Seidel (no email) 2011-04-06 20:27:46 PDT
Very slick.
Comment 26 Andy Estes 2011-04-07 12:56:11 PDT
Comment on attachment 88560 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=88560&action=review

>> Source/WebCore/page/Settings.cpp:175
>> +    , m_injectUserScriptsInInitialEmptyDocument(false)
> 
> SInce this is a verb phrase, I think you should add the word “should” to its name.

Yea I should have known better. In my defense, I consciously rejected doing this in preference of being consistent with the other boolean members of this class.
Comment 27 Andy Estes 2011-04-07 13:23:52 PDT
Committed r83201: <http://trac.webkit.org/changeset/83201>