Bug 10313

Summary: xsl:import and document() don't work in stylesheets loaded via XMLHttpRequest
Product: WebKit Reporter: Lamar Goddard <lamargoddard>
Component: XMLAssignee: Nobody <webkit-unassigned>
Status: RESOLVED DUPLICATE    
Severity: Critical CC: abarth, aca, andreas.wictor, ap, arnaud.diederen, beidson, cafulnecky, darin, ddkilzer, don, jayvdb, jesse, liamjfoy, martin, mjs, mmitar, mundi, nbnet, srab8378, webkit.review.bot
Priority: P2 Keywords: InRadar, ReviewedForRadar
Version: 420+   
Hardware: Macintosh   
OS: OS X 10.4   
Attachments:
Description Flags
Zip archive of 2 xsl, 1 xml and 1 xhtml test files
none
Crash log (r15816)
none
partial fix
none
work in progress
none
partial fix 2
none
Fix so that frameless documents from XMLHttpRequest can load subresources
none
New proposed patch
abarth: review-
work in progress
none
New proposed patch
none
proposed patch
darin: review-
New proposed patch
darin: review-
New proposed patch
ap: review-
New patch
none
proposed patch
none
New proposed patch
none
New proposed patch abarth: review-

Description Lamar Goddard 2006-08-08 15:55:05 PDT
In my testing of the javascript XSLTProcessor object, attempts to load a page that does an xsl transform with an imported stylesheet result in a browser crash.


Steps to reproduce:
Download attached ajax_xsl.zip file and unzip
Open page.xhtml file in WebKit.

Actual Results:
WebKit crashes

Expected Results:
Contents of page to be displayed as "Works!".

Build & platform:
Tested with r15815 on 10.4.7.

Additional info:
Works as expected with FireFox 1.5.0.4 on 10.4.7
Comment 1 Lamar Goddard 2006-08-08 15:57:05 PDT
Created attachment 9949 [details]
Zip archive of 2 xsl, 1 xml and 1 xhtml test files
Comment 2 David Kilzer (:ddkilzer) 2006-08-08 16:04:04 PDT
Confirmed.  Reproducible crashes are P1.
Comment 3 David Kilzer (:ddkilzer) 2006-08-08 16:06:53 PDT
Created attachment 9950 [details]
Crash log (r15816)

Crash log from a locally-built debug build of WebKit r15816 on Mac OS X 10.4.7 (8J135/PowerPC) with Safari 2.0.4 (419.3).
Comment 4 Stephanie Lewis 2006-11-06 20:31:21 PST
radar 4823079
Comment 5 Jesse Costello-Good 2006-11-28 10:15:49 PST
This is currently the most significant blocking issue related to our progress on the Safari port of GI. Any progress would be greatly appreciated. 
Comment 6 Alexey Proskuryakov 2006-12-24 14:24:49 PST
Created attachment 12009 [details]
partial fix

This fixes the crash and makes libxslt look for the imported stylesheet at a correct URL. The test still doesn't pass, because XSLTProcessor doesn't know how to load stylesheets on its own, relying on the document loader to pre-load them. However, subresources are not supposed to be loaded when parsing a XMLHttpRequest response.

I thought that these fixes are still worth being landed on their own (stylesheet loading is trickier, and might take longer to implement).
Comment 7 Maciej Stachowiak 2006-12-25 08:32:34 PST
Comment on attachment 12009 [details]
partial fix

r=me
should also add a test case for the crash.
Comment 8 Alexey Proskuryakov 2006-12-25 10:52:42 PST
Committed revision 18413.

> should also add a test case for the crash.

When the bug is fully resolved, the attached test will cover these issues, as well. I intend to keep working on it.
Comment 9 Alexey Proskuryakov 2006-12-28 10:23:34 PST
Comment on attachment 12009 [details]
partial fix

Clearing the review flag from a landed patch.
Comment 10 Alexey Proskuryakov 2006-12-30 02:05:58 PST
Hm, I'm getting the crash again. I guess I broke that part of the fix with some last-minute change :-/
Comment 11 Alexey Proskuryakov 2006-12-31 05:09:56 PST
Created attachment 12131 [details]
work in progress

This fixes the crash for real, makes a correct absolute URL to be generated in one more place, and lets XSLImportRule know when it should load synchronously. 

A remaining architectural problem is that frameless documents such as those loaded by XMLHttpRequest can never load subresources in WebKit. Perhaps these documents should inherit FrameLoader from the document that created them or to get one of their own, I'm not sure.
Comment 12 Alexey Proskuryakov 2007-01-02 02:18:23 PST
Created attachment 12158 [details]
partial fix 2

This fixes the crash for real (by removing some XSLTProcessor logic that seems unnecessary), passes an URL in one more place and adds a test case. Please note that running the test under a debug build produces console output:

fast/xsl .....compilation error: file file:///Users/ap/WebKit/LayoutTests/fast/xsl/resources/xhr-doc.xsl line 3 element import
xsl:import : unable to load file:///Users/ap/WebKit/LayoutTests/fast/xsl/resources/xhr-doc-imported.xsl
......................
Comment 13 Jesse Costello-Good 2007-01-05 13:06:21 PST
Just tested this in 18614 and I'm still getting a crash. 

Date/Time:      2007-01-05 12:57:57.978 -0800
OS Version:     10.4.8 (Build 8L2127)
Report Version: 4

Command: Safari
Path:    /Applications/Safari.app/Contents/MacOS/Safari
Parent:  WindowServer [56]

Version: ??? (18614)

PID:    27834
Thread: 0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000038

Thread 0 Crashed:
0   com.apple.WebCore              	0x0103d944 WebCore::XSLStyleSheet::document() + 6
1   com.apple.WebCore              	0x0103db2d WebCore::XSLStyleSheet::locateStylesheetSubResource(_xmlDoc*, unsigned char const*) + 25
2   com.apple.WebCore              	0x0103f509 WebCore::docLoaderFunc(unsigned char const*, _xmlDict*, int, void*, xsltLoadType) + 1263
3   libxslt.1.dylib                	0x95758f81 xsltParseStylesheetImport + 350
4   libxslt.1.dylib                	0x9574837e xsltParseStylesheetProcess + 915
5   libxslt.1.dylib                	0x9574928c xsltParseStylesheetImportedDoc + 493
6   libxslt.1.dylib                	0x957492fb xsltParseStylesheetDoc + 26
7   com.apple.WebCore              	0x0103d98e WebCore::XSLStyleSheet::compileStyleSheet() + 48
8   com.apple.WebCore              	0x0103e34c WebCore::XSLTProcessor::transformToString(WebCore::Node*, WebCore::DeprecatedString&, WebCore::DeprecatedString&, WebCore::DeprecatedString&) + 186
9   com.apple.WebCore              	0x0103fe48 WebCore::XSLTProcessor::transformToDocument(WebCore::Node*) + 104
10  com.apple.WebCore              	0x0122257a KJS::XSLTProcessorProtoFunc::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 1648
11  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
12  com.apple.JavaScriptCore       	0x0012f919 KJS::FunctionCallDotNode::evaluate(KJS::ExecState*) + 663
13  com.apple.JavaScriptCore       	0x0012dae9 KJS::VarDeclNode::evaluate(KJS::ExecState*) + 55
14  com.apple.JavaScriptCore       	0x0012da5a KJS::VarDeclListNode::evaluate(KJS::ExecState*) + 42
15  com.apple.JavaScriptCore       	0x00133a3b KJS::VarStatementNode::execute(KJS::ExecState*) + 117
16  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
17  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
18  com.apple.JavaScriptCore       	0x00122427 KJS::DeclaredFunctionImp::execute(KJS::ExecState*) + 45
19  com.apple.JavaScriptCore       	0x00121ef0 KJS::FunctionImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 338
20  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
21  com.apple.JavaScriptCore       	0x0012f919 KJS::FunctionCallDotNode::evaluate(KJS::ExecState*) + 663
22  com.apple.JavaScriptCore       	0x0013571f KJS::ReturnNode::execute(KJS::ExecState*) + 189
23  com.apple.JavaScriptCore       	0x0013685b KJS::SourceElementsNode::execute(KJS::ExecState*) + 163
24  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
25  com.apple.JavaScriptCore       	0x00136545 KJS::TryNode::execute(KJS::ExecState*) + 123
26  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
27  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
28  com.apple.JavaScriptCore       	0x00122427 KJS::DeclaredFunctionImp::execute(KJS::ExecState*) + 45
29  com.apple.JavaScriptCore       	0x00121ef0 KJS::FunctionImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 338
30  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
31  com.apple.JavaScriptCore       	0x0012f919 KJS::FunctionCallDotNode::evaluate(KJS::ExecState*) + 663
32  com.apple.JavaScriptCore       	0x0012dae9 KJS::VarDeclNode::evaluate(KJS::ExecState*) + 55
33  com.apple.JavaScriptCore       	0x0012da5a KJS::VarDeclListNode::evaluate(KJS::ExecState*) + 42
34  com.apple.JavaScriptCore       	0x00133a3b KJS::VarStatementNode::execute(KJS::ExecState*) + 117
35  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
36  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
37  com.apple.JavaScriptCore       	0x00133d9a KJS::IfNode::execute(KJS::ExecState*) + 270
38  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
39  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
40  com.apple.JavaScriptCore       	0x00122427 KJS::DeclaredFunctionImp::execute(KJS::ExecState*) + 45
41  com.apple.JavaScriptCore       	0x00121ef0 KJS::FunctionImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 338
42  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
43  com.apple.JavaScriptCore       	0x0012f919 KJS::FunctionCallDotNode::evaluate(KJS::ExecState*) + 663
44  com.apple.JavaScriptCore       	0x0012dae9 KJS::VarDeclNode::evaluate(KJS::ExecState*) + 55
45  com.apple.JavaScriptCore       	0x0012da5a KJS::VarDeclListNode::evaluate(KJS::ExecState*) + 42
46  com.apple.JavaScriptCore       	0x00133a3b KJS::VarStatementNode::execute(KJS::ExecState*) + 117
47  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
48  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
49  com.apple.JavaScriptCore       	0x00122427 KJS::DeclaredFunctionImp::execute(KJS::ExecState*) + 45
50  com.apple.JavaScriptCore       	0x00121ef0 KJS::FunctionImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 338
51  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
52  com.apple.JavaScriptCore       	0x0012ffd0 KJS::FunctionCallResolveNode::evaluate(KJS::ExecState*) + 662
53  com.apple.JavaScriptCore       	0x00133bef KJS::ExprStatementNode::execute(KJS::ExecState*) + 117
54  com.apple.JavaScriptCore       	0x0013685b KJS::SourceElementsNode::execute(KJS::ExecState*) + 163
55  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
56  com.apple.JavaScriptCore       	0x00133d9a KJS::IfNode::execute(KJS::ExecState*) + 270
57  com.apple.JavaScriptCore       	0x0013685b KJS::SourceElementsNode::execute(KJS::ExecState*) + 163
58  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
59  com.apple.JavaScriptCore       	0x00122427 KJS::DeclaredFunctionImp::execute(KJS::ExecState*) + 45
60  com.apple.JavaScriptCore       	0x00121ef0 KJS::FunctionImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 338
61  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
62  com.apple.JavaScriptCore       	0x0012f919 KJS::FunctionCallDotNode::evaluate(KJS::ExecState*) + 663
63  com.apple.JavaScriptCore       	0x00133bef KJS::ExprStatementNode::execute(KJS::ExecState*) + 117
64  com.apple.JavaScriptCore       	0x0013685b KJS::SourceElementsNode::execute(KJS::ExecState*) + 163
65  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
66  com.apple.JavaScriptCore       	0x00122ceb KJS::GlobalFuncImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 641
67  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
68  com.apple.JavaScriptCore       	0x0012ffd0 KJS::FunctionCallResolveNode::evaluate(KJS::ExecState*) + 662
69  com.apple.JavaScriptCore       	0x00133bef KJS::ExprStatementNode::execute(KJS::ExecState*) + 117
70  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
71  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
72  com.apple.JavaScriptCore       	0x00136545 KJS::TryNode::execute(KJS::ExecState*) + 123
73  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
74  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
75  com.apple.JavaScriptCore       	0x00122427 KJS::DeclaredFunctionImp::execute(KJS::ExecState*) + 45
76  com.apple.JavaScriptCore       	0x00121ef0 KJS::FunctionImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 338
77  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
78  com.apple.JavaScriptCore       	0x0012f919 KJS::FunctionCallDotNode::evaluate(KJS::ExecState*) + 663
79  com.apple.JavaScriptCore       	0x00133bef KJS::ExprStatementNode::execute(KJS::ExecState*) + 117
80  com.apple.JavaScriptCore       	0x0013695d KJS::SourceElementsNode::execute(KJS::ExecState*) + 421
81  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
82  com.apple.JavaScriptCore       	0x00122427 KJS::DeclaredFunctionImp::execute(KJS::ExecState*) + 45
83  com.apple.JavaScriptCore       	0x00121ef0 KJS::FunctionImp::callAsFunction(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 338
84  com.apple.JavaScriptCore       	0x0013b934 KJS::JSObject::call(KJS::ExecState*, KJS::JSObject*, KJS::List const&) + 112
85  com.apple.JavaScriptCore       	0x0012f919 KJS::FunctionCallDotNode::evaluate(KJS::ExecState*) + 663
86  com.apple.JavaScriptCore       	0x00133bef KJS::ExprStatementNode::execute(KJS::ExecState*) + 117
87  com.apple.JavaScriptCore       	0x0013685b KJS::SourceElementsNode::execute(KJS::ExecState*) + 163
88  com.apple.JavaScriptCore       	0x00133b0d KJS::BlockNode::execute(KJS::ExecState*) + 67
89  com.apple.JavaScriptCore       	0x0012582b KJS::Interpreter::evaluate(KJS::UString const&, int, KJS::UChar const*, int, KJS::JSValue*) + 1067
90  com.apple.WebCore              	0x0123bce2 WebCore::KJSProxy::evaluate(WebCore::String const&, int, WebCore::String const&, WebCore::Node*) + 184
91  com.apple.WebCore              	0x0137f83d WebCore::FrameLoader::executeScript(WebCore::String const&, int, WebCore::Node*, WebCore::String const&) + 75
92  com.apple.WebCore              	0x0137f8c1 WebCore::FrameLoader::executeScript(WebCore::Node*, WebCore::String const&, bool) + 65
93  com.apple.WebCore              	0x012414b6 KJS::ScheduledAction::execute(KJS::Window*) + 644
94  com.apple.WebCore              	0x0124170a KJS::Window::timerFired(KJS::DOMWindowTimer*) + 336
95  com.apple.WebCore              	0x012417b5 KJS::DOMWindowTimer::fired() + 39
96  com.apple.WebCore              	0x011e43c8 WebCore::TimerBase::fireTimers(double, WTF::Vector<WebCore::TimerBase*, (unsigned long)0> const&) + 74
97  com.apple.WebCore              	0x011e44a4 WebCore::TimerBase::sharedTimerFired() + 144
98  com.apple.CoreFoundation       	0x90829bc9 CFRunLoopRunSpecific + 3341
99  com.apple.CoreFoundation       	0x90828eb5 CFRunLoopRunInMode + 61
100 com.apple.HIToolbox            	0x92dcdb90 RunCurrentEventLoopInMode + 285
101 com.apple.HIToolbox            	0x92dcd297 ReceiveNextEventCommon + 385
102 com.apple.HIToolbox            	0x92dcd0ee BlockUntilNextEventMatchingListInMode + 81
103 com.apple.AppKit               	0x9326f465 _DPSNextEvent + 572
104 com.apple.AppKit               	0x9326f056 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 137
105 com.apple.Safari               	0x00006f96 0x1000 + 24470
106 com.apple.AppKit               	0x93268ddb -[NSApplication run] + 512
107 com.apple.AppKit               	0x9325cd2f NSApplicationMain + 573
108 com.apple.Safari               	0x0005f7de 0x1000 + 387038
109 com.apple.Safari               	0x0005f6f9 0x1000 + 386809
Comment 14 Alexey Proskuryakov 2007-01-06 01:55:22 PST
(In reply to comment #13)
> Just tested this in 18614 and I'm still getting a crash. 

Correct - patch #2 hasn't been landed (or reviewed) yet.
Comment 15 Darin Adler 2007-01-06 18:05:47 PST
Comment on attachment 12158 [details]
partial fix 2

r=me
Comment 16 Alexey Proskuryakov 2007-01-07 03:34:53 PST
Comment on attachment 12158 [details]
partial fix 2

Committed revision 18644; clearing review flag.
Comment 17 Alexey Proskuryakov 2007-01-07 03:36:26 PST
Reassigning to webkit-unassigned while waiting for the architectural changes need to resolve the remaining problem.
Comment 18 Maciej Stachowiak 2007-02-16 20:45:44 PST
What are the needed architectural changes?
Comment 19 Alexey Proskuryakov 2007-02-16 22:21:39 PST
From comment 11:

A remaining architectural problem is that frameless documents such as those
loaded by XMLHttpRequest can never load subresources in WebKit. Perhaps these
documents should inherit FrameLoader from the document that created them or to
get one of their own, I'm not sure.

This also makes XMLHttpRequests unable to work from onunload handlers (YouTube tries to do that to track when the user leaves a page).
Comment 20 Martin Marko 2009-01-14 02:41:04 PST
Sorry for question: 
is there any other way to load stylesheet for XSLTProcessor than 
using XMLHttpRequest ? The one that will work correctly in JavaScript and will support xsl:import,xsl:include ?

Comment 21 Alexey Proskuryakov 2009-01-14 04:05:58 PST
In order for import, include and document() to work, the document currently has be in a frame, so I don't think there is a workaround for XSLTProcessor.
Comment 22 Alexey Proskuryakov 2009-05-17 11:51:29 PDT
*** Bug 25839 has been marked as a duplicate of this bug. ***
Comment 23 Charles Fulnecky 2010-04-15 14:09:07 PDT
Status update:

Windows XP:
Safari  (4.0.5)        - Fails Test
Chrome  (4.1.249.1045) - Fails Test
IE8     (with Sarissa) - Fails Test     
Firefox (3.6.3)        - Passes Test
Opera   (10.5.1)       - Passes Test
Comment 24 Andreas Wictor 2010-05-05 06:11:20 PDT
Created attachment 55115 [details]
Fix so that frameless documents from XMLHttpRequest can load subresources

I have introduced a originFrame pointer on documents which is set by XMLHttpRequest so it can be used when there is no frame pointer. The existing frame pointer was used in too many places so it was safer to introduce a new concept instead. Frames also saves a pointer to each subresource so that the originFrame pointer can be reset when frames are destroyed. It was also necessary to pass in a ownerDocument to Document::open so that we can set SecurityOrigin and change so that xsl imports are loaded synchroniously.
Comment 25 Alexey Proskuryakov 2010-05-05 09:05:01 PDT
It would be great to have this problem finally addressed, thanks for looking into it!

One consequence of keeping a reference to originFrame is that loading from frameless documents breaks in two cases:
1) originFrame is navigated to a different origin, so we can no longer use its FrameLoader;
2) originFrame is destroyed.

Is this what happens in other browsers, too?

CC'ing some people who may have more insight into consequences of sharing originFrame in this way.

We should really be making an origin frame reference for documents created via document.implementation.createDocument() or DOMParser.parseFromString(), too. XMLHttpRequest is an important use case, but there is nothing unique about it.

The changes to XSLImportRule::loadSheet() look as if asynchronous loading was turned into synchronous, which would be a bad thing. But maybe it's only the looks, I don't remember how this function really works. A ChangeLog with a per-function explanation of changes would help a lot.

Besides a ChangeLog, a final patch would need a lot of test cases to go in - both for things that now work, and for known problems that are prevented by clever design. Please see <http://webkit.org/coding/contributing.html> for all the details.
Comment 26 Andreas Wictor 2010-05-07 01:11:27 PDT
(In reply to comment #25)
> The changes to XSLImportRule::loadSheet() look as if asynchronous loading was
> turned into synchronous, which would be a bad thing. But maybe it's only the
> looks, I don't remember how this function really works. A ChangeLog with a
> per-function explanation of changes would help a lot.

Yes, the original code was asynchronous but I could not get it to work. The asynchronous load would always complete after the failed transformation unless it was already cached.

Is there some way to suspend the transformation until all imported documents are loaded? And is it really desirable, doesn't the Javascript programmer expect the transform functions to be synchronous?
Comment 27 Alexey Proskuryakov 2010-05-07 08:41:02 PDT
Yes, transforms invoked via JavaScript API have to be synchronous. But we need to make sure that behavior for transformed documents loaded into a frame doesn't get worse - I think these functions are used for both.
Comment 28 Andreas Wictor 2010-05-21 07:46:20 PDT
Created attachment 56707 [details]
New proposed patch
Comment 29 WebKit Review Bot 2010-05-21 07:51:32 PDT
Attachment 56707 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/ChangeLog:63:  Line contains tab character.  [whitespace/tab] [5]
WebCore/ChangeLog:65:  Line contains tab character.  [whitespace/tab] [5]
WebCore/dom/Document.h:465:  Use 0 instead of NULL.  [readability/null] [4]
Total errors found: 3 in 37 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 30 WebKit Review Bot 2010-05-22 06:39:32 PDT
Attachment 56707 [details] did not build on win:
Build output: http://webkit-commit-queue.appspot.com/results/2381029
Comment 31 Adam Barth 2010-05-22 08:17:56 PDT
Comment on attachment 56707 [details]
New proposed patch

WebCore/bindings/js/JSDOMImplementationCustom.cpp:30
 +  JSValue JSDOMImplementation::createDocument(ExecState* exec, const ArgList& args)
Please don't create more custom bindings.  Instead, you should use the [CallWith=ScriptExecutionContext] IDL attribute.

WebCore/dom/Document.cpp:1285
 +              ScriptExecutionContext::setSecurityOrigin(doc->securityOrigin());
I don't understand why we do this.
Comment 32 Andreas Wictor 2010-05-24 04:57:20 PDT
(In reply to comment #31)
> (From update of attachment 56707 [details])
> WebCore/bindings/js/JSDOMImplementationCustom.cpp:30
>  +  JSValue JSDOMImplementation::createDocument(ExecState* exec, const ArgList& args)
> Please don't create more custom bindings.  Instead, you should use the [CallWith=ScriptExecutionContext] IDL attribute.

I did not know about this IDL attribute, will look into it.

> WebCore/dom/Document.cpp:1285
>  +              ScriptExecutionContext::setSecurityOrigin(doc->securityOrigin());
> I don't understand why we do this.

If we do not copy the securityOrigin from the originFrame's document then it will be empty and that means calls to SecurityOrigin::canRequest(const KURL&) will return false.
Comment 33 Alexey Proskuryakov 2010-05-24 10:21:49 PDT
> If we do not copy the securityOrigin from the originFrame's document then
> it will be empty and that means calls to SecurityOrigin::canRequest(const KURL&)
> will return false.

Do any (all?) of the tests you added fail without this line?
Comment 34 Andreas Wictor 2010-05-25 00:41:16 PDT
(In reply to comment #33)
> Do any (all?) of the tests you added fail without this line?

They will all fail without this line.
Comment 35 Adam Barth 2010-05-25 00:46:40 PDT
> > WebCore/dom/Document.cpp:1285
> >  +              ScriptExecutionContext::setSecurityOrigin(doc->securityOrigin());
> > I don't understand why we do this.
> 
> If we do not copy the securityOrigin from the originFrame's document then it will be empty and that means calls to SecurityOrigin::canRequest(const KURL&) will return false.

I can't be right to unconditionally copy the security origin (and the cookie origin) to the new document.  The document should be able to make its own origin from its own URL.
Comment 36 Andreas Wictor 2010-05-25 01:25:18 PDT
(In reply to comment #35)
> The document should be able to make its own origin from its own URL.

Yes that could be possible for documents from XMLHttpRequest but not for documents from DOMParser.parseFromString() or document.implementation.createDocument().
Comment 37 Alexey Proskuryakov 2010-05-25 08:31:08 PDT
What origin do documents retrieved via XHR get in Firefox? There should really be a test for this added (we usually test cross-origin issues on 127.0.0.1:8000 vs. localhost:8000).
Comment 38 Alexey Proskuryakov 2010-05-25 08:40:43 PDT
+        (*it)->setOriginFrame(0);

With this patch, origin frame is reset when frame is destroyed, but I don't see what takes care of frame navigation. Please test the case where a subframe creates a document with DOMParser, hands it off to parent frame, and navigates to a different domain.
Comment 39 Andreas Wictor 2010-05-26 10:15:25 PDT
Created attachment 57113 [details]
work in progress

Change the code to use the [CallWith=ScriptExecutionContext] IDL attribute and added two tests, one for importing stylesheets cross origin and one for frame navigation.
Comment 40 Andreas Wictor 2010-05-26 10:28:06 PDT
(In reply to comment #37)
> What origin do documents retrieved via XHR get in Firefox? There should really be a test for this added (we usually test cross-origin issues on 127.0.0.1:8000 vs. localhost:8000).

Is the cross-origin test from my latest patch what you're looking for? I get the same behavior with this patch as I get in Firefox.
Comment 41 Andreas Wictor 2010-06-04 07:15:16 PDT
Created attachment 57871 [details]
New proposed patch
Comment 42 Alexey Proskuryakov 2010-06-04 11:00:39 PDT
> Is the cross-origin test from my latest patch what you're looking for?

Sorry, I missed the question originally. I was looking for two tests - one for blocked cross-origin load (that you added), and another for permitted same origin load in the case of cross-origin XMLHttpRequest.

The transform-xhr-doc-cross-origin test you added has console output in expected results, which I don't think is the right way to report success. Also, explanatory text makes it sound like a semi-failure - isn't it expected and correct that transformation fails?

+CONSOLE MESSAGE: line 20: TypeError: Result of expression 'doc' [undefined] is not an object.
+Test for bug 10313: xsl:import doesn't work in stylesheets loaded via XMLHttpRequest.
+
+It's nice that this hasn't crashed, but the XSL transformation has failed.
Comment 43 Andreas Wictor 2010-06-08 05:12:24 PDT
Created attachment 58131 [details]
proposed patch

A new test added and some changes to explanatory texts in tests.
Comment 44 Darin Adler 2010-06-12 18:47:01 PDT
Comment on attachment 58131 [details]
proposed patch

Thanks for taking this on.

I'm pretty sure this would work out better if it was a document that had subdocuments rather than a frame that has subdocuments. Similarly, the owner should be a document rather than a frame.

> -    return CREATE_DOM_OBJECT_WRAPPER(exec, jsConstructor->globalObject(), XSLTProcessor, XSLTProcessor::create().get());
> +    RefPtr<XSLTProcessor> xsltProcessor = XSLTProcessor::create(context);
> +    return CREATE_DOM_OBJECT_WRAPPER(exec, jsConstructor->globalObject(), XSLTProcessor, xsltProcessor.get());

Breaking this up into two lines doesn't seem like a big improvement. You could have just added the "context" to the existing line and I think it would have read more clearly.

As long as you are adding a local variable, I would suggest just calling it processor; no need to give it the more awkward name xsltProcessor.

> +    if (m_scriptExecutionContext && m_scriptExecutionContext->isDocument()) {
> +        Document* originDoc = static_cast<Document*>(m_scriptExecutionContext);
> +        if (originDoc->frame())
> +            originDoc->frame()->registerSubDocument(doc.get());
> +    }

Please don't abbreviate "document" as "doc" in new code.

> -    DOMImplementation() { }
> +    DOMImplementation(ScriptExecutionContext* context)
> +        : m_scriptExecutionContext(context) { }

Once we reach the point where the definition is not all on one line, we use a different brace style. It would be like this:

    DOMImplementation(ScriptExecutionContext* context) : m_scriptExecutionContext(context) { }

Or this:

    DOMImplementation(ScriptExecutionContext* context)
        : m_scriptExecutionContext(context)
    {
    }

> +    ScriptExecutionContext* m_scriptExecutionContext;

I don't understand why it's safe for a DOMImplementation object to hold a raw pointer to a ScriptExecutionContext. What guarantees that the context won't be deallocated? I think this needs to be a RefPtr.

> +void Document::setOriginFrame(Frame* frame, const KURL& url)

It's entirely unclear what the URL is here. I know you understand it, but you need to leave something behind to make it clear what the URL is.

> +{
> +    m_originFrame = frame;
> +    if (m_originFrame) {
> +        if (!url.isEmpty()) {
> +            m_cookieURL = url;
> +            ScriptExecutionContext::setSecurityOrigin(SecurityOrigin::create(url));
> +        } else if (Document* doc = m_originFrame->document()) {

Please don't use the abbreviation "doc" for "document" in new code.

Why is it OK to leave m_cookieURL set to the old value when the origin frame is set to 0? Why is it OK to leave the security origin when the frame is set to 0. It seems to me that the function setting the frame to 0 is really a different one than the one setting the frame. All the call sites seem to be "always setting the frame" and "always disconnecting the frame" so I suggest having two separate functions for this.

> +    Frame* originFrame() const { return m_originFrame; } // can be NULL
> +    void setOriginFrame(Frame* frame, const KURL& url = KURL());

I find the name "origin frame" to be confusing. Is there some term used for this in the HTML specification? Also, can this be a document instead of a frame? The same frame is reused for successive documents, and so having a pointer to a frame is more dangerous than a pointer to a document, because the document is not reused.

The comment saying this can be null is not that useful. A more useful comment would be one defining the term "origin frame".

> +void Frame::registerSubDocument(Document* doc, const KURL& url)

Since the word is "subdocument" and not the two words "sub document" it should be registerSubdocument, unregisterSubdocument, and m_subdocuments.

> +    if (doc) {

Why is the check for null needed? Is it legal for someone to call this with a null pointer?

> +void Frame::clearRegisteredSubDocuments()
> +{
> +    if (!m_subDocuments.isEmpty()) {

In WebKit we normally use early return for this sort of thing, rather than nesting the function body inside an if.

> +        for (HashSet<Document*>::iterator it = m_subDocuments.begin();
> +             it != m_subDocuments.end(); ++it)

This for statement should be on one line in the usual WebKit coding style.

> +        void registerSubDocument(Document* doc, const KURL& url = KURL());
> +        void unregisterSubDocument(Document* doc);

The argument names here should be omitted.

> +        // use base url from originFrame's document when parent stylesheet dont have an final url
> +        // this can for example happen when the stylesheet is created with DOMParser.parseFromString

Comments should be in sentence style with capital letter and period at the end. Also worth fixing up the grammar.

> +    if (ownerNode && ownerNode->nodeType() == Node::PROCESSING_INSTRUCTION_NODE)
> +        loadSheetAsync(docLoader, absHref);
> +    else
> +        loadSheetSync(docLoader, absHref);

These new functions don't have good names. For one thing, they don't do the entire job of loading the sheet, since they don't do the other things that loadSheet does, so the names are poor in that sense. Also, we normally use the adverbs "synchronously" and "asynchronously" as you can see in various existing function names, and you should do the same here. Maybe finishLoadingSheetAsynchronously?

It’s also unfortunate that we have copied and pasted code now. Is there a way to share more of the code?

> +    void loadSheetAsync(DocLoader* docLoader, const String& absHref);
> +    void loadSheetSync(DocLoader* docLoader, const String& absHref);

The argument names "docLoader" should be omitted here.

It does look like you added a good number of test cases, which is great.

review- largely because of the m_scriptExecutionContext raw pointer issue
Comment 45 Alexey Proskuryakov 2010-06-16 11:18:36 PDT
*** Bug 40715 has been marked as a duplicate of this bug. ***
Comment 46 Andreas Wictor 2010-06-17 01:19:34 PDT
Created attachment 58970 [details]
New proposed patch

Changes from last patch: Store an "origin document" instead of an "origin frame" on documents plus other changes suggested by Darin Adler
Comment 47 Darin Adler 2010-06-17 07:43:40 PDT
Comment on attachment 58970 [details]
New proposed patch

> Index: WebCore/dom/DOMImplementation.h
> ===================================================================
> --- WebCore/dom/DOMImplementation.h	(revision 61177)
> +++ WebCore/dom/DOMImplementation.h	(working copy)
> @@ -24,6 +24,7 @@
>  #ifndef DOMImplementation_h
>  #define DOMImplementation_h
>  
> +#include <Document.h>
>  #include <wtf/PassRefPtr.h>
>  #include <wtf/RefCounted.h>

Why does DOMImplementation.h now need to include Document.h? I don't see new code that depends on Document, but rather ScriptExecutionContext.h. I think it would make sense to include "ScriptExecutionContext.h" here. Further, the syntax <Document.h> is incorrect, since we are in the same module. The include should be "Document.h". The patch should instead add an include of "ScriptExecutionContext.h" and we should remove the includes of PassRefPtr.h and RefCounted.h since ScriptExecutionContext.h already includes those.

But I'm wondering if it's right to store a RefPtr<ScriptExecutionContext> in DOMImplementation.h anyway. I think maybe we should be storing a RefPtr<Document>. If this is always going to be a Document and never a WorkerContext then perhaps we should do the conversion from ScriptExecutionContext at create time and store the pointer as a .

We could also avoid adding includes here by making the functions non-inline.

At minimum, this needs to be changed to "Document.h", but you could consider these other comments as well.

> -        m_implementation = DOMImplementation::create();
> +        m_implementation = DOMImplementation::create(scriptExecutionContext());

I think you can just pass "this" here. There's no need to explicitly call the scriptExecutionContext() function; a Document is a script execution context.

> +void Document::setOriginDocument(Document* originDocument)

A function that takes ownership of its argument should take a PassRefPtr<Document>, not a Document*.

> +Frame* Document::findFrame() const

Since Document already has a frame function it's important to name this in a way that makes it clear when you would call this instead of frame. I think we need some sort of adjective between the word "find" and the word "frame". This is the kind of thing that can make the class very confusing later if the name does not make it clear. How would you describe the difference between frame and findFrame in words? When would you use findFrame? This should inform a better name for findFrame. Maybe findFrameForSubresourceLoads?

I don't think it's really good to have this be a const member function. It doesn't make much sense to talk about a const Document anyway, since you can get to the frame and back to the document. This is not a simple value-type object where const is a really useful concept. I suggest omitting the const.

> +    const Document* document = this;
> +
> +    while (document) {
> +        if (document->frame())
> +            return document->frame();
> +        document = document->originDocument();
> +    }

I think this would read better as a for loop:

    for (Document* document = this; document; document = document->originDocument()) {
        if (Frame* frame = document->frame())
            return frame;
    }

> -    RefPtr<XSLTProcessor> processor = XSLTProcessor::create();
> +    RefPtr<XSLTProcessor> processor = XSLTProcessor::create(scriptExecutionContext());

Again, please just pass "this".

> +        RefPtr<ScriptExecutionContext> m_scriptExecutionContext;

As above in DOMImplementation, I think it might be better for the pointer in DOMParser to be a Reftr<Document> and convert from script execution context to document at create time instead of doing it at parse time.

ScriptExecutionContext is an abstraction, but we don't want to overdo the abstraction in the code. If we can get to a document, then it's best to do that so the concepts in the code are clearer.

> +#include "Frame.h"
> +#include "Node.h"
> +#include "ResourceError.h"
> +#include "TextResourceDecoder.h"
>  #include "XSLStyleSheet.h"

Are all of these includes needed? Maybe some of them include the others?

> +    XSLStyleSheet* rootXSL = 0;

I think rootXSL is an unclear name for this, because it's not an "XSL". You could name it rootXSLStyleSheet or even rootStyleSheet.

> +    Node* ownerNode = rootXSL->ownerNode();

This dereferences rootXSL without checking if it's non-zero, so will dereference the null pointer in that case. We need a test case covering that code path and the code needs to handle it correctly.

> +        // When parent stylesheet doesn't have a final url, use base url from the

In comments please call it a "URL" not a "url".

> +        // document the root stylesheet originated from. This can happen when the
> +        // stylesheet is created with DOMParser.parseFromString(), for example.
> +        if (Document* originDocument = ownerNode->document()->originDocument())
> +            absHref = KURL(originDocument->baseURL(), m_strHref).string();

Normally to resolve a URL against a document's base we would use the Document::completeURL function rather than using KURL directly. You should do that here unless there's a reason not to.

> +    if (ownerNode && ownerNode->nodeType() == Node::PROCESSING_INSTRUCTION_NODE)
> +        finishLoadingSheetAsynchronously(docLoader, absHref);
> +    else
> +        finishLoadingSheetSynchronously(docLoader, absHref);

This needs a comment. It's entirely unclear why you would do an asynchronous load when the node type is a processing instruction!

> +void XSLImportRule::finishLoadingSheetSynchronously(DocLoader* docLoader, const String& absHref)

Is there any way to share more code with the finishLoadingSheetAsynchronously function? This has an unpleasantly large amount of code.

> +    KURL url(ParsedURLString, absHref);

We already had this in a KURL in the function that calls this function. But then we turned it into a string and back into a KURL. Lets change the argument types so we don't have to convert in this fashion.

> +    if (error.isNull()) {
> +        RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create("text/xsl");
> +        String encoding = response.textEncodingName();
> +        if (!encoding.isNull())
> +            decoder->setEncoding(encoding, TextResourceDecoder::EncodingFromHTTPHeader);
> +
> +        String sheet(decoder->decode(data.data(), data.size()));
> +        setXSLStyleSheet(absHref, response.url(), sheet);
> +    }

The usual idiom for this sort of thing in WebKit is an early return. So you would write:

    if (error.isNotNull())
        return;

instead of nesting the non-error code inside an if statement.

> +    RefPtr<ScriptExecutionContext> m_scriptExecutionContext;

Same comment for XSLTProcessor as the other classes. I suggest converting to a Document at create time and using a RefPtr<Document> rather than a RefPtr<ScriptExecutionContext>.

review- because of the null dereference getting ownerNode. I am not confident that test cases cover all the code changes here and I'd like you to review the coverage of the tests you added. Does every new piece of code get tested at least once?
Comment 48 Andreas Wictor 2010-06-21 08:32:43 PDT
(In reply to comment #47) 
> > -        m_implementation = DOMImplementation::create();
> > +        m_implementation = DOMImplementation::create(scriptExecutionContext());
> 
> I think you can just pass "this" here. There's no need to explicitly call the scriptExecutionContext() function; a Document is a script execution context.

If you use "this" here you must either const_cast it since Document::implementation is a const function or remove the const declaration from the function. You did not like const on the findFrame function, so i suppose it is preferable to remove the const declaration from the function?

> > +    Node* ownerNode = rootXSL->ownerNode();
> 
> This dereferences rootXSL without checking if it's non-zero, so will dereference the null pointer in that case. We need a test case covering that code path and the code needs to handle it correctly.

I can't think of a case where the root->isXSLStyleSheet() test will fail and leave the pointer at 0, parents to a XSLImportsRule is always a XSLStyleSheet. Is it not better to just ASSERT(root->isXSLStyleSheet()) instead?
Comment 49 Darin Adler 2010-06-21 17:53:16 PDT
(In reply to comment #48)
> (In reply to comment #47) 
> > > -        m_implementation = DOMImplementation::create();
> > > +        m_implementation = DOMImplementation::create(scriptExecutionContext());
> > 
> > I think you can just pass "this" here. There's no need to explicitly call the scriptExecutionContext() function; a Document is a script execution context.
> 
> If you use "this" here you must either const_cast it since Document::implementation is a const function or remove the const declaration from the function. You did not like const on the findFrame function, so i suppose it is preferable to remove the const declaration from the function?

Yes, I think so.

> > > +    Node* ownerNode = rootXSL->ownerNode();
> > 
> > This dereferences rootXSL without checking if it's non-zero, so will dereference the null pointer in that case. We need a test case covering that code path and the code needs to handle it correctly.
> 
> I can't think of a case where the root->isXSLStyleSheet() test will fail and leave the pointer at 0, parents to a XSLImportsRule is always a XSLStyleSheet. Is it not better to just ASSERT(root->isXSLStyleSheet()) instead?

You should assert something that you know is true. “I can’t think of a case” is similar, but not quite the same thing. An assertion is OK, but you are leaving me a little bit worried here!
Comment 50 Andreas Wictor 2010-06-22 04:42:17 PDT
Created attachment 59361 [details]
New proposed patch
Comment 51 Alexey Proskuryakov 2010-06-24 11:26:53 PDT
Comment on attachment 59361 [details]
New proposed patch

+void Document::setOriginDocument(PassRefPtr<Document> originDocument)
+{
+    // This function stores a reference to the document where this document
+    // originated from. This is used for frameless documents, so that we can

I really dislike it that both Document and DOMImplementation get an m_originDocument reference. It doesn't seem that these should ever be different, but it raises all sorts of unpleasant questions about when exactly they are different in practice.

+    // find a frame when we are loading subresources. Also, because frameless
+    // documents don't get a SecurityOrigin and a cookieURL when constructed,
+    // we need to set these here to be able to load subresources.

Please add assertions that there is no SecurityOrigin and cookieURL indeed.

             m_responseXML->setURL(m_url);
+            m_responseXML->setOriginDocument(document());

This looks weird to me. The result of this code is that a script can decide which security origin to use with the result of XMLHttpRequest - req.responseXML will use one, DOMParser.parseFromString(req.responseText, "application/xml") will use another. This sort of freedom looks highly suspicious.

I'm not sure what exactly I'm asking for here - perhaps someone CC'ed on this bug (Adam?) could weigh in.

     if (!parentSheet->finalURL().isNull())
-        // use parent styleheet's URL as the base URL
-        absHref = KURL(parentSheet->finalURL(), m_strHref).string();
-    
+        // Use parent styleheet's URL as the base URL.
+        absHref = KURL(parentSheet->finalURL(), m_strHref);
+    else if (originDocument)
+        // When parent stylesheet doesn't have a final URL, use base URL from the
+        // document the root stylesheet originated from. This can happen when the
+        // stylesheet is created with DOMParser.parseFromString(), for example.
+        absHref = originDocument->completeURL(m_strHref);

Per WebKit coding style, these blocks should be in braces (comments do count).

+    else
+        absHref = KURL(ParsedURLString, m_strHref);

This doesn't seem right - what guarantees that m_strHref is indeed parsed? When do we take this code path?

It seems that this should be ASSERT_NOT_REACHED plus an early return instead.

I'm going to say r- for the above comments, but there is also something I don't understand in how subresources are loaded in general. There is that old ugly code in XSLStyleSheet::loadChildSheets() that iterates over stylesheets to find imports and includes - it it obsoleted by your code? How do they interact?

I'll try to figure this out myself later, but an explanation from you would be appreciated, too.

+    void finishLoadingSheetAsynchronously(DocLoader*, const KURL&);
+    void finishLoadingSheetSynchronously(DocLoader*, const KURL&);

A synchronous load can be started while an asynchronous one is in progress. But we're using global variables for loading, see globalProcessor and globalDocLoader. Is there a race condition here?
Comment 52 Adam Barth 2010-06-24 11:33:40 PDT
I'm concerned that this patch would make 

DOMParser.parseFromString("... untrusted data ...", "application/xml")

an XSS vulnerability.  I don't think that's right.  Do these tests pass in other browsers?
Comment 53 Alexey Proskuryakov 2010-06-25 11:58:48 PDT
Adam, could you please elaborate on this? This doesn't seem any worse than element.innerHTML = "untrusted data" to me.
Comment 54 Adam Barth 2010-06-25 13:08:57 PDT
(In reply to comment #53)
> Adam, could you please elaborate on this? This doesn't seem any worse than element.innerHTML = "untrusted data" to me.

It's an expectation game.  Developers know that innerHTML = "untrusted data" is XSS.  They don't expect DOMParser.parse("untrusted data") to be XSS in the same way JSON.parse shouldn't be XSS either.
Comment 55 Adam Barth 2010-06-25 13:22:28 PDT
[1:18pm] ap: abarth: DOMParser.parse("untrusted data") doesn't execute scripts when parsing AFAIK. or does it?
[1:18pm] abarth: frameless documents can't execute scripts
[1:18pm] abarth: the can't find the script controller
[1:18pm] abarth: which is on Frame
[1:18pm] ap: abarth: why is it XSS in that case?
[1:19pm] abarth: ap: because you're giving the document created from the untrusted bytes your security origin
[1:19pm] abarth: ap: in this case, you'll letting them load subresources with your authority
[1:20pm] abarth: ap: which means the attacker can learn information he's not supposed to know
[1:20pm] abarth: ap: and probably exfiltrate that information via other subresource loads
[1:20pm] ap: abarth: assuming the contributor knows about security as much or less as I do, that's probably worth explaining in a bugzilla comment
Comment 56 Alexey Proskuryakov 2010-06-25 13:45:09 PDT
As we're stamping actual origin on cross-origin XMLHttpResult result, we might need to have special case when overrideMimeType is in effect. Letting someone fiddle with decoding of a document that has a different security origin is wrong.
Comment 57 Adam Barth 2010-06-25 13:48:15 PDT
[1:45pm] abarth: ap: i think the safest thing to is the following:
[1:46pm] abarth: 1) If the document is begin created by XHR and the document is same-origin as the page
[1:46pm] abarth: then it can inherit the page's security origin
[1:46pm] abarth: would that address the primary use case?
Comment 58 Alexey Proskuryakov 2010-06-25 14:31:09 PDT
I never realized it was such a can of worms :(

Even ensuring that we give the document a proper security origin isn't enough - lots of things will go wrong if several documents talk to the same FrameLoader. In particular, Origin and Referer headers will match the document that's in frame, not the one that actually initiated the load.

Maybe limiting ourselves to same origin xhr.responseXML is indeed the way to go for the first iteration. That solves most compatibility problems, and sidesteps most pitfalls. We'll still need to decide what to do with Referer header.

An semi-related thought - we need to test what happens when adding e.g. an <img> element to responseXML document via appendChild(). Will we trigger actual load? Will Firefox do that?
Comment 59 Andreas Wictor 2010-06-28 07:40:25 PDT
(In reply to comment #51)
> I really dislike it that both Document and DOMImplementation get an m_originDocument reference. It doesn't seem that these should ever be different, but it raises all sorts of unpleasant questions about when exactly they are different in practice.

I'm not sure I understand the problem you are describing here.

> I'm going to say r- for the above comments, but there is also something I don't understand in how subresources are loaded in general. There is that old ugly code in XSLStyleSheet::loadChildSheets() that iterates over stylesheets to find imports and includes - it it obsoleted by your code? How do they interact?
> 
> I'll try to figure this out myself later, but an explanation from you would be appreciated, too.

No, my code does not obsolete any existing code in XSLStyleSheet. My understanding of this code is that when we find imports and includes in the loadChildSheets function we create XSLImportRule objects to represent them. These XSLImportRule objects are saved to a list by the append function inherited from StyleList, and then loaded by a call to loadSheet. Later when libxslt needs to get these imported stylesheets we call locateStylesheetSubResource which will iterate through the list of saved XSLImportRule objects and try to find the correct one. My contribution to this process is that I made loads synchronous for transformations not origination from a pi, because otherwise locateStylesheetSubResource will find the correct XSLImportRule object but it will not be fully loaded.

> 
> +    void finishLoadingSheetAsynchronously(DocLoader*, const KURL&);
> +    void finishLoadingSheetSynchronously(DocLoader*, const KURL&);
> 
> A synchronous load can be started while an asynchronous one is in progress. But we're using global variables for loading, see globalProcessor and globalDocLoader. Is there a race condition here?

I have found a way to remove these global variables by using the _private field that exists on most libxml structs. My original plan was to submit a patch after my work on this bug was done, but maybe it should be included here?
Comment 60 Andreas Wictor 2010-06-28 07:41:09 PDT
(In reply to comment #52)
> Do these tests pass in other browsers?

With Firefox 3.6.3 on Ubuntu I get these results.

transform-framed-doc.xhtml: Works, both before and after the frame is navigated.
transform-xhr-doc-cross-origin.xhtml: Works
transform-xhr-doc-from-allowed-cross-origin.xhtml: Fails
transform-docimpl-doc.xhtml: Fails
transform-domparser-doc.xhtml: Works
transform-transformed-doc.xhtml: Fails
transform-xhr-doc.xhtml: Works
Comment 61 Andreas Wictor 2010-06-28 07:48:14 PDT
(In reply to comment #58)
> Maybe limiting ourselves to same origin xhr.responseXML is indeed the way to go for the first iteration. That solves most compatibility problems, and sidesteps most pitfalls. 

So this means that the originDocument idea can be used, but it will only be used for xhr.responseXML to begin with?

> We'll still need to decide what to do with Referer header.

What options are there?
Comment 62 Alexey Proskuryakov 2010-06-28 16:23:14 PDT
> I'm not sure I understand the problem you are describing here.

Document and DOMImplementation are tied to each other, so giving them both an origin document pointer is confusing.

There is no back link from DOMImplementation, so I'd say that it should be getting origin document pointer, but not Document.

> I have found a way to remove these global variables by using the _private 
> field that exists on most libxml structs. My original plan was to submit
> a patch after my work on this bug was done, but maybe it should be included here?

I think that ideally, that would be done in a separate patch before this change.

> With Firefox 3.6.3 on Ubuntu I get these results.

Not matching Firefox is something we wouldn't generally want for the first iteration of this fix, now that we know how sensitive it is.

> So this means that the originDocument idea can be used, but it will only be used
> for xhr.responseXML to begin with?

Yes, let's try the simplest thing that will fix the original test case. And only for same origin case.

As for using the originDocument idea, yes, I don't think it's been shown wrong yet. But let's find out the answer to the below question first.

> > We'll still need to decide what to do with Referer header.
> What options are there?

First question - what does Firefox do? I can imagine several options for the Referer header:
- XMLHttpRequest request URL;
- XMLHttpRequest response URL (after redirects);
- taken from the document that created XMLHttpRequest;
- taken from the document that invoked XMLHttpRequest.responseXML;
- taken from the document that created XSLTProcessor;
- taken from the document that invoked XSLTProcessor.transformToDocument();
- no Referer sent at all.

We should investigate doing the same. If that proves too tricky, maybe it can wait until a follow-up patch.
Comment 63 Adam Barth 2010-06-28 16:26:42 PDT
> > > We'll still need to decide what to do with Referer header.
> > What options are there?
> 
> First question - what does Firefox do? I can imagine several options for the Referer header:
> - XMLHttpRequest request URL;

This one seems unlikely.

> - XMLHttpRequest response URL (after redirects);
> - taken from the document that created XMLHttpRequest;
> - taken from the document that invoked XMLHttpRequest.responseXML;
> - taken from the document that created XSLTProcessor;
> - taken from the document that invoked XSLTProcessor.transformToDocument();
> - no Referer sent at all.

Hopefully we won't have to do this one either.
Comment 64 Andreas Wictor 2010-06-29 06:56:33 PDT
(In reply to comment #62)
> > I have found a way to remove these global variables by using the _private 
> > field that exists on most libxml structs. My original plan was to submit
> > a patch after my work on this bug was done, but maybe it should be included here?
> 
> I think that ideally, that would be done in a separate patch before this change.

I have created bug 41348 and submitted a patch there.
Comment 65 Martin Blom 2010-09-02 05:04:13 PDT
(In reply to comment #62)
> First question - what does Firefox do? I can imagine several options for the Referer header:
> - XMLHttpRequest request URL;
> - XMLHttpRequest response URL (after redirects);
> - taken from the document that created XMLHttpRequest;
> - taken from the document that invoked XMLHttpRequest.responseXML;
> - taken from the document that created XSLTProcessor;
> - taken from the document that invoked XSLTProcessor.transformToDocument();
> - no Referer sent at all.
> 
> We should investigate doing the same. If that proves too tricky, maybe it can wait until a follow-up patch.

If "main-redirect.cgi" redirects to "main.xsl" and "child-1-redirect.cgi" redirects to "child-1.xsl", using XHR to load the XSLT stylesheet main-redirect.cgi, which imports "child-1-redirect.cgi", which in turn includes "child-2.xsl", the following requests are seen in Firefox 3.6.7:

GET /lcs/xslt-referrer/main-redirect.cgi (XHR)
Referer: http://localhost/lcs/xslt-referrer/

GET /lcs/xslt-referrer/main.xsl (XHR)
Referer: http://localhost/lcs/xslt-referrer/

GET /lcs/xslt-referrer/child-1-redirect.cgi (main XSLT)
Referer: http://localhost/lcs/xslt-referrer/main.xsl

GET /lcs/xslt-referrer/child-1.xsl (main XSLT)
Referer: http://localhost/lcs/xslt-referrer/main.xsl

GET /lcs/xslt-referrer/child-2.xsl (child 1 XSLT)
Referer: http://localhost/lcs/xslt-referrer/child-1-redirect.cgi

That is, when main.xsl loads the child, the redirected URI is used as referrer, but when a child loads another child, the URI specified in the stylesheet is used. (Interestingly, the main-loads-child and child-loads-child requests cannot be seen in Firebug.)
Comment 66 Martin Blom 2010-09-02 13:56:28 PDT
(In reply to comment #57)

Adam and Alexey,

Here's how I would assume that things should work:

* A stylesheet loaded via XHR has its "Referer-URI" set to the final
  redirect and a security origin created from that URI.
* A stylesheet loaded by the PI has its Referer-URI set to the final
  redirect and a security origin created from that URI.
* A stylesheet created from document.implementation inherits Referer-URI
  and security origin from document.
* A stylesheet created from DOMParser.parseFromString inherits Referer-URI
  and security origin from the frame's document.
* A stylesheet created from another stylesheet using
  XSLTProcessor.transformToDocument() inherits Referer-URI and security
  origin from the frame's document

When the stylesheet is executed, its security origin is used to check if a
child sheet can be loaded and to create the Origin header. If so, its
"Referer-URI" should be used as the Referer header.

* A stylesheet loaded via xsl:include, xsl:import or the document() XPath
function etc. has its "Referer-URI" set to the final redirect and a security
origin created from that URI.

Are my assumptions correct here?

Also, I don't understand Adam's concerns about DOMParser.parseFromString().
How can parsing an XML document in any way open up for XSS attacks? Sure,
you can take the node and insert it into you DOM or execute it as a
stylesheet, but how is that different from doing eval() on any other
string? Indeed, embedding a small stylesheet in JS code that uses a few
xsl:import tags sounds quite useful to me, and I would expect it to work.
Comment 67 Adam Barth 2010-09-03 10:40:49 PDT
> * A stylesheet created from DOMParser.parseFromString inherits Referer-URI
>   and security origin from the frame's document.
[...]
> Also, I don't understand Adam's concerns about DOMParser.parseFromString().
> How can parsing an XML document in any way open up for XSS attacks?

Notice that in our proposal you imbued the bytes given to parseFromString with the caller's security origin.  If those bytes are untrusted by the caller, you just gave the attacker the honest principal's authority.  That's XSS.

> Sure,
> you can take the node and insert it into you DOM or execute it as a
> stylesheet, but how is that different from doing eval() on any other
> string?

Evaling a string also imbues the string with your authority.  That means folks are careful not to eval untrusted strings.  However, it seems entirely reasonable to parse an untrusted string using DOMParser.

> Indeed, embedding a small stylesheet in JS code that uses a few
> xsl:import tags sounds quite useful to me, and I would expect it to work.

Whether or not it's useful, we need to get the security right first.
Comment 68 Martin Blom 2010-09-06 13:01:20 PDT
(In reply to comment #67)

Adam,

I have written a few tests which are available here:

http://martin.blom.org/tmp/xslt-permissions/

They test transformations using stylesheets created from four sources:

* Created manually using DOMImplementation.createDocument()
* Parsed from a string using DOMParser.parseFromString()
* Generated from another stylesheet using XSLTProcessor.transformToDocument()
* Loaded via XMLHttpRequest

Each type tests xsl:import in three different way:

* Importing a child stylesheet from the same domain
* Importing a child stylesheet from a different domain
* Importing a child stylesheet from the same domain, which in turn imports
  another child stylesheet from a different domain

(The "Access-Control-Allow-Origin" response header is set to "*")

Firefox passes ALL tests. Both MSIE 6 and 8 pass all same-domain tests.
Comment 69 Andreas Wictor 2010-10-06 08:47:44 PDT
Created attachment 69944 [details]
New patch

I have created a new patch that I believe makes WebKit behave in much the same way as Firefox does when importing xsl stylesheets. It is still using an originDocument reference on documents to find a frame for loading sub resources but I have made a couple of other changes since the last patch.

* The XSLImportRule::loadSheet method and the libxslt loader callback in XSLTProcessorLibxslt.cpp is changed to use ThreadableLoader for synchronous loading. This will make sure that the origin header is set for cross origin loads.

* There is a new overload for loadResourceSynchronously on FrameLoader that takes a referrer string as argument.

* In DocumentThreadableLoader use the new loadResourceSynchronously overload on FrameLoader to make sure we send the right referrer when importing stylesheets and other sub resources.

* The [CallWith=ScriptExecutionContext] IDL attribute that was added to get hold of an originDocument is moved from the interfaces to the functions for DOMParser.parseFromString and DOMImplementation.createDocument.

With these changes it will pass all of the the tests in comment #68 just like Firefox.
Comment 70 Darin Adler 2011-01-17 17:08:25 PST
(In reply to comment #69)
> I have created a new patch that I believe makes WebKit behave in much the same way as Firefox does when importing xsl stylesheets.

Why didn’t you set review? on the new patch?
Comment 71 Andreas Wictor 2011-03-02 09:52:32 PST
Created attachment 84430 [details]
proposed patch

Sorry for the long delay but here is a new up to date patch with review? flag set.
Comment 72 WebKit Review Bot 2011-03-02 09:54:08 PST
Attachment 84430 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1

Source/WebCore/xml/XSLTProcessor.h:51:  The parameter name "frame" adds no information, so it should be removed.  [readability/parameter_name] [5]
LayoutTests/ChangeLog:10:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:13:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:16:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:30:  Line contains tab character.  [whitespace/tab] [5]
LayoutTests/ChangeLog:33:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/xml/DOMParser.h:35:  The parameter name "str" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/ChangeLog:49:  Line contains tab character.  [whitespace/tab] [5]
Source/WebCore/ChangeLog:50:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 9 in 41 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 73 Adam Barth 2011-03-02 22:31:11 PST
Comment on attachment 84430 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=84430&action=review

> Source/WebCore/loader/DocumentThreadableLoader.cpp:364
> +        KURL referrer(m_document->url());
> +        referrer.setUser(String());
> +        referrer.setPass(String());
> +        referrer.removeFragmentIdentifier();

There's a method on KURL that does this now.

> Source/WebCore/xml/DOMParser.cpp:37
> +        doc->setSecurityOrigin(static_cast<Document*>(scriptExecutionContext)->securityOrigin());
> +        doc->setOriginDocument(static_cast<Document*>(scriptExecutionContext));

This is all very scary.  At a bare minimum, please add an assert that we're on the main thread.

> Source/WebCore/xml/XMLHttpRequest.cpp:254
>              m_responseXML->setSecurityOrigin(document()->securityOrigin());
> +            m_responseXML->setOriginDocument(document());

It seems redundant to have a document and an origin set this way.  We can get the origin from the document.
Comment 74 Andreas Wictor 2011-03-03 10:02:50 PST
Created attachment 84578 [details]
New proposed patch
Comment 75 WebKit Review Bot 2011-03-03 10:05:01 PST
Attachment 84578 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1

Source/WebCore/xml/XSLTProcessor.h:51:  The parameter name "frame" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebCore/xml/DOMParser.h:35:  The parameter name "str" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 2 in 41 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 76 Adam Barth 2011-03-03 13:08:59 PST
> Source/WebCore/xml/XSLTProcessor.h:51:  The parameter name "frame" adds no information, so it should be removed.  [readability/parameter_name] [5]
> Source/WebCore/xml/DOMParser.h:35:  The parameter name "str" adds no information, so it should be removed.  [readability/parameter_name] [5]
> Total errors found: 2 in 41 files

Any reason you didn't fix these style nits?
Comment 77 Andreas Wictor 2011-03-04 00:01:27 PST
(In reply to comment #76)
> > Source/WebCore/xml/XSLTProcessor.h:51:  The parameter name "frame" adds no information, so it should be removed.  [readability/parameter_name] [5]
> > Source/WebCore/xml/DOMParser.h:35:  The parameter name "str" adds no information, so it should be removed.  [readability/parameter_name] [5]
> > Total errors found: 2 in 41 files
> 
> Any reason you didn't fix these style nits?

I was just trying to change as little as possible in this patch.
Comment 78 Adam Barth 2011-03-04 00:31:21 PST
Comment on attachment 84578 [details]
New proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=84578&action=review

>> Source/WebCore/xml/DOMParser.h:35
>> -    PassRefPtr<Document> parseFromString(const String&, const String& contentType);
>> +    PassRefPtr<Document> parseFromString(ScriptExecutionContext*, const String& str, const String& contentType);
> 
> The parameter name "str" adds no information, so it should be removed.  [readability/parameter_name] [5]

This "str" you added!

>> Source/WebCore/xml/XSLTProcessor.h:51
>> -    PassRefPtr<Document> createDocumentFromSource(const String& source, const String& sourceEncoding, const String& sourceMIMEType, Node* sourceNode, Frame* frame);
>> +    PassRefPtr<Document> createDocumentFromSource(const String& source, const String& sourceEncoding, const String& sourceMIMEType, Node* sourceNode, Frame* frame, ScriptExecutionContext*);
> 
> The parameter name "frame" adds no information, so it should be removed.  [readability/parameter_name] [5]

This one is definitely worth fixing since you're touching this line and "Frame* frame" is totally vacuous.
Comment 79 Andreas Wictor 2011-03-04 00:49:39 PST
(In reply to comment #78)
> This "str" you added!

Oops, this happened accidentally because of a git rebase.

> This one is definitely worth fixing since you're touching this line and "Frame* frame" is totally vacuous.

I will submit a new patch.
Comment 80 Andreas Wictor 2011-03-04 01:03:53 PST
Created attachment 84709 [details]
New proposed patch
Comment 81 Adam Barth 2011-05-05 01:31:05 PDT
Comment on attachment 84709 [details]
New proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=84709&action=review

Ok.  So, I got half-way though the review, and my conclusion is that this is too much complexity for such a corner use case.  I would like to support this use case, but the virtuous way to do that is to disentangle the loader from the Frame.  That's not going to be easy, but this patch will end up looking like the wrong direction if/when we make that happen.  I'm sympathetic to the fact that you've spent a lot of time and effort working on this patch, but this is complex stuff that's basically going to be untested because so few people will use this feature.  That means its going to have bugs, and likely bad security bugs at that.

> Source/WebCore/bindings/v8/custom/V8XSLTProcessorCustom.cpp:93
> +    ScriptExecutionContext* scriptContext = getScriptExecutionContext();

getScriptExecutionContext => scriptExecutionContext

> Source/WebCore/dom/Document.cpp:4100
> -    processor->createDocumentFromSource(newSource, resultEncoding, resultMIMEType, this, frame());
> +    processor->createDocumentFromSource(newSource, resultEncoding, resultMIMEType, this, frame(), 0);

There's no origin document in this case?

> Source/WebCore/dom/Document.h:521
> +    void setOriginDocument(PassRefPtr<Document>);

This should probably have a scary warning about how calling it could be disaster.  There's a similar warning for setSecurityOrigin and this on is even more powerful!

> Source/WebCore/dom/Document.h:1180
> +    RefPtr<Document> m_originDocument;

Does this create a reference cycle?

> Source/WebCore/loader/DocumentThreadableLoader.cpp:361
> +        String referrer = m_document->url().strippedForUseAsReferrer();

Is this the correct referrer?  It seems like we do something more complicated when we compute frame->loader()->outgoingReferrer().

> Source/WebCore/loader/DocumentThreadableLoader.cpp:363
> +        identifier = frame->loader()->loadResourceSynchronously(
> +            request, storedCredentials, referrer, error, response, data);

No need to break this line.

> Source/WebCore/loader/FrameLoader.cpp:2763
> +unsigned long FrameLoader::loadResourceSynchronously(const ResourceRequest& request, StoredCredentials storedCredentials, const String& outgoingReferrer, ResourceError& error, ResourceResponse& response, Vector<char>& data)

Boo to adding another one of these methods.  We have too many of these FrameLoader methods with infinite arguments.

> Source/WebCore/xml/DOMParser.cpp:37
> +        doc->setOriginDocument(static_cast<Document*>(scriptExecutionContext));

Why can't we pass the origin during document construction?  That seems like a safer time than after-the-fact.

> Source/WebCore/xml/DOMParser.idl:22
> -        Document parseFromString(in DOMString str, in DOMString contentType);
> +        [CallWith=ScriptExecutionContext] Document parseFromString(in DOMString str, in DOMString contentType);

So this means the originDocument will be based on the lexical scope in which this function was called.  I guess that makes sense.  It's slightly odd how that interacts with third-party cookie blocking, especially if the constructed document is handed off from one page to another.

> Source/WebCore/xml/XMLHttpRequest.cpp:253
> -            m_responseXML->setSecurityOrigin(document()->securityOrigin());
> +            m_responseXML->setOriginDocument(document());

This part of the change looks elegant.

> Source/WebCore/xml/XSLImportRule.cpp:100
> +    RefPtr<Document> parentDocument = parentSheet->ownerDocument();

So there's an originDocument and a parentDocument, but they're not the same.  Interesting.

> Source/WebCore/xml/XSLImportRule.cpp:112
> +            parentURL = originDocument->url();

Wait, why do we care about the originDocument's URL?  That doesn't make any sense to me.  The origin document is just a bread-crumb trail back to the loader machinery.

> Source/WebCore/xml/XSLImportRule.cpp:116
> +        // Create a fake document used for the duration of this load.
> +        parentDocument = Document::create(0, parentURL);

Woah, yet another document.  This is starting to look pretty hacky.

> Source/WebCore/xml/XSLImportRule.cpp:119
> +        if (originDocument)
> +            parentDocument->setOriginDocument(originDocument);

Ok, you've lost me.  Why are we calling setOrigin document on this fake document?  That seems like an extra level of indirection.
Comment 82 Adam Barth 2011-05-05 01:32:09 PDT
Lets re-visit this issue once we've disentangled the loader from the Frame.  This bug has gone epic, so we'll probably want to start again with a fresh bug.
Comment 83 David Kilzer (:ddkilzer) 2011-05-05 08:38:07 PDT
(In reply to comment #82)
> Lets re-visit this issue once we've disentangled the loader from the Frame.  This bug has gone epic, so we'll probably want to start again with a fresh bug.

Adam, please open a new bug if you're going to close this one.  It's being tracked internally by Apple as <rdar://problem/4823079>, and closing this one without opening a new bug (and duping this one to the new bug) will make it difficult to find the "new" bug later.
Comment 84 Adam Barth 2011-05-05 09:27:07 PDT

*** This bug has been marked as a duplicate of bug 60276 ***
Comment 85 Don Corley 2011-05-06 11:48:01 PDT
Andreas and others,

Thanks for all the hard work on this bug! Sorry to see it delayed after years of waiting.

I'm surprised more people don't use xslt in javascript. For me it's such a pain to transform my xml in the browser without using stylesheets.

Thanks again... hopefully this new direction will get it fixed.

P.S. Here is the google chrome bug that is linked to this issue:
http://code.google.com/p/chromium/issues/detail?id=8441