Bug 41348

Summary: Remove global variables from XSLTProcessorLibxslt.cpp
Product: WebKit Reporter: Andreas Wictor <andreas.wictor>
Component: XMLAssignee: Nobody <webkit-unassigned>
Status: REOPENED ---    
Severity: Enhancement CC: abarth, ap, commit-queue, darin, eric, martin, webkit.review.bot, zimmermann
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 41955    
Bug Blocks:    
Attachments:
Description Flags
Remove global variables by using the _private field that exists on most libxml structs.
none
New proposed patch
ap: review-
New proposed patch
none
New proposed patch
none
New proposed patch
none
New proposed patch abarth: review-

Description Andreas Wictor 2010-06-29 06:32:46 PDT
You can remove global variables from XSLTProcessorLibxslt.cpp by using the _private field that exists on most libxml structs.
Comment 1 Andreas Wictor 2010-06-29 06:53:26 PDT
Created attachment 60018 [details]
Remove global variables by using the _private field that exists on most libxml structs.
Comment 2 Alexey Proskuryakov 2010-06-29 10:05:34 PDT
+        XSLStyleSheet* xslStyleSheet = (XSLStyleSheet*)style->doc->_private;

We use C++ style casts (such as static_cast), not C style ones. Some existing code in this file is wrong in this respect.

Does this change the mode of failure when another code in the same process uses libxslt? This can't be made to work as long as we have to use xsltSetLoaderFunc(), but can this make situation worse? I' worried that casting _private member set by other code may cause crashes.
Comment 3 Andreas Wictor 2010-06-30 08:13:35 PDT
(In reply to comment #2)
> Does this change the mode of failure when another code in the same process uses libxslt? This can't be made to work as long as we have to use xsltSetLoaderFunc(), but can this make situation worse? 

I assumed that no other code in the same process would use libxslt, is this assumption wrong? Perhaps plugins could be a problem? 

> I' worried that casting _private member set by other code may cause crashes.

Right now no other WebKit code is using the _private field on libxml documents and libxslt transformation contexts. But yes you have to be extra careful with _private fields.
Comment 4 Alexey Proskuryakov 2010-06-30 08:58:21 PDT
Any application that links to WebKit can use libxml2 and libxslt on its own. That's of course already a problematic situation, and it doesn't seem to be possible to fully resolve this problem without moving WebKit into a different process, significantly modifying libxml2, or switching to another XML library.

I was thinking about keeping a HashSet of XSLTProcessors, so that we could check if _private is one of ours, and bail out quickly if it isn't. Other code's usage of libxslt may fail, but at least we won't crash right away. Logging an error would be helpful, probably even in release builds (but then we shouldn't complain to console more than once).
Comment 5 Andreas Wictor 2010-07-01 06:14:48 PDT
(In reply to comment #4)
> I was thinking about keeping a HashSet of XSLTProcessors, so that we could check if _private is one of ours, and bail out quickly if it isn't. Other code's usage of libxslt may fail, but at least we won't crash right away.

Sounds like a good idea. We also need a HashSet of XSLStyleSheets.

> Logging an error would be helpful, probably even in release builds (but then we shouldn't complain to console more than once).

How do I find a console object to log messages to, when I can't get a reliable XSLTProcessor object? The existing code in docLoaderFunc starts from globalProcessor when it finds a console.
Comment 6 Alexey Proskuryakov 2010-07-01 08:24:54 PDT
Release mode logging can be done via fprintf to stderr (I'm not sure if we have other examples of this already). Authors of embedding applications may not ever look at Web Inspector console.
Comment 7 Andreas Wictor 2010-07-02 06:19:32 PDT
Created attachment 60362 [details]
New proposed patch
Comment 8 Alexey Proskuryakov 2010-07-02 10:17:41 PDT
Comment on attachment 60362 [details]
New proposed patch

+static HashSet<XSLTProcessor*> globalXSLTProcessors;
+static HashSet<XSLStyleSheet*> globalXSLStyleSheets;

I don't think this will even build on Mac - there is a step in build script that verifies that there are no global destructors. Our usual pattern is to have a static function with a DEFINE_STATIC_GLOBAL macro inside.

+        xsltTransformContextPtr context = reinterpret_cast<xsltTransformContextPtr>(ctxt);
+        XSLTProcessor* processor = reinterpret_cast<XSLTProcessor*>(context->_private);

Didn't static_cast work?

+                fprintf(stderr, "Expected a registred XSLTProcessor pointer in the libxslt "
+                        "_private field but found something else: %p\n", context->_private);

This is not a message that can help the author of an application using WebKit much. We need to give them pointers they can use to work around the problem.

Something like "WebKit XSLT document loader was called with unknown transformation context." might be more helpful.

Looking at just this code, it seems that it can be simplified a little by storing transformation contexts instead of processors/stylesheets. But it's difficult to get to the context from where _private is set. Maybe it can be structured differently, although I don't have a concrete solution.

+    Node* m_sourceNode;

Ideally, this would have a comment explaining why this can't become a dangling pointer.

r- because of HashSet destructor issue.
Comment 9 Andreas Wictor 2010-07-06 08:24:59 PDT
(In reply to comment #8)
> Looking at just this code, it seems that it can be simplified a little by storing transformation contexts instead of processors/stylesheets. But it's difficult to get to the context from where _private is set. Maybe it can be structured differently, although I don't have a concrete solution.

You can store transformation contexts instead of processors, but I don't think it can be done for stylesheets. All import callbaks as already been fired before you get a chance to store a pointer to the libxslt stylesheet. So I think it's better to be concistent and use the same aproach for both cases.

I will soon supply a new patch and will be available to respond to feedback until friday, european time. After that will I be away for a month.
Comment 10 Andreas Wictor 2010-07-06 08:26:06 PDT
Created attachment 60634 [details]
New proposed patch
Comment 11 Alexey Proskuryakov 2010-07-06 09:49:54 PDT
Comment on attachment 60634 [details]
New proposed patch

> +    // m_sourceNode is set (and reset) in the transformToString function
> +    // so that the libxslt loader callback can access the source node
> +    // of the transformation
> +    Node* m_sourceNode;

This seems too wordy (of course m_sourceNode is the source node), and doesn't directly explain why this needn't be a RefPtr. I suggest something like "Source node is only non-null in transformToString(), so this cannot become a dangling pointer".

> +    // Save a pointer to the stylesheet so that we can access it from the libxslt loader callback

Comments are full sentences, and should end with a period.

r=me
Comment 12 Alexey Proskuryakov 2010-07-06 10:05:31 PDT
Tweaked the comments, and landed in <http://trac.webkit.org/changeset/62559>.
Comment 13 WebKit Review Bot 2010-07-06 10:34:59 PDT
http://trac.webkit.org/changeset/62559 might have broken SnowLeopard Intel Release (Tests)
Comment 14 Alexey Proskuryakov 2010-07-06 10:56:17 PDT
Rolled out in r62566, as this causes crashes.

Two problems: 
- _private isn't set on some code paths
- HashSet::contains() fails badly when key is empty or deleted value - since _private can be anything for non-WebKit loads, we should special case those.
Comment 15 Darin Adler 2010-07-06 11:00:25 PDT
There’s a misspelling in the patch, by the way: registred.
Comment 16 Andreas Wictor 2010-07-07 08:22:18 PDT
It seems that there is a bug in the run-webkit-tests script. When I run the script under Ubuntu it only runs 19 of 36 xsl test cases. One of the test cases it skips is xslt-import-depth.xml that crashed with the last patch.

I have fixed the bug that caused the crash and borrowed a MacBook to test the patch on. It should pass all 36 tests now.
Comment 17 Andreas Wictor 2010-07-07 08:25:12 PDT
Created attachment 60735 [details]
New proposed patch

Fixed the bug that caused the crash in the last patch.
Comment 18 Alexey Proskuryakov 2010-07-07 11:53:07 PDT
Comment on attachment 60735 [details]
New proposed patch

+        if (!processor || !registeredXSLTProcessors().contains(processor)) {

I think that we also need to check for deleted value (-1). I'm not quite sure if I'm right about that.

It would also be better to take these magic values from HashTraits.
Comment 19 Andreas Wictor 2010-07-08 02:45:14 PDT
Created attachment 60857 [details]
New proposed patch

Added check for deleted values
Comment 20 Alexey Proskuryakov 2010-07-08 10:06:59 PDT
Comment on attachment 60857 [details]
New proposed patch

r=me

> When I run the script under Ubuntu it only runs 19 of 36 xsl test cases.

This sounds horrible. Do you know what the reason for this is? Could you please file a bug?
Comment 21 Andreas Wictor 2010-07-09 04:15:20 PDT
(In reply to comment #20)
> (From update of attachment 60857 [details])
> r=me
> 
> > When I run the script under Ubuntu it only runs 19 of 36 xsl test cases.
> 
> This sounds horrible. Do you know what the reason for this is? Could you please file a bug?

I did not realize they are skipped on purpose. The reason acording to the comments in LayoutTests/platform/gtk/Skipped, is partly becuase there is some problem with the version of libxml2 that is used by the debug bots.
Comment 22 WebKit Commit Bot 2010-07-09 04:33:27 PDT
Comment on attachment 60857 [details]
New proposed patch

Clearing flags on attachment: 60857

Committed r62937: <http://trac.webkit.org/changeset/62937>
Comment 23 WebKit Commit Bot 2010-07-09 04:33:34 PDT
All reviewed patches have been landed.  Closing bug.
Comment 24 Nikolas Zimmermann 2010-07-09 05:44:59 PDT
Had to rollout the patch, crashed multiple bots. See bug 41955. Hits an assertion.
Comment 25 WebKit Review Bot 2010-07-09 05:52:53 PDT
http://trac.webkit.org/changeset/62937 might have broken Leopard Intel Debug (Tests)
Comment 26 Darin Adler 2010-07-09 08:09:03 PDT
I guess it’s OK to make this change. But note that because we can’t unconditionally use _private this just replaces one set of global variables with another. This does make things a little more elegant, but the code still will only work on a single thread and is probably a bit slower with the hash table lookups. For my perspective, what are some of the concrete benefits of the change?
Comment 27 Alexey Proskuryakov 2010-07-09 09:17:27 PDT
The motivation for this change is preparing to fix xsl:import and document(), making sync and async loading work at the same time, see bug 10313 comment 59.

Thinking about it again, I'm not sure if I follow the logic. Maybe there was no problem, to begin with.
Comment 28 Andreas Wictor 2010-07-09 10:29:16 PDT
(In reply to comment #24)
> Had to rollout the patch, crashed multiple bots. See bug 41955. Hits an assertion.

The assertion that failed was added to check that the _private field is properly initialized. I see now that in the case of embedded stylesheets (e.g. xslt-relative-path.xml), libxslt can replace our properly initialized document with a copy. This copy created by libxslt does not have the _private field set. I hope there is a workaround for this, but unfortunately I will be away for a month so I can't look into it right now.
Comment 29 Alexey Proskuryakov 2010-07-19 16:51:32 PDT
+                fprintf(stderr, "WebKit XSLT document loader was called with unknown transformation context.");

The logging shouldn't be performed if private browsing is enabled.
Comment 30 Andreas Wictor 2010-08-18 05:02:34 PDT
(In reply to comment #29)
> +                fprintf(stderr, "WebKit XSLT document loader was called with unknown transformation context.");
> 
> The logging shouldn't be performed if private browsing is enabled.

As I understand it you need a Settings object to determine if private browsing is enabled. I can't see how it's possible to get access to a Settings object from the libxslt document loader callback. Especially since the error message is printed because it can't get hold of a XSLTProcessor or XSLStyleSheet object.
Comment 31 Andreas Wictor 2010-09-06 06:23:05 PDT
Created attachment 66634 [details]
New proposed patch

New proposed patch that fixes the problem with embedded stylesheets in the last patch. The error logging referred to in comment 29 is left unchanged because I could not find a way to determine if private browsing is enabled from the libxslt document loader callback.
Comment 32 Adam Barth 2010-12-23 00:27:27 PST
Comment on attachment 66634 [details]
New proposed patch

I'm marking this patch R- for now.  The patch has been up for review for a while and there's some question as to what problem this patch is solving.  If someone else would like to review this patch, please don't let this review stand in your way.