REOPENED 15715
Nested XSL stylesheets can produce memory corruption
https://bugs.webkit.org/show_bug.cgi?id=15715
Summary Nested XSL stylesheets can produce memory corruption
Jonathan Haas
Reported 2007-10-26 14:55:23 PDT
libxml, after it loads and parses a document, goes through a postprocessing phase where it replaces some or all of the strings in the document with pointers to a dictionary it builds. When an XML document is transformed by a stylesheet, it will end up with references to strings in the stylesheet's dictionary. When the XML doc is destroyed, each node is checked to see if the address of its pointer resides in space managed by the internal dictionary. If not, libxml knows it can safely free the memory. Current behavior is for nested stylesheets to load and parse as separate documents. When the stylesheet is applied, the transformed document can get references to both stylesheets. When the destructor is called, the node traversal checks strings only against the parent stylesheet's dictionary, not the nested stylesheet. When it gets to a node that has a pointer to the child spreadsheet's dictionary, the mechanism for checking whether a node contains a string allocated by the model or a pointer to a dictionary fails, and free() is called on memory that we don't want freed, and which may never have been malloc'd to begin with. Fix attached.
Attachments
proposed patch (5.29 KB, patch)
2007-10-26 14:56 PDT, Jonathan Haas
no flags
proposed patch take 2, accidentally included stuff I didn't mean to (4.20 KB, patch)
2007-10-26 15:24 PDT, Jonathan Haas
eric: review-
proposed patch, fixed indentation (4.18 KB, patch)
2007-11-02 11:29 PDT, Jonathan Haas
ap: review-
new and improved patch with 100% less unnecessary UTF-8 conversions (3.76 KB, patch)
2008-08-05 15:12 PDT, Jonathan Haas
eric: review-
fixed (4.15 KB, patch)
2008-08-06 11:58 PDT, Jonathan Haas
myrdred: review-
latest patch (4.19 KB, patch)
2008-08-22 13:03 PDT, Jonathan Haas
darin: review-
Patch addressing Darin's comments (7.77 KB, patch)
2008-10-31 17:31 PDT, Pam Greene (IRC:pamg)
no flags
patch fo fix leaks (1.09 KB, patch)
2008-11-12 21:59 PST, Stephanie Lewis
no flags
Jonathan Haas
Comment 1 2007-10-26 14:56:34 PDT
Created attachment 16889 [details] proposed patch Gives child stylesheets a link to their parents. Forces them to use the parent's dictionary when parsing, leaving the whole nested system with a single dictionary.
Jonathan Haas
Comment 2 2007-10-26 15:24:42 PDT
Created attachment 16890 [details] proposed patch take 2, accidentally included stuff I didn't mean to
Alexey Proskuryakov
Comment 3 2007-10-26 23:26:27 PDT
Is the problem this patch fixes more than potential? Can it be seen with a test case?
Eric Seidel (no email)
Comment 4 2007-10-27 12:29:30 PDT
I think ap and I would be very interested in discussing this with you over IRC. We've both hacked on this code, but at least I can't claim to be a libxml2 expert. What you're "fixing" looks like a potential bug in libxslt/libxml2? Correct? If so, we should definitely also file a bug with gnome and link to the bug from here (for our records). WebCore generally deals only with UTF16 strings , that's why we we were passing xmlCtxtReadDoc a UTF16 string. If it's true xmlCtxtReadDoc only works with UTF8, that's yet another bug with libxml2 (at least in the documentation): http://xmlsoft.org/html/libxml-parser.html#xmlCtxtReadDoc which also should be filed and linked to from this bug. If this is really a memory corruption issue, then it can be marked p1, and we'll definitely make a fix. But I'm not comfortable taking this change w/o at least a test case, and preferably chatting with you a bit over irc. :) MacDome or ap in #webkit is the easiest way to reach us.
Eric Seidel (no email)
Comment 5 2007-10-27 12:30:36 PDT
Comment on attachment 16890 [details] proposed patch take 2, accidentally included stuff I didn't mean to Can't land this w/o at least a test case and change log.
Jonathan Haas
Comment 6 2007-10-29 12:39:44 PDT
Sorry it's taken so long to respond, I've been out of town. Alexey: the problem can be seen quite easily by building WebKit and linking with your own build of libxml2. The prebuilt libxml2 binary that WebKit ships with does not reproduce the problem; I suspect that runtime heap checks are turned off. With full heap checking, an assert is always triggered with a non-empty nested stylesheet. The following reproduces it: --- [test.xml] <?xml version="1.0" encoding="utf-8"?> <?xml-stylesheet type="text/xsl" href="test0.xsl"?> <testing> <message>Hello, world!</message> </testing> [test0.xsl] <?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="test1.xsl"/> </xsl:stylesheet> [test1.xsl] <?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <html> <head> <title>Testing</title> </head> <body> <div id="kablammo"> <b>FOO!</b> </div> </body> </html> </xsl:template> </xsl:stylesheet> --- Eric, I wouldn't call this a bug in libxml2/libxslt (and believe me, I seriously considered the possibility.) It could be resolved by making changes to these libraries, but not very elegantly. The problem is that libxml loads and parses a document either with its own dictionary (if it creates the parse context) or with a shared dictionary (if a context is provided by the client). By the time libxslt sees the doc, it's already been parsed, and its strings are already in whichever dictionary they're going to be in. Since libxslt delegates the loading of child stylesheets to the client, it's the client's responsibility to ensure that the dictionaries are not disjoint. As for the UTF-8 comment, I think I wasn't very clear... the current code didn't call xmlCtxtReadDoc() at all, but rather xmlReadMemory(), which will happily take buffers with any encoding you care to pass in. But in order to have the child stylesheet share the parent's dictionary, we need to create the context ourselves, and the only way that libxml exposes to do that is to use xmlCtxtReadDoc(), which only works on UTF-8 (it takes a pointer to xmlChar), which is why I was wondering why the function takes an encoding parameter at all. That'd be something to take up with the gnome folks. I'll look for you on IRC to talk abut it.
David Kilzer (:ddkilzer)
Comment 7 2007-10-29 21:42:34 PDT
Copying Mark since he updated libxml2 recently.
Alexey Proskuryakov
Comment 8 2007-10-30 03:20:56 PDT
With a debug build of libxml2 2.6.16 (./configure --with-mem-debug --with-run-debug), I do see dictionary inconsistency errors: element html: error : Name is not from the document dictionnary 'html' element head: error : Name is not from the document dictionnary 'head' element title: error : Name is not from the document dictionnary 'title' element body: error : Name is not from the document dictionnary 'body' element div: error : Name is not from the document dictionnary 'div' element b: error : Name is not from the document dictionnary 'b' However, my results are somewhat different: 1) I do not see any free() problems (OS X memory allocator would have complained if someone were trying to free non-allocated memory, or to double-free). I also don't get any assertion failures. 2) I see similar errors from all XSL transformations, not just ones that involve nested stylesheets. So, while there's definitely something wrong going on, I still don't quite see what exactly the problem is, and whether it is indeed a memory corruption problem. Note that WebKit becomes very crash-prone when running against a debug build of libxml2, because the debug version of xmlFree() cannot take NULL arguments. I think this is a libxml2 bug, I've reported it: <http://bugzilla.gnome.org/show_bug.cgi?id=491651>. > But in order to have the child > stylesheet share the parent's dictionary, we need to create the context > ourselves, and the only way that libxml exposes to do that is to use > xmlCtxtReadDoc(), which only works on UTF-8 (it takes a pointer to xmlChar), Can we use xmlCtxtReadMemory()?
Jonathan Haas
Comment 9 2007-10-30 14:42:23 PDT
> Can we use xmlCtxtReadMemory()? No, because libxml2 doesn't expose xmlCreateMemoryParserContext().
Jonathan Haas
Comment 10 2007-10-31 15:29:20 PDT
So I tried to repro this on a Mac, doing the following: 1) Downloaded and installed sources for libxml2-2.6.30 and libxslt-1.1.22 (the most recent versions, in both cases) 2) Built both using ./configure --prefix=/usr; make; sudo make install 3) Built WebKit using WebKit/WebKitTools/Scripts/build-webkit --debug 4) Ran Safari with WebKit/WebKitTools/Scripts/run-safari --debug 5) Navigated to ~/test/test.xml (containing repro scripts above) Output: jhaas$ webkit/webkittools/scripts/run-safari --debug Starting Safari with DYLD_FRAMEWORK_PATH set to point to built WebKit in /Users/jhaas/webkit/WebKit/WebKitBuild/Debug. Safari(6253,0xa000d000) malloc: *** Deallocation of a pointer not malloced: 0x29432d7; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug More or less the same behavior I saw in Win32.
Alexey Proskuryakov
Comment 11 2007-11-01 02:04:36 PDT
Confirmed that the behavior is different with latest libxml2 and libxslt (I wonder what versions 10.5 has). With libxml2 memory debugging enabled, the output is: Memory tag error occurs :0x311d0d7 bye 12:01:03 PM MEMORY ALLOCATED : 42495, MAX was 54711 <dump of all blocks> xmlMemFree(311D0F7) error xmlMallocBreakpoint reached on block 0
Jonathan Haas
Comment 12 2007-11-02 11:29:59 PDT
Created attachment 16995 [details] proposed patch, fixed indentation Patch with indentation fixed. The repro case is described in the discussion... the bug doesn't really lend itself well to a test case since it only repros (on Tiger) with a self-built libxml2. Changelog is included in the patch.
Darin Adler
Comment 13 2007-11-06 07:48:38 PST
Comment on attachment 16995 [details] proposed patch, fixed indentation It seems unfortunate that we need to create a UTF-8 copy of the entire sheet just so libxml2 can parse it; we went out of our way to avoid this in the past. Is there a way to avoid this part of the change? What guarantees that the parent style sheet will outlast the child one? Can m_parentStyleSheet become a stale pointer? These are reference-counted objects, so a typical approach would be to use a RefPtr rather than a raw pointer. Unless there's a guarantee that the parent style sheet outlives this one.
Alexey Proskuryakov
Comment 14 2007-11-23 20:49:41 PST
Comment on attachment 16995 [details] proposed patch, fixed indentation (In reply to comment #9) > > Can we use xmlCtxtReadMemory()? > No, because libxml2 doesn't expose xmlCreateMemoryParserContext(). Looks like the context for xmlCtxtReadMemory() is to be created with xmlNewParserCtxt(), which is exposed. r-, because we want to avoid reconversion to UTF-8. Sorry for slow reviewing - I'm trying to get more familiar with XSLT ownership issues now!
Julien Chaffraix
Comment 15 2008-05-14 11:16:53 PDT
(In reply to comment #14) > (From update of attachment 16995 [details] [edit]) > (In reply to comment #9) > > > Can we use xmlCtxtReadMemory()? > > No, because libxml2 doesn't expose xmlCreateMemoryParserContext(). > It does! (check http://xmlsoft.org/html/libxml-parserInternals.html#xmlCreateMemoryParserCtxt) > > Looks like the context for xmlCtxtReadMemory() is to be created with > xmlNewParserCtxt(), which is exposed. I see 3 choices here: - use an xmlMemoryCtxt: XMLTokenizer::createMemoryParser does that too but we also do a UTF8 conversion (that should be solved). - use the same approach as XMLTokenizer::createStringParser which avoids the conversion and would work here as we are parsing a well-formed document. - use something else and in that case, avoid doing the initialization ourselves and thus avoid using xmlNewParserCtxt IMHO, the second one seems the most promising here.
Jonathan Haas
Comment 16 2008-08-05 15:12:51 PDT
Created attachment 22661 [details] new and improved patch with 100% less unnecessary UTF-8 conversions Note: I can't really test this on a Mac, since the bug only reproes with ToT WebKit. Building and applying ToT libxml2 and libxslt ended up breaking libphp which complained about the "wrong architecture" on libexslt. The Internet said I should rebuild as 64-bit binaries, which completely horked the Mac and required that I reboot from DVD and copy over the old libxml/libxslt. But the bug reproes just fine with ToT libxml2/libxslt on Windows, and goes away with this patch.
Eric Seidel (no email)
Comment 17 2008-08-05 18:16:50 PDT
Comment on attachment 22661 [details] new and improved patch with 100% less unnecessary UTF-8 conversions Looks good except... Should use spaces, not tabs: 9298 bool m_stylesheetDocTaken; 99 XSLStyleSheet* m_parentStyleSheet; This needs a comment to explain why this is done: + if (parentStyleSheet()) { + xmlDictFree(ctxt->dict); + ctxt->dict = parentStyleSheet()->m_stylesheetDoc->dict; + } Really the code could just use a comment or two in general to tell why we're stuffing away this parent document pointer, and under which circumstances we could ever stop... Also, this needs a test case which demonstrates the crash. Your test case should work under run-webkit-tests and should crash before your patch and not after your patch. Thanks for the patch!
Jonathan Haas
Comment 18 2008-08-06 11:58:29 PDT
Created attachment 22685 [details] fixed Thanks for the feedback. I fixed the spacing issue and added a comment explaining why this is necessary. I can't make a regression test, because this doesn't repro in the stock build of WebKit. It only repros when using recent builds of libxml2 and libxslt. Apple ships binaries of these libraries with MacOS, and as part of the support libraries for the Windows build of WebKit, so this bug just doesn't happen.
Jonathan Haas
Comment 19 2008-08-06 16:06:47 PDT
Actually, please hold off on landing this patch, I'm checking into something.
Alexey Proskuryakov
Comment 20 2008-08-10 23:24:17 PDT
(In reply to comment #16) > Note: I can't really test this on a Mac, since the bug only reproes with ToT > WebKit. Building and applying ToT libxml2 and libxslt ended up breaking libphp FWIW, it's not necessary to replace system versions of these libraries to test their behavior with WebKit - you can set a DYLD_LIBRARY_PATH environment variable, similarly to how run-safari sets DYLD_FRAMEWORK_PATH. export DYLD_LIBRARY_PATH=/path/to/custom/libxml2
Mark Rowe (bdash)
Comment 21 2008-08-10 23:29:13 PDT
(In reply to comment #18) > I can't make a regression test, because this doesn't repro in the stock build > of WebKit. It only repros when using recent builds of libxml2 and libxslt. > Apple ships binaries of these libraries with MacOS, and as part of the support > libraries for the Windows build of WebKit, so this bug just doesn't happen. You should still create a regression test for this none the less. Apple won't continue to ship the same versions of libxml2 and libxslt for ever, and it is very important to ensure that the problem isn't reintroduced at some point in the future.
Jonathan Haas
Comment 22 2008-08-22 13:03:05 PDT
Created attachment 22944 [details] latest patch One more try.
Darin Adler
Comment 23 2008-08-22 16:21:42 PDT
(In reply to comment #22) > One more try. What about Mark's comment about the regression test?
Darin Adler
Comment 24 2008-08-24 15:14:39 PDT
Comment on attachment 22944 [details] latest patch Looks OK. I have a few thoughts and questions. Mark Rowe requested a regression test; I'd like to see the patch include one. Since parentStyleSheet() is never used outside the XSLStyleSheet class, I don't think we need to define a function. If later we decide it's needed, that's OK, but it's slightly nicer to have tighter encapsulation and know no one outside the class looks at m_parentStyleSheet. What guarantees that m_parentStyleSheet is not left pointing to a deleted object? + void setParentStyleSheet(XSLStyleSheet* parent) { + m_parentStyleSheet = parent; + if (parent) + setOwnerDocument(parent->ownerDocument()); + } The opening brace is supposed to go on a separate line. This function is also getting long enough that it might be better to not put it inline. Our normal approach is to only use inlines for completely trivial functions unless there's a specific performance motivation for doing so. Are there any callers left for XSLStyleSheet::setOwnerDocument()? If not, perhaps we should remove it. I'm going to say review- since I'm hoping for a patch with a regression test. In a pinch I suppose we can land this without one. Let me know if you'd like to do that -- you should even feel re-mark this patch for review if you think this should go in as-is even considering my comments.
Pam Greene (IRC:pamg)
Comment 25 2008-10-31 17:31:57 PDT
Created attachment 24827 [details] Patch addressing Darin's comments Jonathan was working on Chromium but has since been called to work on some other things. I'm taking over getting his patch finished up. > Mark Rowe requested a regression test; I'd like to see the patch include one. Added. > Since parentStyleSheet() is never used outside the XSLStyleSheet class, I don't > think we need to define a function. Removed. > What guarantees that m_parentStyleSheet is not left pointing to a deleted > object? Sorry, I don't know this code that well; I'm just working with the patch and tests. If it's a concern, I'll ask Jonathan to weigh in. > + void setParentStyleSheet(XSLStyleSheet* parent) { > + m_parentStyleSheet = parent; > + if (parent) > + setOwnerDocument(parent->ownerDocument()); > + } > > The opening brace is supposed to go on a separate line. This function is also > getting long enough that it might be better to not put it inline. Style fixed and method out-lined. > Are there any callers left for XSLStyleSheet::setOwnerDocument()? If not, > perhaps we should remove it. Removed.
Darin Adler
Comment 26 2008-11-03 09:35:49 PST
(In reply to comment #25) > > What guarantees that m_parentStyleSheet is not left pointing to a deleted > > object? > > Sorry, I don't know this code that well; I'm just working with the patch and > tests. If it's a concern, I'll ask Jonathan to weigh in. Yes, it's a concern. That's why I asked.
Darin Adler
Comment 27 2008-11-03 09:38:18 PST
Comment on attachment 24827 [details] Patch addressing Darin's comments r=me I'd still like to understand the lifetime guarantee for the new m_parentStyleSheet data member.
Pam Greene (IRC:pamg)
Comment 28 2008-11-03 10:18:00 PST
(In reply to comment #26) > Yes, it's a concern. That's why I asked. Well, yes. It was shorthand for "if this is enough of a concern to warrant a reasonable amount of effort, as opposed to an offhand question". I was confused by the juxtaposition of your question and an r+ (and still am, but I'll get over it). Anyway, I'll both land this today and ask Jonathan to stop by.
Darin Adler
Comment 29 2008-11-03 10:29:12 PST
(In reply to comment #28) > Well, yes. It was shorthand for "if this is enough of a concern to warrant a > reasonable amount of effort, as opposed to an offhand question". I was confused > by the juxtaposition of your question and an r+ (and still am, but I'll get > over it). I think the fix has value as is and I don't want to unnecessarily delay getting it checked in. But I suspect that the lifetime of m_parentStyleSheet could be a real problem and lead to future bugs. So I think it's great that you're going to look into it some more.
Pam Greene (IRC:pamg)
Comment 30 2008-11-04 16:26:44 PST
Landed in r38115. Over to Jonathan for more info; keeping the bug open pending that.
Jonathan Haas
Comment 31 2008-11-05 12:41:05 PST
The child stylesheet is owned by (and a RefPtr to it is held by) an XSLImportRule. The XSLImportRule is in turn a child of (and a RefPtr to it is kept by) the parent sheet. The destruction of the parent sheet will destroy all its children, including the XSLImportRule, whose destruction in turn will trigger the destruction of the child sheet. But actually, now I'm questioning whether it's possible for the same XSLStyleSheet object to be referenced by multiple XSLImportRules. If so, there could be problems that would go beyond having a parent sheet destroyed before its children... there could be inappropriate behavior if an XSLStyleSheet is reused by a different parent with different shared dictionary entries.
Darin Adler
Comment 32 2008-11-05 12:51:46 PST
(In reply to comment #31) > The destruction of the parent sheet will destroy all > its children I don't think that's correct. It will deref all the children, but that won't cause them to be destroyed. For example, some script might be holding a reference to one of the rules in a variable.
Mark Rowe (bdash)
Comment 33 2008-11-11 01:36:41 PST
The buildbot is showing a bunch of XSL-related memory leaks which I suspect may have been caused by this patch: <http://build.webkit.org/results/trunk-mac-intel-debug/5482/DumpRenderTree2-leaks.txt>.
Darin Adler
Comment 34 2008-11-11 08:56:46 PST
(In reply to comment #33) > The buildbot is showing a bunch of XSL-related memory leaks which I suspect may > have been caused by this patch: Is someone going to tackle these, or do I need to work on them myself?
Stephanie Lewis
Comment 35 2008-11-12 19:08:18 PST
I think i have a fix for the leaks, but I need to verify that I didn't reintroduce the crash.
Stephanie Lewis
Comment 36 2008-11-12 21:59:58 PST
Created attachment 25116 [details] patch fo fix leaks Patch for the leaks
Stephanie Lewis
Comment 37 2008-11-12 22:06:26 PST
Comment on attachment 25116 [details] patch fo fix leaks Mark Rowe reviewed it
Stephanie Lewis
Comment 38 2008-11-12 22:10:08 PST
Committed leak fix in 38362
Eric Seidel (no email)
Comment 39 2008-12-02 12:01:44 PST
Looks like this has been landed.
Alexey Proskuryakov
Comment 40 2008-12-02 12:10:12 PST
Per comment 30, this is kept open intentionally. It may make sense to file a follow-up bug for further investigation though.
Eric Seidel (no email)
Comment 41 2008-12-02 13:07:30 PST
Comment on attachment 24827 [details] Patch addressing Darin's comments Clearing r+ since this was landed as: http://trac.webkit.org/changeset/38115
Eric Seidel (no email)
Comment 42 2008-12-02 13:07:47 PST
Comment on attachment 25116 [details] patch fo fix leaks Clearing r+ since this was landed as: http://trac.webkit.org/changeset/38362
Note You need to log in before you can comment on or make changes to this bug.