Bug 82577

Summary: Work around an entity parsing bug in libxml2 2.7.3 (supplied with Lion) and unskip tests
Product: WebKit Reporter: Nikolas Zimmermann <zimmermann>
Component: XMLAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, darin, eric, fpizlo, hyatt, jonlee, krit, levin, mitz, mjs, mrowe, rniwa, sam, simon.fraser, tony, vsevik, zherczeg, zimmermann
Priority: P2 Keywords: LayoutTestFailure
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.7   
Bug Depends on: 83297    
Bug Blocks:    
Attachments:
Description Flags
Patch fpizlo: review+

Description Nikolas Zimmermann 2012-03-29 00:36:42 PDT
These tests are currently skipped on Lion, w/o a comment :(
svg/W3C-SVG-1.1/coords-viewattr-01-b.svg
svg/W3C-SVG-1.1/coords-viewattr-02-b.svg
svg/custom/preserve-aspect-ratio-syntax.svg
svg/custom/viewbox-syntax.svg
svg/zoom/page/zoom-coords-viewattr-01-b.svg
svg/zoom/text/zoom-coords-viewattr-01-b.svg
svg/W3C-SVG-1.1/render-elems-03-t.svg

I checked them, and it turns out entity parsing is broken, due to a libxml2 bug.
Those tests use them like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1 Tiny//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd"
[
<!ENTITY shape "	<path d='M60,0 l60,0 l60,60 l0,60 l-60,60 l-60,0 l-60,-60 l0,-60 z'/>">
]>
...
<g fill="yellow" stroke="red" stroke-width="8" >&shape;</g>


We're creating our libxml2 parser with the NOENT mode, which means libxml2 replaces the entity, and parses the contained elements, just if they were part of the original tree.
Though upon parsing them it doesn't see the "svg" namespace, and thus in the example above an Element is created instead of a SVGPathElement, as expected.

https://bugzilla.gnome.org/show_bug.cgi?id=502960
libxml 2.7.4 already fixed this in 2009 (!). Anyhow, we need to find a workaround, these are important tests that aren't run on Lion at the moment.
(I stumbled across this as I broke preserve-aspect-ratio-syntax.svg for non-Mac platforms, and couldn't notice this, because the test is skipped).
Comment 1 Dirk Schulze 2012-03-29 06:55:31 PDT
Ha, did the fix of libxml finally land in lion? :) We have some hacks in the controller of libxml because of bugs in older versions of libxml, which where shipped till SL. IIRC we already have a bunch of these bug reports about broken entity support in newer versions of libxml.
Comment 2 Nikolas Zimmermann 2012-03-30 05:56:18 PDT
(In reply to comment #1)
> Ha, did the fix of libxml finally land in lion? :) We have some hacks in the controller of libxml because of bugs in older versions of libxml, which where shipped till SL. IIRC we already have a bunch of these bug reports about broken entity support in newer versions of libxml.
No Lion includes libxml 2.7.3, which does not include the fix. We need to add a new workaround for that :( I have a patch ready, need to tidy it up and submit.
Comment 3 Nikolas Zimmermann 2012-04-01 04:33:41 PDT
Created attachment 134993 [details]
Patch
Comment 4 Nikolas Zimmermann 2012-04-01 04:33:51 PDT
Uploading my workaround with --suggest-reviewers, looking for comments! I tried to do it as un-intrusive as possible, I hope it's fine as-is.
Comment 5 Nikolas Zimmermann 2012-04-01 04:34:35 PDT
Forgot to say: the actual patch is only 7k, the rest are rebaselines - no worries ;-)
Comment 6 Adam Barth 2012-04-01 10:20:49 PDT
Comment on attachment 134993 [details]
Patch

Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef?  That will make it easier for us to remove it in the future.
Comment 7 Nikolas Zimmermann 2012-04-01 10:56:07 PDT
(In reply to comment #6)
> (From update of attachment 134993 [details])
> Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef?  That will make it easier for us to remove it in the future.
I followed the existing style for another libxml workaround in the same file.
I don't see the benefit of adding USE() macro support for hacks like this - if you disable it in Platform.h at some point, you still need to remove it from the cpp file as well.

If I shall change it, the other libxml workaround needs to be changed as well. Is it worth it?
Comment 8 Eric Seidel (no email) 2012-04-01 11:34:37 PDT
Comment on attachment 134993 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=134993&action=review

> Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:548
> +    , m_depthTriggeringEntityExpansion(-1)
> +    , m_isParsingEntityDeclaration(false)

Seems these should also be removed at compile time if the workaround is not in place.

> Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:743
> +#if LIBXML_VERSION >= 20704

Which suggests that this needs to be turned into a #define WORKAROUND_ENTITY_PARSING_BUG or similar, so it can be re-used as a compile time check in various places throughtout this file.
Comment 9 Filip Pizlo 2012-04-01 11:54:13 PDT
(In reply to comment #6)
> (From update of attachment 134993 [details])
> Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef?  That will make it easier for us to remove it in the future.

I think that this patch makes removing the hack pretty easy already, since the things it does are guarded by hackAroundLibXMLEntityParsingBug().  Furthermore, the behavior of hackAroundLibXMLEntityParsingBug() already depends on the libxml version, and already has a comment describing that it is a workaround for a bug that existed before 2.7.4.

I also think it's good for code quality to have this patch use a guard that can be checked using a C++ conditional rather than a C preprocessor conditional, since the latter tends to make the code ugly. Though in this patch, it may not make a big difference, and if it were me, I'd probably have used C preprocessor conditionals, for the reasons that Eric stated.

The thing that C preprocessor conditionals instead of C++ conditionals would give us here is to remove the 8 bytes of data that seem to be added to the XMLDocumentParser class. But that does not seem like much, since the class already has a lot of state. In particular, most of the other cruft that this code adds ought to be removed at compile-time already. hackAroundLibXMLEntityParsingBug() is a static inline that returns a constant, so that should result in simplification of control flow.  entityDeclarationHandler() is the only source of time overhead; at best it will be a tail call, leading to a useless jump. At worst, it'll be a handful of instructions that do some stack fiddling. But I can't imagine that matters much.

Nikolas, do you have insight into what the performance and memory usage impacts of this patch are?
Comment 10 Alexey Proskuryakov 2012-04-01 12:45:57 PDT
> Which suggests that this needs to be turned into a #define WORKAROUND_ENTITY_PARSING_BUG or similar, so it can be re-used as a compile time check in various places throughtout this file.

This seems like a better approach to me, too.
Comment 11 Nikolas Zimmermann 2012-04-01 12:47:35 PDT
(In reply to comment #9)
> (In reply to comment #6)
> > (From update of attachment 134993 [details] [details])
> > Can you create a USE(LIBXML_ENTITY_PARSER_WORKAROUND) ifdef and put all this code behind the ifdef?  That will make it easier for us to remove it in the future.
> 
> I think that this patch makes removing the hack pretty easy already, since the things it does are guarded by hackAroundLibXMLEntityParsingBug().  Furthermore, the behavior of hackAroundLibXMLEntityParsingBug() already depends on the libxml version, and already has a comment describing that it is a workaround for a bug that existed before 2.7.4.
Completely agreed.

> Nikolas, do you have insight into what the performance and memory usage impacts of this patch are?
Nope, I didn't measure any performance with this patch. Do you think this should be tested? Can you suggest a file that should be benchmarked?
Comment 12 Nikolas Zimmermann 2012-04-01 12:48:20 PDT
(In reply to comment #10)
> > Which suggests that this needs to be turned into a #define WORKAROUND_ENTITY_PARSING_BUG or similar, so it can be re-used as a compile time check in various places throughtout this file.
> 
> This seems like a better approach to me, too.
I just followed the existing pattern, as I didn't specifically want to avoid initializing the variables if not needed. I thought it's not worth it.
Comment 13 Nikolas Zimmermann 2012-04-04 06:55:52 PDT
Any further comments?
Comment 14 Filip Pizlo 2012-04-04 17:49:27 PDT
Comment on attachment 134993 [details]
Patch

Sorry for the delay!  I've been distracted this week.  R=me.
Comment 15 Adam Barth 2012-04-04 18:02:44 PDT
Filip, any thoughts on Comment #6, Comment #8, and Comment #10 ?
Comment 16 Filip Pizlo 2012-04-04 18:13:19 PDT
(In reply to comment #15)
> Filip, any thoughts on Comment #6, Comment #8, and Comment #10 ?

#6, #8: see my previous reply.

#10: spoke to Alexey in person. He still views Nikolas' style as being somewhat unusual. I still view it as being desirable, because it reduces #ifdef's, which make the code harder to read.

I don't think these stylistic issues are so grotesque as to preclude this patch from landing.

I don't think that using "if (fooBar()) { ... }" makes it any harder to remove the code than using "#if FOO_BAR \n ... \n #endif".

I don't think there is a performance implication to using "if (fooBar())" instead of "#if FOO_BAR", since the function in question is static inline.

I haven't heard further replies from you or Eric, and nobody flagged the patch as r-. This led me to believe that you didn't feel strongly enough about your views on the style of if statement being used to block the patch from landing as-is. The patch has sat around for long enough that I decided that it would be better to let the patch land than to continue arguing about these small things.

If you do feel strongly about preferring "#if FOO_BAR" over "if (fooBar())" then I suppose there is still time to r- the patch. It's your call. I like the patch as is and would rather err on the side of letting it land but I don't have strong feelings either way.
Comment 17 Mark Rowe (bdash) 2012-04-04 18:17:53 PDT
For what it's worth, #if's *are* easier to remove since you can simply feed the files in question through unifdef.
Comment 18 Nikolas Zimmermann 2012-04-04 23:31:33 PDT
(In reply to comment #17)
> For what it's worth, #if's *are* easier to remove since you can simply feed the files in question through unifdef.

I'll take care of removing the code once Lion ships with a newer libxml, okay?
Thanks for the review Filip! I'll land it as-is today.

I can rework the patch to use ifdefs but only if we change the other libxml bug workaround to the same style, as a mixture doesn't make much sense. Agreed?
Comment 19 Nikolas Zimmermann 2012-04-05 01:18:45 PDT
Committed r113299: <http://trac.webkit.org/changeset/113299>
Comment 20 Simon Fraser (smfr) 2012-04-05 11:00:30 PDT
This appears to have made a bunch of tests start failing, esp. some in css3/selectors3/xml
Comment 21 Jon Lee 2012-04-05 15:23:42 PDT
Rolled out in r113385 because of large number of failing tests. Reopening bug.
Comment 22 Nikolas Zimmermann 2012-04-06 23:29:01 PDT
(In reply to comment #20)
> This appears to have made a bunch of tests start failing, esp. some in css3/selectors3/xml
*sigh* I forgot that only cr-linux EWS runs tests. The cr-linux slave has a libxml2 version > 2.7.3.
I'll investigate.
Comment 23 Nikolas Zimmermann 2012-04-07 00:07:01 PDT
(In reply to comment #20)
> This appears to have made a bunch of tests start failing, esp. some in css3/selectors3/xml
Funny, css3/selectors3/xhtml/css3-modsel-120.xml - this fails but has no entities at all.

    if (hackAroundLibXMLEntityParsingBug() && context()->depth > depthTriggeringEntityExpansion() && uri.isNull() && prefix.isNull())

I should check if depthTriggeringEntityExpansion is actually != -1. I'll reland this with this obvious fix, after rerunning all layout tests on Mac.
Comment 24 Nikolas Zimmermann 2012-04-07 00:40:07 PDT
Relanded in r113542.