Bug 6390

Summary: Complex entities seem to fail on TOT
Product: WebKit
Reporter: Eric Seidel (no email) <eric>
Component: WebKit Misc.
Assignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED
Severity: Normal
CC: ap
Priority: P4    
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
URL: http://www.w3.org/Graphics/SVG/Test/20030813/htmlframe/full-render-elems-03-t.html
Bug Depends on: 5792    
Bug Blocks:    
Attachments:
difference in parser context between first "bad" call and second "good" call (flags: none)
an alternative hack to work-around libxml bugs (flags: none)
Final patch (ap's already seen most of this) (flags: ap: review+)

Eric Seidel (no email)
Reported 2006-01-05 16:40:48 PST
Complex entities seem to fail on TOT

Now that AP has fixed the crasher, this file still fails to handle entity replacement correctly in this SVG (XML) example. It used to work but is broken on TOT; not sure why. It broke during the DOM merger back in early November.
Attachments
difference in parser context between first "bad" call and second "good" call (2.44 KB, text/plain)
2006-07-26 21:05 PDT, Eric Seidel (no email)
no flags
an alternative hack to work-around libxml bugs (4.01 KB, patch)
2006-07-26 21:25 PDT, Eric Seidel (no email)
no flags
Final patch (ap's already seen most of this) (185.77 KB, patch)
2006-07-26 22:41 PDT, Eric Seidel (no email)
ap: review+
Eric Seidel (no email)
Comment 1 2006-07-26 18:51:42 PDT
This hack is what's causing this bug:

    // Work around a libxml SAX2 bug that causes charactersHandler to be called twice.
    if (ent)
        ctxt->replaceEntities = (ctxt->instate == XML_PARSER_ATTRIBUTE_VALUE) || (ent->etype != XML_INTERNAL_GENERAL_ENTITY);

Unfortunately I don't know enough about the hack to understand why it's necessary. I pulled it over from KDOM back in the day. WildFox might remember more.

Removing the hack causes other issues, something about trying to set the encoding before there is any content. That might have to do with a separate hack that I added earlier this year:

    // Hack around libxml2's lack of encoding overide support by manually
    // resetting the encoding to UTF-16 before every chunk. Otherwise libxml
    // will detect <?xml version="1.0" encoding="<encoding name>"?> blocks
    // and switch encodings, causing the parse to fail.
    const DeprecatedChar BOM(0xFEFF);
    const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char *>(&BOM);
    xmlSwitchEncoding(m_context, BOMHighByte == 0xFF ? XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);

Still investigating.
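The byte-order check in the second hack can be looked at in isolation. The following is a minimal standalone sketch, not the WebKit code: `Utf16Order`, `orderFromFirstByte`, and `hostUtf16Order` are hypothetical names, and the enum merely stands in for libxml's XML_CHAR_ENCODING_UTF16LE and XML_CHAR_ENCODING_UTF16BE. It stores U+FEFF in a 16-bit unit and inspects the byte at the lowest address, which is the byte the snippet above calls BOMHighByte.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for XML_CHAR_ENCODING_UTF16LE / XML_CHAR_ENCODING_UTF16BE.
enum class Utf16Order { LittleEndian, BigEndian };

// U+FEFF stored in a 16-bit unit starts with 0xFF on a little-endian host
// and with 0xFE on a big-endian host.
constexpr Utf16Order orderFromFirstByte(unsigned char firstByte)
{
    return firstByte == 0xFF ? Utf16Order::LittleEndian : Utf16Order::BigEndian;
}

// Read the first byte of the BOM as it is laid out in memory on this host.
// memcpy is used here instead of reinterpret_cast; both read the same byte.
inline Utf16Order hostUtf16Order()
{
    const std::uint16_t bom = 0xFEFF;
    unsigned char firstByte = 0;
    std::memcpy(&firstByte, &bom, 1);
    return orderFromFirstByte(firstByte);
}
```

A caller would presumably pass the result on to whatever selects the encoding, as the hack does with xmlSwitchEncoding; that wiring is left out here because it depends on the parser context.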
Eric Seidel (no email)
Comment 2 2006-07-26 19:17:50 PDT
The bug we're trying to work around: http://bugzilla.gnome.org/show_bug.cgi?id=159219
Eric Seidel (no email)
Comment 3 2006-07-26 20:00:20 PDT
It seems this workaround fails because the bug is more complicated than originally thought. Entities that expand to textual content seem to be replaced twice. Entities that expand to tags do not get replaced twice. Example:

    <?xml version="1.0"?>
    <!DOCTYPE svg SYSTEM "staff.dtd" [
      <!ENTITY ent1 "<rect x='0' y='0' width='40' height='10' fill='green'/>">
      <!ENTITY ent2 "foo">
    ]>
    <svg xmlns="http://www.w3.org/2000/svg">
      &ent1;
      <text x="5" y="20">&ent2;</text>
    </svg>

With the work-around enabled, the rect is never rendered. With it disabled, the text is rendered twice.
Eric Seidel (no email)
Comment 4 2006-07-26 20:23:06 PDT
It turns out I was wrong. In both cases the <rect> is written to the document twice. However, the first time it's written the namespaces fail to resolve correctly, and the element is never rendered. Since the "work-around" disables the second writing (not the first), entities that contain elements don't receive proper namespace resolution when the work-around is enabled. I'm still working on a better workaround/hack. I've emailed the libxml mailing list in case they have any suggestions.
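To see when the work-around from Comment 1 switches entity replacement off, its boolean expression can be evaluated on its own. This is only a sketch of that one expression: `ParserState`, `EntityType`, and `replaceEntitiesFor` are hypothetical stand-ins for libxml's `instate` and `etype` values, not libxml types.

```cpp
// Hypothetical stand-ins for the libxml2 values named in the hack.
enum class ParserState { Content, AttributeValue };
enum class EntityType { InternalGeneral, Other };

// Mirrors the hack's expression:
//   (instate == XML_PARSER_ATTRIBUTE_VALUE) || (etype != XML_INTERNAL_GENERAL_ENTITY)
constexpr bool replaceEntitiesFor(ParserState state, EntityType type)
{
    return state == ParserState::AttributeValue || type != EntityType::InternalGeneral;
}

// Truth table, checked at compile time.
static_assert(!replaceEntitiesFor(ParserState::Content, EntityType::InternalGeneral),
              "internal general entity in element content: replacement off");
static_assert(replaceEntitiesFor(ParserState::AttributeValue, EntityType::InternalGeneral),
              "inside an attribute value: replacement on");
static_assert(replaceEntitiesFor(ParserState::Content, EntityType::Other),
              "any other entity type: replacement on");
```

Both ent1 and ent2 in the Comment 3 example are internal general entities referenced from element content, so they fall into the first case, the only one in which the work-around turns replacement off.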
Eric Seidel (no email)
Comment 5 2006-07-26 21:05:26 PDT
Created attachment 9705
difference in parser context between first "bad" call and second "good" call

Due to http://bugzilla.gnome.org/show_bug.cgi?id=159219, it seems that libxml calls the SAX2 callbacks even when doing "internal" parsing of entities. Every SAX callback is therefore invoked twice: once when libxml does an internal parse of the entity, and again when it actually inserts the entity data into the parse tree. This diff shows the difference in the context variables between those two calls.
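The double delivery described above can be pictured with a toy model. This is not the patch that landed and it does not use libxml: `Pass`, `SaxEvent`, `shouldHandle`, and `countHandled` are all hypothetical, and the `pass` tag is an assumption standing in for whichever context variables actually differ between the two calls (the thread does not say which ones). The sketch only shows that acting on the second, "good" delivery and skipping the first yields one element instead of two.

```cpp
// Toy model only: none of these types exist in libxml2 or WebKit.
enum class Pass { InternalEntityParse, TreeInsertion };

struct SaxEvent {
    const char* callback; // e.g. "startElement" or "characters"
    Pass pass;            // hypothetical tag for which parse produced the call
};

// Act only on the second delivery, the one made while inserting into the tree.
constexpr bool shouldHandle(const SaxEvent& event)
{
    return event.pass == Pass::TreeInsertion;
}

// Count how many of the first `count` events a handler would act on.
constexpr int countHandled(const SaxEvent* events, int count)
{
    return count == 0
        ? 0
        : (shouldHandle(events[0]) ? 1 : 0) + countHandled(events + 1, count - 1);
}

// One entity containing a single element, delivered once per pass.
constexpr SaxEvent kRectEntityEvents[] = {
    { "startElement", Pass::InternalEntityParse },
    { "startElement", Pass::TreeInsertion },
};
```

Filtering on the second pass rather than the first follows Comment 4, where the first write is the one with unresolved namespaces.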
Eric Seidel (no email)
Comment 6 2006-07-26 21:25:48 PDT
Created attachment 9706
an alternative hack to work-around libxml bugs

I can add a changelog as necessary. I also have updated test cases.
Eric Seidel (no email)
Comment 7 2006-07-26 22:41:21 PDT
Created attachment 9708
Final patch (ap's already seen most of this)
Alexey Proskuryakov
Comment 8 2006-07-27 00:51:49 PDT
Comment on attachment 9708
Final patch (ap's already seen most of this)

r=me
Alexey Proskuryakov
Comment 9 2006-07-27 06:52:19 PDT
Eric landed this in r15648.