WebKit Bugzilla
RESOLVED FIXED
6390
Complex entities seem to fail on TOT
https://bugs.webkit.org/show_bug.cgi?id=6390
Eric Seidel (no email)
Reported
2006-01-05 16:40:48 PST
Complex entities seem to fail on TOT. Now that AP has fixed the crasher, this file still fails to handle the entity replacement correctly in this SVG (XML) example. It used to work but is broken on TOT; I'm not sure why. It broke during the DOM merger back in early November.
Attachments
difference in parser context between first "bad" call and second "good" call (2.44 KB, text/plain), 2006-07-26 21:05 PDT, Eric Seidel (no email), no flags
an alternative hack to work-around libxml bugs (4.01 KB, patch), 2006-07-26 21:25 PDT, Eric Seidel (no email), no flags
Final patch (ap's already seen most of this) (185.77 KB, patch), 2006-07-26 22:41 PDT, Eric Seidel (no email), ap: review+
Eric Seidel (no email)
Comment 1
2006-07-26 18:51:42 PDT
This hack is what's causing this bug:

    // Work around a libxml SAX2 bug that causes charactersHandler to be called twice.
    if (ent)
        ctxt->replaceEntities = (ctxt->instate == XML_PARSER_ATTRIBUTE_VALUE) || (ent->etype != XML_INTERNAL_GENERAL_ENTITY);

Unfortunately I don't know enough about the hack to understand why it's necessary. I pulled it over from KDOM back in the day. WildFox might remember more.

Removing the hack causes other issues, something about trying to set the encoding before there is any content. That might have to do with a separate hack that I added earlier this year:

    // Hack around libxml2's lack of encoding override support by manually
    // resetting the encoding to UTF-16 before every chunk. Otherwise libxml
    // will detect <?xml version="1.0" encoding="<encoding name>"?> blocks
    // and switch encodings, causing the parse to fail.
    const DeprecatedChar BOM(0xFEFF);
    const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char *>(&BOM);
    xmlSwitchEncoding(m_context, BOMHighByte == 0xFF ? XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);

Still investigating.
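The byte-order test in the second hack can be tried in isolation. The following is a minimal sketch without libxml2: the `Utf16Order` enum and the `hostUtf16Order` function are hypothetical names introduced here and stand in for the choice between XML_CHAR_ENCODING_UTF16LE and XML_CHAR_ENCODING_UTF16BE. It stores U+FEFF in a 16-bit value and inspects the byte that comes first in memory, which is what the quoted code calls BOMHighByte.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the two UTF-16 encodings the quoted code chooses between.
enum class Utf16Order { LittleEndian, BigEndian };

// Store the byte order mark U+FEFF in a 16-bit value and look at the byte that
// lands first in memory: 0xFF means little-endian, 0xFE means big-endian.
inline Utf16Order hostUtf16Order()
{
    const std::uint16_t bom = 0xFEFF;
    unsigned char firstByte;
    std::memcpy(&firstByte, &bom, 1);
    assert(firstByte == 0xFF || firstByte == 0xFE);
    return firstByte == 0xFF ? Utf16Order::LittleEndian : Utf16Order::BigEndian;
}
```

Using memcpy instead of reinterpret_cast is only a stylistic choice in this sketch; reading an object through an unsigned char pointer, as the quoted code does, is also well defined.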
Eric Seidel (no email)
Comment 2
2006-07-26 19:17:50 PDT
The bug we're trying to work around:
http://bugzilla.gnome.org/show_bug.cgi?id=159219
Eric Seidel (no email)
Comment 3
2006-07-26 20:00:20 PDT
It seems this workaround fails because the bug is more complicated than originally thought. Entities that expand to textual content seem to be replaced twice. Entities that expand to tags do not get replaced twice. Example:

<?xml version="1.0"?>
<!DOCTYPE svg SYSTEM "staff.dtd" [
  <!ENTITY ent1 "<rect x='0' y='0' width='40' height='10' fill='green'/>">
  <!ENTITY ent2 "foo">
]>
<svg xmlns="http://www.w3.org/2000/svg">
  &ent1;
  <text x="5" y="20">&ent2;</text>
</svg>

With the work-around enabled, the rect is never rendered. With it disabled, the text is rendered twice.
Eric Seidel (no email)
Comment 4
2006-07-26 20:23:06 PDT
It turns out I was wrong. In both cases the <rect> is written to the document twice. However, the first time it's written, the namespaces fail to resolve correctly and the element is never rendered. Since the "work-around" disables the second writing (not the first), entities that contain elements don't receive proper namespace resolution when the work-around is enabled. I'm still working on a better workaround/hack. I've emailed the libxml mailing list in case they have any suggestions.
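To see which entity references the work-around from Comment 1 affects, its condition can be modelled as a pure function. This is only a sketch: `ParserState`, `EntityKind` and `workaroundReplacesEntities` are hypothetical stand-ins for the libxml2 values in the quoted condition, not libxml2 API.

```cpp
#include <cassert>

// Hypothetical stand-ins for the libxml2 values named in the quoted work-around.
enum class ParserState { Content, AttributeValue };
enum class EntityKind { InternalGeneral, Other };

// Mirrors the quoted condition: replaceEntities stays enabled only inside
// attribute values or for entities that are not internal general entities.
constexpr bool workaroundReplacesEntities(ParserState state, EntityKind kind)
{
    return state == ParserState::AttributeValue || kind != EntityKind::InternalGeneral;
}

// Both &ent1; and &ent2; in the example are internal general entities that are
// referenced in element content, so the work-around turns replacement off for both.
static_assert(!workaroundReplacesEntities(ParserState::Content, EntityKind::InternalGeneral),
              "internal general entities in content are not replaced");
```

Under this model the condition cannot tell an entity that expands to text from one that expands to markup, which fits the observation that the same switch hides the duplicated text but also drops the correctly namespaced <rect>.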
Eric Seidel (no email)
Comment 5
2006-07-26 21:05:26 PDT
Created attachment 9705: difference in parser context between first "bad" call and second "good" call

Due to http://bugzilla.gnome.org/show_bug.cgi?id=159219 it seems that libxml calls the SAX2 callbacks even when doing "internal" parsing of entities. Every SAX callback is therefore invoked twice: once when libxml does an internal parse of the entity, and a second time when it actually inserts the entity data into the parse tree. This diff shows the difference in the context variables between those two calls.
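One way to picture a fix for the double delivery is a guard that drops callbacks arriving during the internal parse and keeps those from the real insertion. The sketch below is a simulation only and is not taken from the attached patches: `FakeParserContext`, its `internalEntityParse` flag, `TextSink` and `expandEntityOnce` are all invented for illustration, and which libxml2 context field actually distinguishes the two calls is left to the attached diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the parser context; the real libxml2 context has
// many more fields, and this flag name is invented for illustration.
struct FakeParserContext {
    bool internalEntityParse = false;
};

struct TextSink {
    std::string text;

    // Characters callback: ignore the delivery that comes from the internal parse.
    void characters(const FakeParserContext& ctxt, const std::string& chunk)
    {
        if (ctxt.internalEntityParse)
            return;
        text += chunk;
    }
};

// Simulates the two deliveries described above for an entity that expands to `value`.
inline std::string expandEntityOnce(const std::string& value)
{
    TextSink sink;
    FakeParserContext ctxt;
    ctxt.internalEntityParse = true;   // first call: internal parse of the entity
    sink.characters(ctxt, value);
    ctxt.internalEntityParse = false;  // second call: insertion into the parse tree
    sink.characters(ctxt, value);
    return sink.text;
}
```

Dropping the first delivery rather than the second matches the observation in Comment 4 that the first write is the one with broken namespace resolution.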
Eric Seidel (no email)
Comment 6
2006-07-26 21:25:48 PDT
Created attachment 9706: an alternative hack to work-around libxml bugs

I can add a changelog as necessary. I also have updated test cases.
Eric Seidel (no email)
Comment 7
2006-07-26 22:41:21 PDT
Created attachment 9708: Final patch (ap's already seen most of this)
Alexey Proskuryakov
Comment 8
2006-07-27 00:51:49 PDT
Comment on attachment 9708: Final patch (ap's already seen most of this)

r=me
Alexey Proskuryakov
Comment 9
2006-07-27 06:52:19 PDT
Eric landed this in r15648.