WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 50708
Parse error during XSLT transformation
https://bugs.webkit.org/show_bug.cgi?id=50708
Summary
Parse error during XSLT transformation
Jasmin Lapalme
Reported
2010-12-08 13:05:44 PST
Created
attachment 75951
[details]
Archive containing test case and patch There is a bug in WebCore where Safari is unable to render an HTML web page created by an XSLT transformation from an XML file and an XSLT stylesheet. To reproduce, open test.xml (in the zip archive in attachment) in Safari (Mac OS X 10.6 or Safari Mobile on iPad or iPhone). There will be a message like this : --- This page contains the following errors: error on line 1 at column 1: Extra content at the end of the document Below is a rendering of the page up to the first error. This document was created as the result of an XSL transformation. The line and column numbers given are from the transformed result. --- By examining the source code, I found that it is caused by faulty conversion from UTF-8 to UTF-16 in WebCore. The conversion convert chunks of UTF-8 data, but does not treat the case where the end of the chunk is in the middle of a multibyte UTF-8 character. I am attaching a patch to fix this case (utf8_conversion.patch)
Attachments
Archive containing test case and patch
(14.18 KB, application/zip)
2010-12-08 13:05 PST
,
Jasmin Lapalme
no flags
Details
patch to fix UTF-8 conversion error
(1.44 KB, patch)
2010-12-08 13:07 PST
,
Jasmin Lapalme
no flags
Details
Formatted Diff
Diff
Patch to fix the issue
(28.28 KB, patch)
2010-12-08 18:26 PST
,
Jasmin Lapalme
ap
: review+
ap
: commit-queue-
Details
Formatted Diff
Diff
revised patch to fix the issue
(36.39 KB, patch)
2010-12-09 07:34 PST
,
Jasmin Lapalme
no flags
Details
Formatted Diff
Diff
2nd revised patch to fix the issue
(36.41 KB, patch)
2010-12-09 09:03 PST
,
Jasmin Lapalme
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Jasmin Lapalme
Comment 1
2010-12-08 13:07:03 PST
Created
attachment 75953
[details]
patch to fix UTF-8 conversion error
David Kilzer (:ddkilzer)
Comment 2
2010-12-08 13:21:17 PST
<
rdar://problem/8728066
>
Alexey Proskuryakov
Comment 3
2010-12-08 13:22:22 PST
Thank you for tackling this serious issue! Would you be willing to formally submit a patch for review, as described in <
http://webkit.org/coding/contributing.html
>? We can't land a fix without a ChangeLog and a regression test. For Apple: <
rdar://problem/8728066
>/<
rdar://problem/8697596
>.
David Kilzer (:ddkilzer)
Comment 4
2010-12-08 17:57:13 PST
(In reply to
comment #3
)
> We can't land a fix without a ChangeLog and a regression test.
The regression test should contain a reduced (if possible) and non-copyrighted test case.
Jasmin Lapalme
Comment 5
2010-12-08 18:26:48 PST
Created
attachment 75999
[details]
Patch to fix the issue
Jasmin Lapalme
Comment 6
2010-12-08 18:30:22 PST
(In reply to
comment #3
)
> We can't land a fix without a ChangeLog and a regression test.
I've attached the complete patch with a regression test and ChangeLog. I hope it's all OK.
Jasmin Lapalme
Comment 7
2010-12-08 18:33:14 PST
(In reply to
comment #4
)
> The regression test should contain a reduced (if possible) and non-copyrighted test case.
I've reduced a lot the test and made it non-copyrighted. The xml file cannot be reduced more than that since it's with big files the bug will show up.
Darin Adler
Comment 8
2010-12-08 18:56:36 PST
Comment on
attachment 75999
[details]
Patch to fix the issue What’s wrong with String::fromUTF8?
Alexey Proskuryakov
Comment 9
2010-12-08 18:59:36 PST
Comment on
attachment 75999
[details]
Patch to fix the issue View in context:
https://bugs.webkit.org/attachment.cgi?id=75999&action=review
This looks great, thanks! I've had a number of trivial nitpicks, at least a few of which need to be fixed before landing. I'm marking this r+, so a committer can fix these before landing. Or ideally, you could upload an updated patch, and then we'll land through commit-queue.
> WebCore/ChangeLog:11 > + * xml/XSLTProcessorLibxslt.cpp: > + (WebCore::writeToVector):
This auto-generated list is intended to be completed with a brief per-function description of changes. Many active contributors don't do that, but it's a useful practice.
> WebCore/xml/XSLTProcessorLibxslt.cpp:50 > #include <wtf/text/CString.h> > #include <wtf/Vector.h> > +#include <wtf/unicode/UTF8.h>
A pre-existing issue: we sort headers in ASCII order (same as UNIX sort command or Xcode Sort Selection script). So upper case comes first.
> WebCore/xml/XSLTProcessorLibxslt.cpp:166 > + if (len == 0)
WebKit style is "if (!len)".
> WebCore/xml/XSLTProcessorLibxslt.cpp:171 > + UChar* bufferUChar; > + String stringBuffer(StringImpl::createUninitialized(len, bufferUChar)); > + UChar* bufferUCharEnd = bufferUChar + len;
I think that StringBuffer would look better here. String::fromUTF8() code that you copied this from probably predates StringBuffer.
> WebCore/xml/XSLTProcessorLibxslt.cpp:173 > + // Try converting into the buffer.
Why try? Can converting really fail? Simply converting into a buffer doesn't seem worth a comment to me.
> WebCore/xml/XSLTProcessorLibxslt.cpp:176 > + WTF::Unicode::ConversionResult result = > + WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, buffer + len, &bufferUChar, bufferUCharEnd);
No need to wrap this line, it's short enough.
> WebCore/xml/XSLTProcessorLibxslt.cpp:177 > + if (result == WTF::Unicode::conversionOK || result == WTF::Unicode::sourceExhausted) {
Common WebKit style it to have early return on error case, not to nest "if"s for success cases.
> WebCore/xml/XSLTProcessorLibxslt.cpp:183 > return len;
Is this correct error handling? Per <
http://xmlsoft.org/html/libxml-xmlIO.html#xmlOutputWriteCallback
>, -1 should be returned. Also, looks like we don't really ever expect any errors here, so ASSERT_NOT_REACHED() would be good, with or without release build handling.
> LayoutTests/ChangeLog:10 > + * fast/xsl/xslt-bad-encoding-expected.txt: Added. > + * fast/xsl/xslt-bad-encoding.xml: Added. > + * fast/xsl/xslt-bad-encoding.xsl: Added.
This is not a good name for the test, as the source files are all properly encoded. I'd suggest something like "utf8-chunks.xml".
> LayoutTests/fast/xsl/xslt-bad-encoding.xml:30 > +ééééééééééééééééééééééééééééééééééééééééééééééééé
Is this enough to reliably trigger the bug under different versions of libxml2 and other XML back-ends? Maybe make this section longer, just to be safe?
> LayoutTests/fast/xsl/xslt-bad-encoding.xsl:3 > + xmlns:xsl="
http://www.w3.org/1999/XSL/Transform
">
Our subversion pre-commit hooks will complain about tabs in files. Please convert to spaces.
Alexey Proskuryakov
Comment 10
2010-12-08 19:01:17 PST
> What’s wrong with String::fromUTF8?
We need to know how many input bytes were consumed, in case a UTF-8 sequence is not all in the provided chunk.
Jasmin Lapalme
Comment 11
2010-12-09 07:34:18 PST
Created
attachment 76063
[details]
revised patch to fix the issue
Jasmin Lapalme
Comment 12
2010-12-09 07:38:50 PST
(In reply to
comment #9
)
> This looks great, thanks! I've had a number of trivial nitpicks, at least a few of which need to be fixed before landing.
I've attached a revised patch following your comments.
Alexey Proskuryakov
Comment 13
2010-12-09 08:32:35 PST
Comment on
attachment 76063
[details]
revised patch to fix the issue View in context:
https://bugs.webkit.org/attachment.cgi?id=76063&action=review
Looks good. Did you decide to not add ASSERT_NOT_REACHED because the error case can actually be seen?
> WebCore/ChangeLog:8 > + Test: fast/xsl/xslt-bad-encoding.xml
Could you please also fix this?
> LayoutTests/ChangeLog:10 > + * fast/xsl/xslt-bad-encoding-expected.txt: Added. > + * fast/xsl/xslt-bad-encoding.xml: Added. > + * fast/xsl/xslt-bad-encoding.xsl: Added.
And this?
Jasmin Lapalme
Comment 14
2010-12-09 09:03:22 PST
Created
attachment 76073
[details]
2nd revised patch to fix the issue
Jasmin Lapalme
Comment 15
2010-12-09 09:05:55 PST
(In reply to
comment #13
)
> Did you decide to not add ASSERT_NOT_REACHED because the error case can actually be seen?
You're right. I don't think it will happen. I add an ASSERT_NOT_REACHED.
Alexey Proskuryakov
Comment 16
2010-12-09 09:20:46 PST
Comment on
attachment 76073
[details]
2nd revised patch to fix the issue Thank you!
WebKit Commit Bot
Comment 17
2010-12-09 20:06:14 PST
Comment on
attachment 76073
[details]
2nd revised patch to fix the issue Clearing flags on attachment: 76073 Committed
r73679
: <
http://trac.webkit.org/changeset/73679
>
WebKit Commit Bot
Comment 18
2010-12-09 20:06:20 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug