RESOLVED FIXED Bug 50708
Parse error during XSLT transformation
https://bugs.webkit.org/show_bug.cgi?id=50708
Summary Parse error during XSLT transformation
Jasmin Lapalme
Reported 2010-12-08 13:05:44 PST
Created attachment 75951 [details] Archive containing test case and patch There is a bug in WebCore where Safari is unable to render an HTML web page created by an XSLT transformation from an XML file and an XSLT stylesheet. To reproduce, open test.xml (in the zip archive in attachment) in Safari (Mac OS X 10.6 or Safari Mobile on iPad or iPhone). There will be a message like this : --- This page contains the following errors: error on line 1 at column 1: Extra content at the end of the document Below is a rendering of the page up to the first error. This document was created as the result of an XSL transformation. The line and column numbers given are from the transformed result. --- By examining the source code, I found that it is caused by faulty conversion from UTF-8 to UTF-16 in WebCore. The conversion convert chunks of UTF-8 data, but does not treat the case where the end of the chunk is in the middle of a multibyte UTF-8 character. I am attaching a patch to fix this case (utf8_conversion.patch)
Attachments
Archive containing test case and patch (14.18 KB, application/zip)
2010-12-08 13:05 PST, Jasmin Lapalme
no flags
patch to fix UTF-8 conversion error (1.44 KB, patch)
2010-12-08 13:07 PST, Jasmin Lapalme
no flags
Patch to fix the issue (28.28 KB, patch)
2010-12-08 18:26 PST, Jasmin Lapalme
ap: review+
ap: commit-queue-
revised patch to fix the issue (36.39 KB, patch)
2010-12-09 07:34 PST, Jasmin Lapalme
no flags
2nd revised patch to fix the issue (36.41 KB, patch)
2010-12-09 09:03 PST, Jasmin Lapalme
no flags
Jasmin Lapalme
Comment 1 2010-12-08 13:07:03 PST
Created attachment 75953 [details] patch to fix UTF-8 conversion error
David Kilzer (:ddkilzer)
Comment 2 2010-12-08 13:21:17 PST
Alexey Proskuryakov
Comment 3 2010-12-08 13:22:22 PST
Thank you for tackling this serious issue! Would you be willing to formally submit a patch for review, as described in <http://webkit.org/coding/contributing.html>? We can't land a fix without a ChangeLog and a regression test. For Apple: <rdar://problem/8728066>/<rdar://problem/8697596>.
David Kilzer (:ddkilzer)
Comment 4 2010-12-08 17:57:13 PST
(In reply to comment #3) > We can't land a fix without a ChangeLog and a regression test. The regression test should contain a reduced (if possible) and non-copyrighted test case.
Jasmin Lapalme
Comment 5 2010-12-08 18:26:48 PST
Created attachment 75999 [details] Patch to fix the issue
Jasmin Lapalme
Comment 6 2010-12-08 18:30:22 PST
(In reply to comment #3) > We can't land a fix without a ChangeLog and a regression test. I've attached the complete patch with a regression test and ChangeLog. I hope it's all OK.
Jasmin Lapalme
Comment 7 2010-12-08 18:33:14 PST
(In reply to comment #4) > The regression test should contain a reduced (if possible) and non-copyrighted test case. I've reduced a lot the test and made it non-copyrighted. The xml file cannot be reduced more than that since it's with big files the bug will show up.
Darin Adler
Comment 8 2010-12-08 18:56:36 PST
Comment on attachment 75999 [details] Patch to fix the issue What’s wrong with String::fromUTF8?
Alexey Proskuryakov
Comment 9 2010-12-08 18:59:36 PST
Comment on attachment 75999 [details] Patch to fix the issue View in context: https://bugs.webkit.org/attachment.cgi?id=75999&action=review This looks great, thanks! I've had a number of trivial nitpicks, at least a few of which need to be fixed before landing. I'm marking this r+, so a committer can fix these before landing. Or ideally, you could upload an updated patch, and then we'll land through commit-queue. > WebCore/ChangeLog:11 > + * xml/XSLTProcessorLibxslt.cpp: > + (WebCore::writeToVector): This auto-generated list is intended to be completed with a brief per-function description of changes. Many active contributors don't do that, but it's a useful practice. > WebCore/xml/XSLTProcessorLibxslt.cpp:50 > #include <wtf/text/CString.h> > #include <wtf/Vector.h> > +#include <wtf/unicode/UTF8.h> A pre-existing issue: we sort headers in ASCII order (same as UNIX sort command or Xcode Sort Selection script). So upper case comes first. > WebCore/xml/XSLTProcessorLibxslt.cpp:166 > + if (len == 0) WebKit style is "if (!len)". > WebCore/xml/XSLTProcessorLibxslt.cpp:171 > + UChar* bufferUChar; > + String stringBuffer(StringImpl::createUninitialized(len, bufferUChar)); > + UChar* bufferUCharEnd = bufferUChar + len; I think that StringBuffer would look better here. String::fromUTF8() code that you copied this from probably predates StringBuffer. > WebCore/xml/XSLTProcessorLibxslt.cpp:173 > + // Try converting into the buffer. Why try? Can converting really fail? Simply converting into a buffer doesn't seem worth a comment to me. > WebCore/xml/XSLTProcessorLibxslt.cpp:176 > + WTF::Unicode::ConversionResult result = > + WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, buffer + len, &bufferUChar, bufferUCharEnd); No need to wrap this line, it's short enough. > WebCore/xml/XSLTProcessorLibxslt.cpp:177 > + if (result == WTF::Unicode::conversionOK || result == WTF::Unicode::sourceExhausted) { Common WebKit style it to have early return on error case, not to nest "if"s for success cases. > WebCore/xml/XSLTProcessorLibxslt.cpp:183 > return len; Is this correct error handling? Per <http://xmlsoft.org/html/libxml-xmlIO.html#xmlOutputWriteCallback>, -1 should be returned. Also, looks like we don't really ever expect any errors here, so ASSERT_NOT_REACHED() would be good, with or without release build handling. > LayoutTests/ChangeLog:10 > + * fast/xsl/xslt-bad-encoding-expected.txt: Added. > + * fast/xsl/xslt-bad-encoding.xml: Added. > + * fast/xsl/xslt-bad-encoding.xsl: Added. This is not a good name for the test, as the source files are all properly encoded. I'd suggest something like "utf8-chunks.xml". > LayoutTests/fast/xsl/xslt-bad-encoding.xml:30 > +ééééééééééééééééééééééééééééééééééééééééééééééééé Is this enough to reliably trigger the bug under different versions of libxml2 and other XML back-ends? Maybe make this section longer, just to be safe? > LayoutTests/fast/xsl/xslt-bad-encoding.xsl:3 > + xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> Our subversion pre-commit hooks will complain about tabs in files. Please convert to spaces.
Alexey Proskuryakov
Comment 10 2010-12-08 19:01:17 PST
> What’s wrong with String::fromUTF8? We need to know how many input bytes were consumed, in case a UTF-8 sequence is not all in the provided chunk.
Jasmin Lapalme
Comment 11 2010-12-09 07:34:18 PST
Created attachment 76063 [details] revised patch to fix the issue
Jasmin Lapalme
Comment 12 2010-12-09 07:38:50 PST
(In reply to comment #9) > This looks great, thanks! I've had a number of trivial nitpicks, at least a few of which need to be fixed before landing. I've attached a revised patch following your comments.
Alexey Proskuryakov
Comment 13 2010-12-09 08:32:35 PST
Comment on attachment 76063 [details] revised patch to fix the issue View in context: https://bugs.webkit.org/attachment.cgi?id=76063&action=review Looks good. Did you decide to not add ASSERT_NOT_REACHED because the error case can actually be seen? > WebCore/ChangeLog:8 > + Test: fast/xsl/xslt-bad-encoding.xml Could you please also fix this? > LayoutTests/ChangeLog:10 > + * fast/xsl/xslt-bad-encoding-expected.txt: Added. > + * fast/xsl/xslt-bad-encoding.xml: Added. > + * fast/xsl/xslt-bad-encoding.xsl: Added. And this?
Jasmin Lapalme
Comment 14 2010-12-09 09:03:22 PST
Created attachment 76073 [details] 2nd revised patch to fix the issue
Jasmin Lapalme
Comment 15 2010-12-09 09:05:55 PST
(In reply to comment #13) > Did you decide to not add ASSERT_NOT_REACHED because the error case can actually be seen? You're right. I don't think it will happen. I add an ASSERT_NOT_REACHED.
Alexey Proskuryakov
Comment 16 2010-12-09 09:20:46 PST
Comment on attachment 76073 [details] 2nd revised patch to fix the issue Thank you!
WebKit Commit Bot
Comment 17 2010-12-09 20:06:14 PST
Comment on attachment 76073 [details] 2nd revised patch to fix the issue Clearing flags on attachment: 76073 Committed r73679: <http://trac.webkit.org/changeset/73679>
WebKit Commit Bot
Comment 18 2010-12-09 20:06:20 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.