Created attachment 75951 [details] Archive containing test case and patch There is a bug in WebCore where Safari is unable to render an HTML web page created by an XSLT transformation from an XML file and an XSLT stylesheet. To reproduce, open test.xml (in the zip archive in attachment) in Safari (Mac OS X 10.6 or Safari Mobile on iPad or iPhone). There will be a message like this : --- This page contains the following errors: error on line 1 at column 1: Extra content at the end of the document Below is a rendering of the page up to the first error. This document was created as the result of an XSL transformation. The line and column numbers given are from the transformed result. --- By examining the source code, I found that it is caused by faulty conversion from UTF-8 to UTF-16 in WebCore. The conversion convert chunks of UTF-8 data, but does not treat the case where the end of the chunk is in the middle of a multibyte UTF-8 character. I am attaching a patch to fix this case (utf8_conversion.patch)
Created attachment 75953 [details] patch to fix UTF-8 conversion error
<rdar://problem/8728066>
Thank you for tackling this serious issue! Would you be willing to formally submit a patch for review, as described in <http://webkit.org/coding/contributing.html>? We can't land a fix without a ChangeLog and a regression test. For Apple: <rdar://problem/8728066>/<rdar://problem/8697596>.
(In reply to comment #3) > We can't land a fix without a ChangeLog and a regression test. The regression test should contain a reduced (if possible) and non-copyrighted test case.
Created attachment 75999 [details] Patch to fix the issue
(In reply to comment #3) > We can't land a fix without a ChangeLog and a regression test. I've attached the complete patch with a regression test and ChangeLog. I hope it's all OK.
(In reply to comment #4) > The regression test should contain a reduced (if possible) and non-copyrighted test case. I've reduced a lot the test and made it non-copyrighted. The xml file cannot be reduced more than that since it's with big files the bug will show up.
Comment on attachment 75999 [details] Patch to fix the issue What’s wrong with String::fromUTF8?
Comment on attachment 75999 [details] Patch to fix the issue View in context: https://bugs.webkit.org/attachment.cgi?id=75999&action=review This looks great, thanks! I've had a number of trivial nitpicks, at least a few of which need to be fixed before landing. I'm marking this r+, so a committer can fix these before landing. Or ideally, you could upload an updated patch, and then we'll land through commit-queue. > WebCore/ChangeLog:11 > + * xml/XSLTProcessorLibxslt.cpp: > + (WebCore::writeToVector): This auto-generated list is intended to be completed with a brief per-function description of changes. Many active contributors don't do that, but it's a useful practice. > WebCore/xml/XSLTProcessorLibxslt.cpp:50 > #include <wtf/text/CString.h> > #include <wtf/Vector.h> > +#include <wtf/unicode/UTF8.h> A pre-existing issue: we sort headers in ASCII order (same as UNIX sort command or Xcode Sort Selection script). So upper case comes first. > WebCore/xml/XSLTProcessorLibxslt.cpp:166 > + if (len == 0) WebKit style is "if (!len)". > WebCore/xml/XSLTProcessorLibxslt.cpp:171 > + UChar* bufferUChar; > + String stringBuffer(StringImpl::createUninitialized(len, bufferUChar)); > + UChar* bufferUCharEnd = bufferUChar + len; I think that StringBuffer would look better here. String::fromUTF8() code that you copied this from probably predates StringBuffer. > WebCore/xml/XSLTProcessorLibxslt.cpp:173 > + // Try converting into the buffer. Why try? Can converting really fail? Simply converting into a buffer doesn't seem worth a comment to me. > WebCore/xml/XSLTProcessorLibxslt.cpp:176 > + WTF::Unicode::ConversionResult result = > + WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, buffer + len, &bufferUChar, bufferUCharEnd); No need to wrap this line, it's short enough. > WebCore/xml/XSLTProcessorLibxslt.cpp:177 > + if (result == WTF::Unicode::conversionOK || result == WTF::Unicode::sourceExhausted) { Common WebKit style it to have early return on error case, not to nest "if"s for success cases. > WebCore/xml/XSLTProcessorLibxslt.cpp:183 > return len; Is this correct error handling? Per <http://xmlsoft.org/html/libxml-xmlIO.html#xmlOutputWriteCallback>, -1 should be returned. Also, looks like we don't really ever expect any errors here, so ASSERT_NOT_REACHED() would be good, with or without release build handling. > LayoutTests/ChangeLog:10 > + * fast/xsl/xslt-bad-encoding-expected.txt: Added. > + * fast/xsl/xslt-bad-encoding.xml: Added. > + * fast/xsl/xslt-bad-encoding.xsl: Added. This is not a good name for the test, as the source files are all properly encoded. I'd suggest something like "utf8-chunks.xml". > LayoutTests/fast/xsl/xslt-bad-encoding.xml:30 > +ééééééééééééééééééééééééééééééééééééééééééééééééé Is this enough to reliably trigger the bug under different versions of libxml2 and other XML back-ends? Maybe make this section longer, just to be safe? > LayoutTests/fast/xsl/xslt-bad-encoding.xsl:3 > + xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> Our subversion pre-commit hooks will complain about tabs in files. Please convert to spaces.
> What’s wrong with String::fromUTF8? We need to know how many input bytes were consumed, in case a UTF-8 sequence is not all in the provided chunk.
Created attachment 76063 [details] revised patch to fix the issue
(In reply to comment #9) > This looks great, thanks! I've had a number of trivial nitpicks, at least a few of which need to be fixed before landing. I've attached a revised patch following your comments.
Comment on attachment 76063 [details] revised patch to fix the issue View in context: https://bugs.webkit.org/attachment.cgi?id=76063&action=review Looks good. Did you decide to not add ASSERT_NOT_REACHED because the error case can actually be seen? > WebCore/ChangeLog:8 > + Test: fast/xsl/xslt-bad-encoding.xml Could you please also fix this? > LayoutTests/ChangeLog:10 > + * fast/xsl/xslt-bad-encoding-expected.txt: Added. > + * fast/xsl/xslt-bad-encoding.xml: Added. > + * fast/xsl/xslt-bad-encoding.xsl: Added. And this?
Created attachment 76073 [details] 2nd revised patch to fix the issue
(In reply to comment #13) > Did you decide to not add ASSERT_NOT_REACHED because the error case can actually be seen? You're right. I don't think it will happen. I add an ASSERT_NOT_REACHED.
Comment on attachment 76073 [details] 2nd revised patch to fix the issue Thank you!
Comment on attachment 76073 [details] 2nd revised patch to fix the issue Clearing flags on attachment: 76073 Committed r73679: <http://trac.webkit.org/changeset/73679>
All reviewed patches have been landed. Closing bug.