Summary: | REGRESSION: Safari doesn't display hebrew text on a web page, displayed correctly on Tiger | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Darin Adler <darin> | ||||||
Component: | Text | Assignee: | Alexey Proskuryakov <ap> | ||||||
Status: | RESOLVED FIXED | ||||||||
Severity: | Normal | CC: | ap | ||||||
Priority: | P1 | Keywords: | InRadar, Regression | ||||||
Version: | 420+ | ||||||||
Hardware: | Mac | ||||||||
OS: | OS X 10.4 | ||||||||
Attachments: |
|
Description
Darin Adler
2007-01-31 09:48:47 PST
(In reply to comment #0) > The key question here is: why did the Tiger version of the charset sniffer work > better on this page? In Tiger, all unknown tags were allowed in HEAD: int id = khtml::getTagID(tmp, len); if(end) id += ID_CLOSE_TAG; ... case (ID_META+ID_CLOSE_TAG): case ID_SCRIPT: case (ID_SCRIPT+ID_CLOSE_TAG): ...other tags explicitly allowed in HEAD... case ID_HTML: case ID_HEAD: case 0: case (0 + ID_CLOSE_TAG ): break; default: body = true; I don't think we want to re-add this behavior. (In reply to comment #1) > I don't think we want to re-add this behavior. At the same time, I suspect the change was non-intentional (likely made at the time when we switched from tag IDs to AtomicStrings). Created attachment 12830 [details]
proposed fix
This patch fixes more than the regression - known tags hidden in attributes don't fool the sniffer anymore, too.
Comment on attachment 12830 [details]
proposed fix
I believe that the HTML parser is tolerant of spurious quote marks in tags attribute lists -- see line 927 of HTMLTokenizer.cpp, the part that says "ignore any quotes we encounter and treat them like spaces". This code is strict about that. I'd like to see some more test cases that cover these strange quoting cases to see if there are any repercussions of that.
r=me
I'm going to look into how the real parser handles quote marks; will re-open if the fix needs to be corrected. Turns out the only reason the test case was succeeding was a bug in the code! The patch did not fix the bug, as demonstrated by the layout test. Comment on attachment 12830 [details]
proposed fix
Clearing review flag on this, since it was already landed.
(In reply to comment #7) > Turns out the only reason the test case was succeeding was a bug in the code! > The patch did not fix the bug, as demonstrated by the layout test. I have a fix for this; looking into whether the issue with spurious quote marks can be fixed without copying most of HTMLTokenizer. Created attachment 12857 [details]
fix the fix
The spurious quotes quirk is very context-dependent, I don't see a way to support it without stateful code similar to that found in HTMLTokenizer. Maybe we should find a way to re-use that.
Comment on attachment 12857 [details]
fix the fix
r=me
Committed revision 19348. |