Bug 12165 - REGRESSION: text encoding problem at jn.sapo.pt
Summary: REGRESSION: text encoding problem at jn.sapo.pt
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 420+
Hardware: Mac OS X 10.4
Importance: P1 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords: Regression
Depends on:
Blocks:
 
Reported: 2007-01-08 10:39 PST by José Luís Andrade
Modified: 2007-01-13 08:52 PST

See Also:


Attachments
proposed fix (5.20 KB, patch)
2007-01-13 04:32 PST, Alexey Proskuryakov
darin: review+

Description José Luís Andrade 2007-01-08 10:39:14 PST
Safari and Firefox display this site <http://jn.sapo.pt> correctly, but WebKit does not. It seems to have a problem reading the text encoding.
Comment 1 Alexey Proskuryakov 2007-01-08 12:19:16 PST
Confirmed as a regression with r18673.

It appears to be caused by some garbage before the beginning of the HTML document:

----------------------------------------------
<!-- temp --><script language="JavaScript" type="text/JavaScript"> document.write ('<SCR' + 'IPT SRC="http://ads.sapo.pt/js.ng/site=lusomundo&chan=jn&adsize=1x1&type=richmedia&TileID='+TileID+'"></SCR' + 'IPT>'); </script>
<!-- /temp --><!--HEADER-->

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
----------------------------------------------
Comment 2 José Luís Andrade 2007-01-08 13:25:33 PST
The validator.w3.org says about <http://sapo.pt>:

Sorry! This document can not be checked.

Sorry, I am unable to validate this document because on line 636, 660 it contained one or more bytes that I cannot interpret as  utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
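
A check like the one the validator describes can be approximated locally. The following is a minimal sketch, not the validator's actual implementation: it reports which lines of a byte string fail to decode as UTF-8. The function name `invalid_utf8_lines` and the sample bytes are hypothetical and are not taken from jn.sapo.pt.

```python
def invalid_utf8_lines(data):
    """Return the 1-based numbers of lines that do not decode as UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: 0x0A never occurs inside a multi-byte
    # UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical sample: line 1 is valid UTF-8, line 2 carries a Latin-1 byte.
sample = b"Not\xc3\xadcias\nNot\xedcias\nok\n"
print(invalid_utf8_lines(sample))  # [2]
```

Running it over the raw bytes of a saved copy of the page (not over already-decoded text) would list the lines that need a closer look.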
Comment 3 José Luís Andrade 2007-01-08 13:32:13 PST
Correction:

The validator.w3.org says about <http://jn.sapo.pt/> ...

and not

"The validator.w3.org says about <http://sapo.pt> ..."
Comment 4 Alexey Proskuryakov 2007-01-13 04:32:26 PST
Created attachment 12414
proposed fix

Invalid HTML has lots of ways to fool our charset meta detector. I'm wondering why we aren't getting a lot more reports like this, though.
Comment 5 Darin Adler 2007-01-13 07:11:58 PST
Comment on attachment 12414
proposed fix

I guess this is OK, but I'm worried that it's a little risky to ignore tags in scripts when we don't know enough about script syntax to properly handle comments inside the script and know when the script ends.

But ... r=me
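
To illustrate the idea discussed in this thread, here is a minimal sketch of a charset meta prescan that either inspects or ignores tags inside scripts. It is not the code from attachment 12414 and not WebKit's detector. The function `sniff_meta_charset`, the `skip_scripts` flag, the `HEAD_TAGS` list, and the rule "stop at the first tag that does not belong in the head" are all assumptions made for illustration. Only the markup in `PAGE_PREFIX` comes from comment 1.

```python
import re

# Assumed, simplified list of tags a head-only scan tolerates before giving up.
HEAD_TAGS = {b"html", b"head", b"title", b"base", b"link", b"meta",
             b"style", b"script", b"object"}

TAG_RE = re.compile(rb"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.I)
SCRIPT_END_RE = re.compile(rb"</script[^>]*>", re.I)


def sniff_meta_charset(data, skip_scripts=True):
    """Return the charset named by a <meta> tag, or None if the scan gives up."""
    pos = 0
    while True:
        lt = data.find(b"<", pos)
        if lt < 0:
            return None
        if data.startswith(b"<!--", lt):
            # Skip HTML comments entirely.
            end = data.find(b"-->", lt + 4)
            if end < 0:
                return None
            pos = end + 3
            continue
        m = TAG_RE.match(data, lt)
        if m is None:
            # Not a tag (for example a DOCTYPE); keep scanning.
            pos = lt + 1
            continue
        closing, name, attrs = m.group(1), m.group(2).lower(), m.group(3)
        pos = m.end()
        if name == b"meta" and not closing:
            found = CHARSET_RE.search(attrs)
            if found:
                return found.group(1).decode("ascii").lower()
        elif name == b"script" and not closing and skip_scripts:
            # Ignore everything up to the first "</script", with no
            # knowledge of script syntax.
            end = SCRIPT_END_RE.search(data, pos)
            if end is None:
                return None
            pos = end.end()
        elif name not in HEAD_TAGS:
            # Looks like body content: assume there is no meta charset.
            return None


# Start of the page as quoted in comment 1.
PAGE_PREFIX = b"""<!-- temp --><script language="JavaScript" type="text/JavaScript"> document.write ('<SCR' + 'IPT SRC="http://ads.sapo.pt/js.ng/site=lusomundo&chan=jn&adsize=1x1&type=richmedia&TileID='+TileID+'"></SCR' + 'IPT>'); </script>
<!-- /temp --><!--HEADER-->

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
"""

print(sniff_meta_charset(PAGE_PREFIX, skip_scripts=False))  # None
print(sniff_meta_charset(PAGE_PREFIX, skip_scripts=True))   # utf-8
```

With `skip_scripts=False`, the `'<SCR' + 'IPT ...>` fragment inside `document.write` is read as an unknown tag and the sketch gives up before reaching the meta tag. With `skip_scripts=True`, the script body is skipped and the charset is found. The sketch ends a script at the first `</script` it sees and knows nothing about script syntax, so a script that is never closed makes it return None; that is one concrete form of the risk raised above.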
Comment 6 Alexey Proskuryakov 2007-01-13 08:52:12 PST
Committed revision 18833.