Bug 39617

Summary: [GTK] Google sites do not like WebKitGTK+
Product: WebKit Reporter: Gustavo Noronha (kov) <gustavo>
Component: WebKitGTKAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, a.butenka, dglazkov, eric, evan, fishd, joone, jparent, krit, leoserra, mrobinson, ojan, oliver, webkit.review.bot, xan.lopez
Priority: P2 Keywords: GoogleBug, Gtk
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
proposed fix
eric: review-, gustavo: commit-queue-
proposed fix v2
eric: review-
proposed patch
eric: review-
proposed patch
none
proposed patch mrobinson: review+

Description Gustavo Noronha (kov) 2010-05-24 13:50:09 PDT
Many google sites give warnings about the browser not being supported, when the user agent identifies itself with anything other than the major browsers, sometimes even providing less useful/cool content. Even our addition of 'Safari' to the User Agent does not seem to be enough. Discussing the issue with Google people has not led to a solution. To resolve the issue, I believe we should special case google domains, and identify to them as if we were Google Chrome.
Comment 1 Gustavo Noronha (kov) 2010-05-24 13:53:47 PDT
Created attachment 56922 [details]
proposed fix

With this patch, even the nice fade in of various links in Google's main page start being given to us, like it is given to, say, Firefox.
Comment 2 WebKit Review Bot 2010-05-24 13:55:51 PDT
Attachment 56922 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebKit/gtk/webkit/webkitprivate.h:65:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Alexander Butenko 2010-05-24 14:46:58 PDT
> With this patch, even the nice fade in of various links in Google's main page start being given to us, like it is given to, say, Firefox.

Im not a reviewer, but this list of google domain looks weird. For example you already missed google.com.do in there. gmail can have issues as well i assume.
Comment 4 Gustavo Noronha (kov) 2010-05-24 15:06:56 PDT
I have not been able to find a full list of google-related domains. I think you are right about gmail, perhaps blogspot, appspot should also be included... do you have an idea of where we could get an extensive list?
Comment 5 Alexander Butenko 2010-05-24 15:42:37 PDT
http://www.google.com/supported_domains

but thats only google search.
Comment 6 Leonardo Serra 2010-05-24 15:45:45 PDT
Hi,
Here[1] is an old unofficial list of Google domains, created in 2006.

[1] http://www.pronetadvertising.com/articles/googles-growing-list-of-domains.html

thanks,
leoserra
Comment 7 Dirk Schulze 2010-06-08 09:19:45 PDT
Should this realy be fixed in WebKit? Not rather in the browser, that uses WebKit? It's still a bad style to lock out unknown browsers :-/
Comment 8 Oliver Hunt 2010-06-21 18:03:12 PDT
What's setting the user agent?  You should really be trying (as much as possible) to match the Safari UA, maybe with an additional bit to say "Actual Browser Name"
Comment 9 Eric Seidel (no email) 2010-06-21 18:22:20 PDT
Seems like a bad idea.
Comment 10 Eric Seidel (no email) 2010-06-21 18:23:25 PDT
Comment on attachment 56922 [details]
proposed fix

Seems like either a really bad idea, or you're trolling or both.  I mean, it's your browser, but this would be signing both Gtk's webkit to forever being a second-class citizen since its user agent can't be trusted.
Comment 11 Oliver Hunt 2010-06-21 18:31:02 PDT
(In reply to comment #10)
> (From update of attachment 56922 [details])
> Seems like either a really bad idea, or you're trolling or both.  I mean, it's your browser, but this would be signing both Gtk's webkit to forever being a second-class citizen since its user agent can't be trusted.

No, the problem is the googles properties that are user agent rather than capability sniffing, this puts epiphany, et al in the position of being a second class citizen through no fault of their own.

Safari has in the past had to send spoofed UA strings for site compatibility.  Even google's own Chrome browser has had to spoof its UA for site compatibility.  I would say this UA spoofing should probably go into Epiphany (or whatever) but the fact is that UA spoofing is needed as long as major sites artificially limit the user experience non-major browsers
Comment 12 Gustavo Noronha (kov) 2010-06-22 06:52:11 PDT
(In reply to comment #10)
> (From update of attachment 56922 [details])
> Seems like either a really bad idea, or you're trolling or both.  I mean, it's your browser, but this would be signing both Gtk's webkit to forever being a second-class citizen since its user agent can't be trusted.

This sounds rather awkward to me, coming from someone from Google. I would have no problem with our U-A not being trusted - perhaps it would help further the goal of stopping the U-A sniffing problem, indeed. It seems like there is a possibility of having Google handle WebKitGTK+ as safari, so I'm holding on this, but my intention is to land a patch that special-cases Google if this does not get fixed server-side.
Comment 13 Julie Parent 2010-06-22 09:51:15 PDT
Can you provide a concrete list of sites where you are seeing problems, what the problems are, and the UA you are using?
Comment 14 Gustavo Noronha (kov) 2010-07-16 07:27:51 PDT
(In reply to comment #13)
> Can you provide a concrete list of sites where you are seeing problems, what the problems are, and the UA you are using?

Wave says Epiphany is not supported. It does have a way of saying 'let me through anyway', but many users see that page and go try with Firefox instead because they think the browser doesn't work.

Google's main page does a nice fade-in effect. That is not presented to Epiphany. I'm not a google power user, so I am pretty sure there are other places in which Epiphany ends up seeing a crippled version of a feature.

My current U-A:

Mozilla/5.0 (X11; U; Linux x86_64; en-gb) AppleWebKit/531.2+ (KHTML, like Gecko) Safari/531.2+ Debian/squeeze (2.30.2-3) Epiphany/2.30.2

I've had many googlers say they would look into these problems since January. Invariably all communications came to a halt somewhere down the line after a promise of these issues being on their way to be fixed. The most promising exchange brought up the possibility of putting Epiphany in the "Safari" bucket for google.com (they seem to have Safari, Chrome, Firefox, IE, Opera, and "other" buckets, and we end up on the last one). The whole thing is quite depressing if you ask me, especially coming from someone who should be leading by example, and who has the same kind of bullshit imposed on their own browser by others.
Comment 15 Gustavo Noronha (kov) 2010-07-16 07:29:08 PDT
Oh, and "fixing it for Epiphany" alone is not a good enough solution for me. I don't want to have Epiphany working and Devhelp broken, and have each browser need to solve this issue, which is why I'd target a solution to the engine, not the browser.
Comment 16 Gustavo Noronha (kov) 2010-08-10 13:59:31 PDT
Youtube is also doing sniffing, now: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567311#55
Comment 17 Gustavo Noronha (kov) 2010-08-10 15:55:06 PDT
Created attachment 64054 [details]
proposed fix v2

With Youtube denying us HTML5 video this has become a larger problem. This is an updated patch that spoofs the U-A also to Youtube, only spoofs if the site-specific quirks is enabled (as discussed with Xan when I wrote the original patch), and uses a HashSet in hopes of having better performance. Debian's package will be shipping this patch as of tonight (maybe tomorrow).
Comment 18 Oliver Hunt 2010-08-10 16:10:42 PDT
Comment on attachment 64054 [details]
proposed fix v2

I'd recommend spoofing as safari in case google sites make js engine assumptions if they see "chrome" in the ua
Comment 19 Eric Seidel (no email) 2010-08-12 05:53:03 PDT
Comment on attachment 64054 [details]
proposed fix v2

I don't understand the point of the explicit domain list.  You'll never hit it.

google.com redirects to www.google.com.

Also, I agree with oliver, you should spoof Safari instead of Chrome since you're more like Safari than Chrome.

(You'd also be the only "linux safari" which might allow site authors to better handle/understand your spoofing)
Comment 20 Eric Seidel (no email) 2010-08-12 05:53:45 PDT
Comment on attachment 64054 [details]
proposed fix v2

r- per my comments above.
Comment 21 Gustavo Noronha (kov) 2010-08-12 10:56:43 PDT
(In reply to comment #19)
> (From update of attachment 64054 [details])
> I don't understand the point of the explicit domain list.  You'll never hit it.
> 
> google.com redirects to www.google.com.

Do you mean having 'google.com' in the explicit domain list?

> Also, I agree with oliver, you should spoof Safari instead of Chrome since you're more like Safari than Chrome.

That sounds right indeed!
 
> (You'd also be the only "linux safari" which might allow site authors to better handle/understand your spoofing)

I am not sure we should keep 'Linux' in there, though, it seems to be enough to make some sites go wrong.
Comment 22 Gustavo Noronha (kov) 2010-08-12 13:24:13 PDT
Created attachment 64252 [details]
proposed patch

I brought the standard webkitgtk User Agent up-to-date with Safari's, and we now use it instead of any custom user agent when dealing with Google sites. I also took the time to refactor the functions that make up the various pieces of the U-A string. I don't think the list of domains needs to be changed. I'll research other sites requiring this kind of spoofing (hotmail comes to mind) and do follow-up patches, after we land this.
Comment 23 Eric Seidel (no email) 2010-08-12 22:03:40 PDT
Comment on attachment 64252 [details]
proposed patch

2     // First we try a full-match, for cases such as 'youtube.com' as opposed to
 193     // 'www.youtube.com'.
 194     if (googleDomains.contains(host))
 195         return true;

That code will never be hit with the current list of domains.

I would not have architected this check this way.

google.* is the only one which want's a "namespace" an only so you can look for the string ".google." and then look up the final tail in a list.  All the other domains only need .com and could have used domain postfixes instead of needing the namespace architecture.

Repeating the string "google." in all of your googleDomains seems unecessary.

The above match against the shortened domain seems wrong in all of the domain examples given.
Comment 24 Eric Seidel (no email) 2010-08-12 22:08:57 PDT
Comment on attachment 64252 [details]
proposed patch

My r- is perhaps a bit strong, but this code still seems confused.

WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp:171
 +      googleDomains.add("gmail.com");
This always redirects to mail.google.com.  No need for explicit inclusion here.

WebKit/gtk/webkit/webkitwebsettings.cpp:223
 +      DEFINE_STATIC_LOCAL(const String, staticUA, (String::format("Mozilla/5.0 (%s; U; %s; %s) AppleWebKit/%s (KHTML, like Gecko) Version/5.0 Safari/%s",
I'm surprised we don't yet have a version of String::format which can take String objects passed into it.

My only current concern with this patch is that you're casting your net a bit too wide.  Generally with site-specific spoofing the goal is to only spoof for the absolute minimum number of pages, and make it possible for there to be a path away from the spoofing.  I think that Gtk needs this patch (and will probably need it for a long time), but the end goal should be to have a world where this patch is not needed.
Comment 25 Eric Seidel (no email) 2010-08-12 22:09:26 PDT
CCing Evan just in case he cares to comment about the user agent string.
Comment 26 Eric Seidel (no email) 2010-08-12 22:18:31 PDT
Also, is ytimg.com a valid host?  It doesn't respond to pings or http requests.
Comment 27 Gustavo Noronha (kov) 2010-08-13 05:56:55 PDT
Created attachment 64328 [details]
proposed patch

OK, I think I now understand what you're saying - you mean the postfixes are not needed for anything that is not .google., sorry for being confused! I reworked the patch to only do the postfix checking for .google., and I do the postfix-only checks for the others, as you suggest. Let me also comment on your other messages:

> My r- is perhaps a bit strong, but this code still seems confused.

It's fine, I just failed to understand what you meant.

> WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp:171
>  +      googleDomains.add("gmail.com");
> This always redirects to mail.google.com.  No need for explicit inclusion here.

I would prefer to have checks even for the redirect-only domains in case they are repurposed in any way. It's not required, sure, but it's not bad to have it either, so I would prefer keeping it.

> My only current concern with this patch is that you're casting your net a bit too wide.  Generally with site-specific spoofing the goal is to only spoof for the absolute minimum number of pages, and make it possible for there to be a path away from the spoofing.  I think that Gtk needs this patch (and will probably need it for a long time), but the end goal should be to have a world where this patch is not needed.

I understand your concern, but there are a few things I would like to point out about why I planned this net to be cast widely:

First off, we don't know to which extent Google is excluding us from content based on User-Agent sniffing - the fact that I only learned about the Google images improvement after writing the patch is an example of that - not being a hardcore google user myself, it's hard for me to even try to make a list of issues. Also, there are distributions which are going to be shipping browsers with WebKitGTK+ which will not have them updated to newer versions for months, years perhaps. Even though I consider that a shortcoming of the usual distribution process of FOSS distributions, it's something we have to account for when trying to decide what to do. Doing it this way we avoid having random Google services start failing to work for those people because they started doing broken sniffing (as was the case with Youtube).

The solution is far from perfect, I agree, and I would love to not ever need this patch - I have even been trying to fix the sniffing that Google does instead since around January this year because I really hate to have this kind of thing in our code, but with the results we got up to now, I think for now it's a good first step in assuring we won't have people saying Epiphany doesn't work when the next cool pacman-style doodle comes out.
Comment 28 Evan Martin 2010-08-13 10:10:34 PDT
Without commenting on the code details, it is my personal (not speaking on behalf of the company) belief that this is your best bet.

However, I don't think this problem is specific to Google -- you just notice Google the most because Google is the most aggressive at making sites that rely on newer browser features.  E.g. the Apple HTML5 demo site, the newer Yahoo mail, etc. all have similar problems.  You might consider how this code might generalize in the future (the isGoogleDomain() check I guess could be wrapped in a isSpoofNeededDomain() later).
Comment 29 Alexander Butenko 2010-08-13 16:55:51 PDT
(In reply to comment #28)
> 
> However, I don't think this problem is specific to Google -- you just notice Google the most because Google is the most aggressive at making sites that rely on newer browser features.  E.g. the Apple HTML5 demo site, the newer Yahoo mail, etc. all have similar problems.  You might consider how this code might generalize in the future (the isGoogleDomain() check I guess could be wrapped in a isSpoofNeededDomain() later).

Evan, with all the respect to google, but about what *newer browser features* you are talking about which are supported in chomium and not supported in webkitgtk? webkitgtk share webkit and js implementation with safari about which webkitgtk UA is trully telling by default.

Same problems on the google sites, yahoo mail, etc sounds like a generic bug in UA detection and not a problem with webkitgtk itself. 

Another thing, "your browser is too buggy to serve" way of work is a complete noncence. Developers need to see errors in order to make software "not too buggy".

The other thing is that such kind of a "customer protection" could be a hidden marketing step to force users to switch to a googleTM browser. Thats i believe the most reasonable "bug".

Well, just my 0.02c as im not a webkit dev.
Comment 30 Kenneth Rohde Christiansen 2010-08-14 07:50:23 PDT
Comment on attachment 64328 [details]
proposed patch

Have you tried contacting Google? There must be someone from the Chromium team who can help you.
Comment 31 Adam Barth 2010-08-14 23:20:22 PDT
Comment on attachment 64328 [details]
proposed patch

I hesitate to get involved because people have strong feels here, but this patch at the wrong layer.  If an embedder of gtk-webkit wants to spoof someone else's UA string, that seems like something that should be handled by the embedder.  For example, when Chrome has spoofed a UA in the past, it's beed done at the embedding layer.

We'd probably all be better off if everyone set their UA string to "WebBrowser", but I don't think we're going to get there from here.  Why not spoof Safari's UA string to every site?  That's likely to give you the best compatibility, in all honesty.  Sorry.  :(
Comment 32 Gustavo Noronha (kov) 2010-08-15 09:37:28 PDT
Comment on attachment 64328 [details]
proposed patch

With all due respect, Adam (and I do have loads of it for you and your work), this decision is for the WebKitGTK+'s team to make.

Unlike Chromium's port of WebKit, we ship a reusable library that is used by various browsers, and it is my belief that this should be handled in this layer. Please note that the spoofing is disabled by default, is only done when the embedder explicitely requests that such "compatibility" quirks are enabled, and can be undone by the embedder if it really wants to.
Comment 33 Gustavo Noronha (kov) 2010-08-15 09:47:19 PDT
(In reply to comment #32)
> Unlike Chromium's port of WebKit, we ship a reusable library that is used by various browsers, and it is my belief that this should be handled in this layer. Please note that the spoofing is disabled by default, is only done when the embedder explicitely requests that such "compatibility" quirks are enabled, and can be undone by the embedder if it really wants to.

Having said that, I have no problem with moving the actual implementation to webkit/webkitwebview.cpp or webkit/webkitwebsettings.cpp, so it would be in the same place as the Mac and Windows port's spoofing code goes*. I don't think it is viable to take this out of WebKitGTK+ and into the embedder, though.

* http://trac.webkit.org/browser/trunk/WebKit/mac/WebView/WebView.mm#L3461
Comment 34 Gustavo Noronha (kov) 2010-08-16 08:24:16 PDT
(In reply to comment #30)
> (From update of attachment 64328 [details])
> Have you tried contacting Google? There must be someone from the Chromium team who can help you.

Since January, I have talked to more than 10 Googlers about various issues.
Comment 35 Gustavo Noronha (kov) 2010-08-16 08:25:00 PDT
(In reply to comment #26)
> Also, is ytimg.com a valid host?  It doesn't respond to pings or http requests.

It's used by Youtube. Check s.ytimg.com.
Comment 36 Martin Robinson 2010-09-03 13:56:17 PDT
Comment on attachment 64328 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=64328&action=prettypatch

> WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp:96
> +typedef HashSet<String> GoogleDomainsSet;
This typedef is only used once, so perhaps it's unnecessary? I think that Vector<String> is fairly readable.

> WebKit/gtk/webkit/webkitwebsettings.cpp:207
> +    uaOSVersion = String::fromUTF8(osVersion.get());
Instead of using the GOwnPtr / g_strdup_printf here, is it possible to just use String::format?

I like this patch a lot, especially the use of DEFINE_STATIC_LOCAL. If Xan agrees, I think we can r+ this.
Comment 37 Gustavo Noronha (kov) 2010-09-07 06:55:07 PDT
<arno_> google does broken ua sniffing again with today's logo
<slomo> works here
[...]
<kov> slomo, fwiw, it might work with the debian package and not with upstream because I am shipping the patch that does u-a spoofing
<slomo> kov: ah... good ;)

And that's why I'd like to go forward with this, do you oppose it Xan, or can I go ahead and fix the patch to take Martin's suggestions into account, then bribe him to r+ the patch? =)
Comment 38 Xan Lopez 2010-09-09 07:50:30 PDT
Personally I don't like this change too much. I think it's a lot of code for something quite specific (why stop at Google? shouldn't we have an endless list of relevant sites with broken spoofing?). I'd rather do something like trying real hard to be identified as Safari by default (which we kinda try to do anyway?), since we should be really similar to it (WebKit, JSC, etc).

In any case, if both Martin and you like this I won't oppose it.
Comment 39 Gustavo Noronha (kov) 2010-09-10 06:09:03 PDT
(In reply to comment #38)
> Personally I don't like this change too much. I think it's a lot of code for something quite specific (why stop at Google? shouldn't we have an endless list of relevant sites with broken spoofing?). I'd rather do something like trying real hard to be identified as Safari by default (which we kinda try to do anyway?), since we should be really similar to it (WebKit, JSC, etc).

I don't think of stopping at Google. I believe we'll need spoofing for any site that behaves badly, this is a first step though because many of our users rely on Google stuff.

> In any case, if both Martin and you like this I won't oppose it.

Thanks, I'll finish the patch up, and land, then =)
Comment 40 Martin Robinson 2010-09-10 08:24:14 PDT
Is this still an issue? I checked the recent builds for debug GTK+ and Mac Leopard and they both appear to be of the same order.
Comment 41 Gustavo Noronha (kov) 2010-09-10 08:25:26 PDT
Created attachment 67187 [details]
proposed patch

Here's the fixed patch with an optimistic Reviewed by line hehe. =)
Comment 42 Martin Robinson 2010-09-10 08:41:27 PDT
Comment on attachment 67187 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=67187&action=prettypatch

> WebKit/gtk/webkit/webkitwebsettings.cpp:-194
> -    if (uname(&name) != -1)
Hrm. I think it's slightly weird to not use the un-ifdef'd return here. Perhaps change this to:
if (uname(&name) != -1)
    uaOSVersion = String::format("%s %s", name.sysname, name.machine);
else
    uaOSVersion = String("Unknown");

Looks good! r=me with this change.
Comment 43 Gustavo Noronha (kov) 2010-09-10 10:15:31 PDT
Good point. I changed it accordingly and landed as r67211. Thanks! Enjoy Google Instant now =)
Comment 44 Martin Robinson 2010-09-10 10:22:08 PDT
(In reply to comment #40)
> Is this still an issue? I checked the recent builds for debug GTK+ and Mac Leopard and they both appear to be of the same order.

Please ignore this comment. I'm not sure why it ended up here, other than I posted it before drinking my coffee.
Comment 45 Eric Seidel (no email) 2010-09-10 11:12:28 PDT
Seems you should have a runtime way to disable this.  Otherwise how will you ever check if it can be removed?  Unless you want this spoofing to be a permanent part of WebKitGtk...
Comment 46 Martin Robinson 2010-09-10 11:14:16 PDT
I assume you mean other than the site specific quirks setting? Currently, we only spoof when that setting is enabled.
Comment 47 Oliver Hunt 2010-09-10 11:23:04 PDT
(In reply to comment #45)
> Seems you should have a runtime way to disable this.  Otherwise how will you ever check if it can be removed?  Unless you want this spoofing to be a permanent part of WebKitGtk...

Alternatively google could have fixed this and it wouldn't have been needed -- you commented in the bug in june saying you thought it was a bad idea so had to be aware of it, and i recall telling you or someone else from google about the problem and yet nothing seems to have been done to correct it.

UA checks are bad, google engineers have given tech talks saying UA sniffing is bad. And yet there is code from google that is a) doing UA sniffing and b) blocking other browsers built on the same engine as safari and chrome...
Comment 48 Oliver Hunt 2010-09-10 11:24:33 PDT
given that set of circumstances it's hard to see why this shouldn't be in webkit -- otherwise every other webkit/gtk (and probably -qt, wx, efl, ...) based browser would need to implement the same ua spoofing.
Comment 49 WebKit Review Bot 2010-09-10 13:39:18 PDT
http://trac.webkit.org/changeset/67211 might have broken Leopard Intel Debug (Tests)
The following changes are on the blame list:
http://trac.webkit.org/changeset/67211
http://trac.webkit.org/changeset/67212