RESOLVED FIXED 39617
[GTK] Google sites do not like WebKitGTK+
https://bugs.webkit.org/show_bug.cgi?id=39617
Summary [GTK] Google sites do not like WebKitGTK+
Gustavo Noronha (kov)
Reported 2010-05-24 13:50:09 PDT
Many google sites give warnings about the browser not being supported, when the user agent identifies itself with anything other than the major browsers, sometimes even providing less useful/cool content. Even our addition of 'Safari' to the User Agent does not seem to be enough. Discussing the issue with Google people has not led to a solution. To resolve the issue, I believe we should special case google domains, and identify to them as if we were Google Chrome.
Attachments
proposed fix (8.00 KB, patch)
2010-05-24 13:53 PDT, Gustavo Noronha (kov)
eric: review-
gustavo: commit-queue-
proposed fix v2 (9.80 KB, patch)
2010-08-10 15:55 PDT, Gustavo Noronha (kov)
eric: review-
proposed patch (11.97 KB, patch)
2010-08-12 13:24 PDT, Gustavo Noronha (kov)
eric: review-
proposed patch (11.18 KB, patch)
2010-08-13 05:56 PDT, Gustavo Noronha (kov)
no flags
proposed patch (11.94 KB, patch)
2010-09-10 08:25 PDT, Gustavo Noronha (kov)
mrobinson: review+
Gustavo Noronha (kov)
Comment 1 2010-05-24 13:53:47 PDT
Created attachment 56922 [details] proposed fix With this patch, even the nice fade in of various links in Google's main page start being given to us, like it is given to, say, Firefox.
WebKit Review Bot
Comment 2 2010-05-24 13:55:51 PDT
Attachment 56922 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 WebKit/gtk/webkit/webkitprivate.h:65: Alphabetical sorting problem. [build/include_order] [4] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Alexander Butenko
Comment 3 2010-05-24 14:46:58 PDT
> With this patch, even the nice fade in of various links in Google's main page start being given to us, like it is given to, say, Firefox. Im not a reviewer, but this list of google domain looks weird. For example you already missed google.com.do in there. gmail can have issues as well i assume.
Gustavo Noronha (kov)
Comment 4 2010-05-24 15:06:56 PDT
I have not been able to find a full list of google-related domains. I think you are right about gmail, perhaps blogspot, appspot should also be included... do you have an idea of where we could get an extensive list?
Alexander Butenko
Comment 5 2010-05-24 15:42:37 PDT
http://www.google.com/supported_domains but thats only google search.
Leonardo Serra
Comment 6 2010-05-24 15:45:45 PDT
Hi, Here[1] is an old unofficial list of Google domains, created in 2006. [1] http://www.pronetadvertising.com/articles/googles-growing-list-of-domains.html thanks, leoserra
Dirk Schulze
Comment 7 2010-06-08 09:19:45 PDT
Should this realy be fixed in WebKit? Not rather in the browser, that uses WebKit? It's still a bad style to lock out unknown browsers :-/
Oliver Hunt
Comment 8 2010-06-21 18:03:12 PDT
What's setting the user agent? You should really be trying (as much as possible) to match the Safari UA, maybe with an additional bit to say "Actual Browser Name"
Eric Seidel (no email)
Comment 9 2010-06-21 18:22:20 PDT
Seems like a bad idea.
Eric Seidel (no email)
Comment 10 2010-06-21 18:23:25 PDT
Comment on attachment 56922 [details] proposed fix Seems like either a really bad idea, or you're trolling or both. I mean, it's your browser, but this would be signing both Gtk's webkit to forever being a second-class citizen since its user agent can't be trusted.
Oliver Hunt
Comment 11 2010-06-21 18:31:02 PDT
(In reply to comment #10) > (From update of attachment 56922 [details]) > Seems like either a really bad idea, or you're trolling or both. I mean, it's your browser, but this would be signing both Gtk's webkit to forever being a second-class citizen since its user agent can't be trusted. No, the problem is the googles properties that are user agent rather than capability sniffing, this puts epiphany, et al in the position of being a second class citizen through no fault of their own. Safari has in the past had to send spoofed UA strings for site compatibility. Even google's own Chrome browser has had to spoof its UA for site compatibility. I would say this UA spoofing should probably go into Epiphany (or whatever) but the fact is that UA spoofing is needed as long as major sites artificially limit the user experience non-major browsers
Gustavo Noronha (kov)
Comment 12 2010-06-22 06:52:11 PDT
(In reply to comment #10) > (From update of attachment 56922 [details]) > Seems like either a really bad idea, or you're trolling or both. I mean, it's your browser, but this would be signing both Gtk's webkit to forever being a second-class citizen since its user agent can't be trusted. This sounds rather awkward to me, coming from someone from Google. I would have no problem with our U-A not being trusted - perhaps it would help further the goal of stopping the U-A sniffing problem, indeed. It seems like there is a possibility of having Google handle WebKitGTK+ as safari, so I'm holding on this, but my intention is to land a patch that special-cases Google if this does not get fixed server-side.
Julie Parent
Comment 13 2010-06-22 09:51:15 PDT
Can you provide a concrete list of sites where you are seeing problems, what the problems are, and the UA you are using?
Gustavo Noronha (kov)
Comment 14 2010-07-16 07:27:51 PDT
(In reply to comment #13) > Can you provide a concrete list of sites where you are seeing problems, what the problems are, and the UA you are using? Wave says Epiphany is not supported. It does have a way of saying 'let me through anyway', but many users see that page and go try with Firefox instead because they think the browser doesn't work. Google's main page does a nice fade-in effect. That is not presented to Epiphany. I'm not a google power user, so I am pretty sure there are other places in which Epiphany ends up seeing a crippled version of a feature. My current U-A: Mozilla/5.0 (X11; U; Linux x86_64; en-gb) AppleWebKit/531.2+ (KHTML, like Gecko) Safari/531.2+ Debian/squeeze (2.30.2-3) Epiphany/2.30.2 I've had many googlers say they would look into these problems since January. Invariably all communications came to a halt somewhere down the line after a promise of these issues being on their way to be fixed. The most promising exchange brought up the possibility of putting Epiphany in the "Safari" bucket for google.com (they seem to have Safari, Chrome, Firefox, IE, Opera, and "other" buckets, and we end up on the last one). The whole thing is quite depressing if you ask me, especially coming from someone who should be leading by example, and who has the same kind of bullshit imposed on their own browser by others.
Gustavo Noronha (kov)
Comment 15 2010-07-16 07:29:08 PDT
Oh, and "fixing it for Epiphany" alone is not a good enough solution for me. I don't want to have Epiphany working and Devhelp broken, and have each browser need to solve this issue, which is why I'd target a solution to the engine, not the browser.
Gustavo Noronha (kov)
Comment 16 2010-08-10 13:59:31 PDT
Gustavo Noronha (kov)
Comment 17 2010-08-10 15:55:06 PDT
Created attachment 64054 [details] proposed fix v2 With Youtube denying us HTML5 video this has become a larger problem. This is an updated patch that spoofs the U-A also to Youtube, only spoofs if the site-specific quirks is enabled (as discussed with Xan when I wrote the original patch), and uses a HashSet in hopes of having better performance. Debian's package will be shipping this patch as of tonight (maybe tomorrow).
Oliver Hunt
Comment 18 2010-08-10 16:10:42 PDT
Comment on attachment 64054 [details] proposed fix v2 I'd recommend spoofing as safari in case google sites make js engine assumptions if they see "chrome" in the ua
Eric Seidel (no email)
Comment 19 2010-08-12 05:53:03 PDT
Comment on attachment 64054 [details] proposed fix v2 I don't understand the point of the explicit domain list. You'll never hit it. google.com redirects to www.google.com. Also, I agree with oliver, you should spoof Safari instead of Chrome since you're more like Safari than Chrome. (You'd also be the only "linux safari" which might allow site authors to better handle/understand your spoofing)
Eric Seidel (no email)
Comment 20 2010-08-12 05:53:45 PDT
Comment on attachment 64054 [details] proposed fix v2 r- per my comments above.
Gustavo Noronha (kov)
Comment 21 2010-08-12 10:56:43 PDT
(In reply to comment #19) > (From update of attachment 64054 [details]) > I don't understand the point of the explicit domain list. You'll never hit it. > > google.com redirects to www.google.com. Do you mean having 'google.com' in the explicit domain list? > Also, I agree with oliver, you should spoof Safari instead of Chrome since you're more like Safari than Chrome. That sounds right indeed! > (You'd also be the only "linux safari" which might allow site authors to better handle/understand your spoofing) I am not sure we should keep 'Linux' in there, though, it seems to be enough to make some sites go wrong.
Gustavo Noronha (kov)
Comment 22 2010-08-12 13:24:13 PDT
Created attachment 64252 [details] proposed patch I brought the standard webkitgtk User Agent up-to-date with Safari's, and we now use it instead of any custom user agent when dealing with Google sites. I also took the time to refactor the functions that make up the various pieces of the U-A string. I don't think the list of domains needs to be changed. I'll research other sites requiring this kind of spoofing (hotmail comes to mind) and do follow-up patches, after we land this.
Eric Seidel (no email)
Comment 23 2010-08-12 22:03:40 PDT
Comment on attachment 64252 [details] proposed patch 2 // First we try a full-match, for cases such as 'youtube.com' as opposed to 193 // 'www.youtube.com'. 194 if (googleDomains.contains(host)) 195 return true; That code will never be hit with the current list of domains. I would not have architected this check this way. google.* is the only one which want's a "namespace" an only so you can look for the string ".google." and then look up the final tail in a list. All the other domains only need .com and could have used domain postfixes instead of needing the namespace architecture. Repeating the string "google." in all of your googleDomains seems unecessary. The above match against the shortened domain seems wrong in all of the domain examples given.
Eric Seidel (no email)
Comment 24 2010-08-12 22:08:57 PDT
Comment on attachment 64252 [details] proposed patch My r- is perhaps a bit strong, but this code still seems confused. WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp:171 + googleDomains.add("gmail.com"); This always redirects to mail.google.com. No need for explicit inclusion here. WebKit/gtk/webkit/webkitwebsettings.cpp:223 + DEFINE_STATIC_LOCAL(const String, staticUA, (String::format("Mozilla/5.0 (%s; U; %s; %s) AppleWebKit/%s (KHTML, like Gecko) Version/5.0 Safari/%s", I'm surprised we don't yet have a version of String::format which can take String objects passed into it. My only current concern with this patch is that you're casting your net a bit too wide. Generally with site-specific spoofing the goal is to only spoof for the absolute minimum number of pages, and make it possible for there to be a path away from the spoofing. I think that Gtk needs this patch (and will probably need it for a long time), but the end goal should be to have a world where this patch is not needed.
Eric Seidel (no email)
Comment 25 2010-08-12 22:09:26 PDT
CCing Evan just in case he cares to comment about the user agent string.
Eric Seidel (no email)
Comment 26 2010-08-12 22:18:31 PDT
Also, is ytimg.com a valid host? It doesn't respond to pings or http requests.
Gustavo Noronha (kov)
Comment 27 2010-08-13 05:56:55 PDT
Created attachment 64328 [details] proposed patch OK, I think I now understand what you're saying - you mean the postfixes are not needed for anything that is not .google., sorry for being confused! I reworked the patch to only do the postfix checking for .google., and I do the postfix-only checks for the others, as you suggest. Let me also comment on your other messages: > My r- is perhaps a bit strong, but this code still seems confused. It's fine, I just failed to understand what you meant. > WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp:171 > + googleDomains.add("gmail.com"); > This always redirects to mail.google.com. No need for explicit inclusion here. I would prefer to have checks even for the redirect-only domains in case they are repurposed in any way. It's not required, sure, but it's not bad to have it either, so I would prefer keeping it. > My only current concern with this patch is that you're casting your net a bit too wide. Generally with site-specific spoofing the goal is to only spoof for the absolute minimum number of pages, and make it possible for there to be a path away from the spoofing. I think that Gtk needs this patch (and will probably need it for a long time), but the end goal should be to have a world where this patch is not needed. I understand your concern, but there are a few things I would like to point out about why I planned this net to be cast widely: First off, we don't know to which extent Google is excluding us from content based on User-Agent sniffing - the fact that I only learned about the Google images improvement after writing the patch is an example of that - not being a hardcore google user myself, it's hard for me to even try to make a list of issues. Also, there are distributions which are going to be shipping browsers with WebKitGTK+ which will not have them updated to newer versions for months, years perhaps. Even though I consider that a shortcoming of the usual distribution process of FOSS distributions, it's something we have to account for when trying to decide what to do. Doing it this way we avoid having random Google services start failing to work for those people because they started doing broken sniffing (as was the case with Youtube). The solution is far from perfect, I agree, and I would love to not ever need this patch - I have even been trying to fix the sniffing that Google does instead since around January this year because I really hate to have this kind of thing in our code, but with the results we got up to now, I think for now it's a good first step in assuring we won't have people saying Epiphany doesn't work when the next cool pacman-style doodle comes out.
Evan Martin
Comment 28 2010-08-13 10:10:34 PDT
Without commenting on the code details, it is my personal (not speaking on behalf of the company) belief that this is your best bet. However, I don't think this problem is specific to Google -- you just notice Google the most because Google is the most aggressive at making sites that rely on newer browser features. E.g. the Apple HTML5 demo site, the newer Yahoo mail, etc. all have similar problems. You might consider how this code might generalize in the future (the isGoogleDomain() check I guess could be wrapped in a isSpoofNeededDomain() later).
Alexander Butenko
Comment 29 2010-08-13 16:55:51 PDT
(In reply to comment #28) > > However, I don't think this problem is specific to Google -- you just notice Google the most because Google is the most aggressive at making sites that rely on newer browser features. E.g. the Apple HTML5 demo site, the newer Yahoo mail, etc. all have similar problems. You might consider how this code might generalize in the future (the isGoogleDomain() check I guess could be wrapped in a isSpoofNeededDomain() later). Evan, with all the respect to google, but about what *newer browser features* you are talking about which are supported in chomium and not supported in webkitgtk? webkitgtk share webkit and js implementation with safari about which webkitgtk UA is trully telling by default. Same problems on the google sites, yahoo mail, etc sounds like a generic bug in UA detection and not a problem with webkitgtk itself. Another thing, "your browser is too buggy to serve" way of work is a complete noncence. Developers need to see errors in order to make software "not too buggy". The other thing is that such kind of a "customer protection" could be a hidden marketing step to force users to switch to a googleTM browser. Thats i believe the most reasonable "bug". Well, just my 0.02c as im not a webkit dev.
Kenneth Rohde Christiansen
Comment 30 2010-08-14 07:50:23 PDT
Comment on attachment 64328 [details] proposed patch Have you tried contacting Google? There must be someone from the Chromium team who can help you.
Adam Barth
Comment 31 2010-08-14 23:20:22 PDT
Comment on attachment 64328 [details] proposed patch I hesitate to get involved because people have strong feels here, but this patch at the wrong layer. If an embedder of gtk-webkit wants to spoof someone else's UA string, that seems like something that should be handled by the embedder. For example, when Chrome has spoofed a UA in the past, it's beed done at the embedding layer. We'd probably all be better off if everyone set their UA string to "WebBrowser", but I don't think we're going to get there from here. Why not spoof Safari's UA string to every site? That's likely to give you the best compatibility, in all honesty. Sorry. :(
Gustavo Noronha (kov)
Comment 32 2010-08-15 09:37:28 PDT
Comment on attachment 64328 [details] proposed patch With all due respect, Adam (and I do have loads of it for you and your work), this decision is for the WebKitGTK+'s team to make. Unlike Chromium's port of WebKit, we ship a reusable library that is used by various browsers, and it is my belief that this should be handled in this layer. Please note that the spoofing is disabled by default, is only done when the embedder explicitely requests that such "compatibility" quirks are enabled, and can be undone by the embedder if it really wants to.
Gustavo Noronha (kov)
Comment 33 2010-08-15 09:47:19 PDT
(In reply to comment #32) > Unlike Chromium's port of WebKit, we ship a reusable library that is used by various browsers, and it is my belief that this should be handled in this layer. Please note that the spoofing is disabled by default, is only done when the embedder explicitely requests that such "compatibility" quirks are enabled, and can be undone by the embedder if it really wants to. Having said that, I have no problem with moving the actual implementation to webkit/webkitwebview.cpp or webkit/webkitwebsettings.cpp, so it would be in the same place as the Mac and Windows port's spoofing code goes*. I don't think it is viable to take this out of WebKitGTK+ and into the embedder, though. * http://trac.webkit.org/browser/trunk/WebKit/mac/WebView/WebView.mm#L3461
Gustavo Noronha (kov)
Comment 34 2010-08-16 08:24:16 PDT
(In reply to comment #30) > (From update of attachment 64328 [details]) > Have you tried contacting Google? There must be someone from the Chromium team who can help you. Since January, I have talked to more than 10 Googlers about various issues.
Gustavo Noronha (kov)
Comment 35 2010-08-16 08:25:00 PDT
(In reply to comment #26) > Also, is ytimg.com a valid host? It doesn't respond to pings or http requests. It's used by Youtube. Check s.ytimg.com.
Martin Robinson
Comment 36 2010-09-03 13:56:17 PDT
Comment on attachment 64328 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=64328&action=prettypatch > WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp:96 > +typedef HashSet<String> GoogleDomainsSet; This typedef is only used once, so perhaps it's unnecessary? I think that Vector<String> is fairly readable. > WebKit/gtk/webkit/webkitwebsettings.cpp:207 > + uaOSVersion = String::fromUTF8(osVersion.get()); Instead of using the GOwnPtr / g_strdup_printf here, is it possible to just use String::format? I like this patch a lot, especially the use of DEFINE_STATIC_LOCAL. If Xan agrees, I think we can r+ this.
Gustavo Noronha (kov)
Comment 37 2010-09-07 06:55:07 PDT
<arno_> google does broken ua sniffing again with today's logo <slomo> works here [...] <kov> slomo, fwiw, it might work with the debian package and not with upstream because I am shipping the patch that does u-a spoofing <slomo> kov: ah... good ;) And that's why I'd like to go forward with this, do you oppose it Xan, or can I go ahead and fix the patch to take Martin's suggestions into account, then bribe him to r+ the patch? =)
Xan Lopez
Comment 38 2010-09-09 07:50:30 PDT
Personally I don't like this change too much. I think it's a lot of code for something quite specific (why stop at Google? shouldn't we have an endless list of relevant sites with broken spoofing?). I'd rather do something like trying real hard to be identified as Safari by default (which we kinda try to do anyway?), since we should be really similar to it (WebKit, JSC, etc). In any case, if both Martin and you like this I won't oppose it.
Gustavo Noronha (kov)
Comment 39 2010-09-10 06:09:03 PDT
(In reply to comment #38) > Personally I don't like this change too much. I think it's a lot of code for something quite specific (why stop at Google? shouldn't we have an endless list of relevant sites with broken spoofing?). I'd rather do something like trying real hard to be identified as Safari by default (which we kinda try to do anyway?), since we should be really similar to it (WebKit, JSC, etc). I don't think of stopping at Google. I believe we'll need spoofing for any site that behaves badly, this is a first step though because many of our users rely on Google stuff. > In any case, if both Martin and you like this I won't oppose it. Thanks, I'll finish the patch up, and land, then =)
Martin Robinson
Comment 40 2010-09-10 08:24:14 PDT
Is this still an issue? I checked the recent builds for debug GTK+ and Mac Leopard and they both appear to be of the same order.
Gustavo Noronha (kov)
Comment 41 2010-09-10 08:25:26 PDT
Created attachment 67187 [details] proposed patch Here's the fixed patch with an optimistic Reviewed by line hehe. =)
Martin Robinson
Comment 42 2010-09-10 08:41:27 PDT
Comment on attachment 67187 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=67187&action=prettypatch > WebKit/gtk/webkit/webkitwebsettings.cpp:-194 > - if (uname(&name) != -1) Hrm. I think it's slightly weird to not use the un-ifdef'd return here. Perhaps change this to: if (uname(&name) != -1) uaOSVersion = String::format("%s %s", name.sysname, name.machine); else uaOSVersion = String("Unknown"); Looks good! r=me with this change.
Gustavo Noronha (kov)
Comment 43 2010-09-10 10:15:31 PDT
Good point. I changed it accordingly and landed as r67211. Thanks! Enjoy Google Instant now =)
Martin Robinson
Comment 44 2010-09-10 10:22:08 PDT
(In reply to comment #40) > Is this still an issue? I checked the recent builds for debug GTK+ and Mac Leopard and they both appear to be of the same order. Please ignore this comment. I'm not sure why it ended up here, other than I posted it before drinking my coffee.
Eric Seidel (no email)
Comment 45 2010-09-10 11:12:28 PDT
Seems you should have a runtime way to disable this. Otherwise how will you ever check if it can be removed? Unless you want this spoofing to be a permanent part of WebKitGtk...
Martin Robinson
Comment 46 2010-09-10 11:14:16 PDT
I assume you mean other than the site specific quirks setting? Currently, we only spoof when that setting is enabled.
Oliver Hunt
Comment 47 2010-09-10 11:23:04 PDT
(In reply to comment #45) > Seems you should have a runtime way to disable this. Otherwise how will you ever check if it can be removed? Unless you want this spoofing to be a permanent part of WebKitGtk... Alternatively google could have fixed this and it wouldn't have been needed -- you commented in the bug in june saying you thought it was a bad idea so had to be aware of it, and i recall telling you or someone else from google about the problem and yet nothing seems to have been done to correct it. UA checks are bad, google engineers have given tech talks saying UA sniffing is bad. And yet there is code from google that is a) doing UA sniffing and b) blocking other browsers built on the same engine as safari and chrome...
Oliver Hunt
Comment 48 2010-09-10 11:24:33 PDT
given that set of circumstances it's hard to see why this shouldn't be in webkit -- otherwise every other webkit/gtk (and probably -qt, wx, efl, ...) based browser would need to implement the same ua spoofing.
WebKit Review Bot
Comment 49 2010-09-10 13:39:18 PDT
http://trac.webkit.org/changeset/67211 might have broken Leopard Intel Debug (Tests) The following changes are on the blame list: http://trac.webkit.org/changeset/67211 http://trac.webkit.org/changeset/67212
Note You need to log in before you can comment on or make changes to this bug.