Summary: | WebKit doesn't parse "#" as delimiter for fragment identifier in data URIs | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Daniel Holbert <dholbert> | ||||||||||||||||||||||||||
Component: | Platform | Assignee: | Rob Buis <rbuis> | ||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||
Severity: | Normal | CC: | abarth, aboxhall, achristensen, apinheiro, ap, ayg, benjamin, brettw, cdumez, cfleizach, cmarcelo, darin, dmazzoni, eoconnor, eric.carlson, ews-watchlist, gavin.sharp, glenn, ht, ian, japhet, jcraig, jdiggs, jer.noble, jfernandez, julian.reschke, mathias, m.goleb+bugzilla, mkwst, moz, philipj, rbuis, rego, rniwa, roc, samuel_white, sergio, shadow2531, smcgruer, svillar, webkit-bug-importer | ||||||||||||||||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||||||
URL: | data:text/html,<html>no hash should be visible:# | ||||||||||||||||||||||||||||
See Also: | https://bugs.webkit.org/show_bug.cgi?id=219156 | ||||||||||||||||||||||||||||
Bug Depends on: | 217239 | ||||||||||||||||||||||||||||
Bug Blocks: | 91791 | ||||||||||||||||||||||||||||
Attachments: |
|
Description
Daniel Holbert
2011-09-14 10:32:55 PDT
ALTERNATE STEPS TO REPRODUCE (slightly trickier, but same underlying issue): Load this URI: ========= data:application/xhtml+xml,<html xmlns='http://www.w3.org/1999/xhtml'><div style='height:3000px; width: 300px; background: red'/><div id='target' style='height: 3000px; width: 300px; background: green;'/></html>#target ========= ACTUAL RESULTS: Error page, for an XML parse error, saying: > This page contains the following errors: > error on line 1 at column 47: Extra content at the end of the document > Below is a rendering of the page up to the first error. > [rendering of the page] EXPECTED RESULTS: (a) No errors reported (b) tall green rect should be visible (and no red should be visible). That is -- the initial load should seek past the red to the 'target' element. Again, Opera & Firefox give EXPECTED RESULTS, and Chromium 14 gives ACTUAL RESULTS. (note: in both comment 0 and comment 1, Opera only gives "expected results" if I view the data URIs *directly*, whereas it matches WebKit's behavior if I view it in an <img> / <embed>. That inconsistency appears to be a bug in Opera.) Hmm, there is probably some history behind this that I don't know. Looks like Chrome and Safari might have disagreed on this (per bug 22882), but no longer do. What does IE do? There's a thread about this topic on whatwg currently. This issue is subtle. Thanks, that is <http://www.mail-archive.com/whatwg@lists.whatwg.org/msg28146.html>. As mentioned on the thread, one can expect consistency between data:, javascript: and possibly even mailto: in this regard. The underlying issue is that there's a dream that all URIs should be parseable with a single algorithm, but I don't think we can realize that dream given compatibility constraints. Compounded on that are tricky issues of using relative fragments in documents created from data URLs. WebKit does something somewhat magical to make this "work". It's very unclear to me what the best solution to all these constraints is. Ryosuke said "It seems like sticking to RFC is a cleaner option if possible." Is it possible for Webkit? If not, we need to change the spec. > Is it possible for Webkit? If not, we need to change the spec.
I haven't finished reading the discussion on whatwg yet. My understanding is that the specification is currently being revised in the IRI working group. I'll try to catch up on the discussion on Monday. Is there more to read besides this bug and the whatwg thread?
Not that I know of. Ok. I've caught up on my reading. I don't see a big reason to change behavior here. The dream of having all URIs parse the same way is just a dream. Some common URI schemes, such as javascript URIs, can't be done this way. There are also obscure ones, like ed2k, that can't work this way either. The main reason I see to change our parsing of data URLs is in connection with fragment references in SVG. I'm not sure what the best solution there is, but using the RFC 3986 regexp on data URLs doesn't seem like the best approach, especially give that doing so has resulted in Firefox getting compat complaints. OK, can you jump back into the WHATWG thread and propose a suitable spec change? I expect some opposition and I don't want Daniel or I to be the man in the middle :-). IMHO what Firefox does here makes much more sense. Otherwise, how do you do fragment identifiers with data: URLs? There's simply no good reason to not support fragment identifiers here as far as I can tell, and there's no reason not to — it's trivial to escape the # character. I agree with Adam. If same document fragment references in data: URLs somehow become really important, we'll likely want to shortcut calculation by not completing the "reltative" fragment-only references for performance reasons anyway. (In reply to comment #10) > Ok. I've caught up on my reading. I don't see a big reason to change behavior here. The dream of having all URIs parse the same way is just a dream. Some common URI schemes, such as javascript URIs, can't be done this way. There are also obscure ones, like ed2k, that can't work this way either. > > The main reason I see to change our parsing of data URLs is in connection with fragment references in SVG. I'm not sure what the best solution there is, but using the RFC 3986 regexp on data URLs doesn't seem like the best approach, especially give that doing so has resulted in Firefox getting compat complaints. Sounds like "(b) and (c) are weird as well, so we can't make (a) less weirder". ed2k, as you say, is obscure. Does it affect browsers? javascript is indeed interesting. I still think these aren't proper identifiers at all, and only live in browsers, so browser people may want to reserve the scheme name, and just specify what they do with them as something being in use where URIs/IRIs appear, but not really belonging to this class of identifiers. What makes data different from the other schemes you mentioned is: - browser implementations differ (as far as I can tell) - they are produced/consumed in other places (per spec) - they actually denote a representation to which a fragment can be applied (I don't think this is the case for javascript or ed2k) I guess I'm still not clear on what problem we're solving. Is there a web site that would work better if we changed WebKit's behavior? The problem I want to solve here is interop. If you think Webkit's behavior is fine, please document it and propose it to the list. When you do that it would be good if you can describe how relative URIs should work in data: URIs. Given data:text/html,<a href="#foo">hello</a><div id="#foo" style="margin-top:10000px;">pick me!</div> clicking on the link seems to navigate to about:blank in Chrome, which is unexpected. > The problem I want to solve here is interop. If you think Webkit's behavior is fine, please document it and propose it to the list. I'm sorry that I don't have time to do that at the moment. Previously I was working a full specification of URL handling, but I've stopped working on that spec. You can find my work-in-progress here: https://github.com/abarth/url-spec/blob/master/drafts/url.xml > When you do that it would be good if you can describe how relative URIs should work in data: URIs. It's pretty simple: 1) If the relative URL is an absolute URL, then the resolved URL is that absolute URL (appropriately canonicalized). 2) Otherwise, the resolved URL is an invalid URL. Separately, trying to navigate to an invalid URL results in a navigation to about:blank. In the spec, this is marked as a TODO: <t>TODO: If base-url's scheme is not hierarchical, we can't resolve as a relative URL. We'll probably want to return an invalid URL. Check what happens when resolving an empty string as a relative URL with a non-hierarchical base.</t> From the point of view of the spec, data URLs are not hierarchical, which essentially means we parse out the scheme and leave the remainder of the URL to be parsed by a scheme-specific parser. OK. The following HTML displays a green rectangle in Chrome (the #greenRect is allowed to resolve to an element in the data: document), so I guess you'd consider that a bug? Should I file it? <img src="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'><defs><rect id='greenRect' height='50' width='300' fill='green'/></defs><text y='30' fill='red'>fail</text><use xlink:href='#greenRect'/></svg>"> > OK. The following HTML displays a green rectangle in Chrome (the #greenRect is allowed to resolve to an element in the data: document), so I guess you'd consider that a bug? Should I file it?
We can look at the code, but I think SVG has hacks to avoid URL resolution. I believe that was discussed in more detail on the whatwg thread. I don't have any specific knowledge about how that works.
(In reply to comment #20) > We can look at the code, but I think SVG has hacks to avoid URL resolution. I > believe that was discussed in more detail on the whatwg thread. I can't find that in the thread. When you say "SVG has hacks", I assume you mean Webkit's SVG implementation? Given that changing <img src> to <iframe src> makes the #greenRect not resolve, it does look like something to do with SVG images specifically. I can't imagine that we really want SVG images to behave differently here ... do you agree? > I can't find that in the thread. It was mentioned in Comment #10 and Comment #13 in this bug. In the thread, it was mentioned in Boris Zbarsky's message on Sat, Sep 10, 2011 at 5:32 PM. > When you say "SVG has hacks", I assume you mean Webkit's SVG implementation? Yes. > Given that changing <img src> to <iframe src> makes the #greenRect not resolve, it does look like something to do with SVG images specifically. I can't imagine that we really want SVG images to behave differently here ... do you agree? My perspective is that SVG image itself a giant hack itself (or, at least, the WebKit implementation thereof is a giant hack). Our implementation is really very bad and doesn't have a path forward to not being full of bugs. I'm sorry to not be more helpful about this topic, but we've reached the limits of my knowledge. I can go read the code to understand what's going on in the SVG case, but I learned about it from Boris' email. I've studied the URL resolution behavior in more detail, and it works as I've described. If we'd had this discussion back in April, I would have been much more excited about digging in here and finishing the URL spec. At the moment, I'd rather give the IRIbis working group some breathing room and a chance to succeed. Once they've either succeeded or failed, we should talk more about writing a high-quality specification for URL handling. (In reply to comment #22) > > I can't find that in the thread. > > It was mentioned in Comment #10 and Comment #13 in this bug. In the thread, > it was mentioned in Boris Zbarsky's message on Sat, Sep 10, 2011 at 5:32 PM. Thanks. > If we'd had this discussion back in April, I would have been much more excited > about digging in here and finishing the URL spec. At the moment, I'd rather > give the IRIbis working group some breathing room and a chance to succeed. > Once they've either succeeded or failed, we should talk more about writing a > high-quality specification for URL handling. Fair enough, although the status quo is quite unsatisfactory. Of the three options: 1) Current behavior of Firefox and Opera: allow data: URIs to have fragment references, parse everything after the first # as a fragment reference, and allow relative URIs consisting of fragment references to resolve relative to a data: URI. 2) Your suggestion in comment #18 applied consistently to SVG images: disallow fragment references in data: URIs, resolve all URIs relative to a data: URI to an invalid URI. 3) Your suggestion in comment #18, except with something magical happening for SVG (possibly only in an image context) (e.g., have SVG special-case fragment-only relative URIs to bypass normal relative URI resolution) #2 would cripple data: URIs for SVG images. #3 seems unnecessarily complex, and limiting it to SVG would be difficult as we increase the integration of SVG with CSS and HTML. #1 seems best to me. Doesn't option (1) have compatibility problems? I thought that's why this discussion started. The issue arose on our side when one of our developers complained that an SVG image with a data: URI containing unescaped #s worked in Webkit but not in Firefox. I don't know whether or not Web content depends on this. The net-net of this discussion seems to be that you're not interesting in changing your behavior at this time and we're not interested in changing our behavior at this time. That means we'll probably need to revisit this topic in the future. Personally I wouldn't mind changing our behavior if there was some agreement on what to change it to. In fact, the WHATWG thread started with Daniel and I proposing a behavior change (which, unfortunately, everyone hated :-) ). It doesn't seem like there is an agreement based on Comment #10 and Comment #13. I agree that WebKit's current behavior isn't particularly aesthetic, but treating # in data URLs as the beginning of the fragment is also pretty unappealing. I certainly don't have a strong design to strong-arm you into adopting WebKit's behavior. I'm sorry that there doesn't appear to be a solution that's compelling enough for everyone to adopt. As you know, I'm a big fan of interoperability. (In reply to comment #18) > . . . > > When you do that it would be good if you can describe how relative URIs should work in data: URIs. > > It's pretty simple: > > 1) If the relative URL is an absolute URL, then the resolved URL is that absolute URL (appropriately canonicalized). > > 2) Otherwise, the resolved URL is an invalid URL. > > Separately, trying to navigate to an invalid URL results in a navigation to about:blank. > > In the spec, this is marked as a TODO: > > <t>TODO: If base-url's scheme is not hierarchical, we can't resolve as > a relative URL. We'll probably want to return an invalid URL. Check > what happens when resolving an empty string as a relative URL with a > non-hierarchical base.</t> > > From the point of view of the spec, data URLs are not hierarchical, which essentially means we parse out the scheme and leave the remainder of the URL to be parsed by a scheme-specific parser. I think we have to be careful here to distinguish relative URIs in general from same-document references. I claim that same-document references (and all the examples of relative data: URIs we've seen so far _are_ same-document references) _may_ make sense for non-hierarchical URI schemes, and should not be treated as errors. So, for example, clicking on the link in that appears at the beginning of the result of interpreting data:application/xhtml+xml,<html xmlns='http://www.w3.org/1999/xhtml'><a href='#target'>go green</a><div style='height:3000px; width: 300px; background: red'/><div id='target' style='height: 3000px; width: 300px; background: green;'/></html> should work, IMO. (In reply to comment #29) > So, for example, clicking on the link in that appears at the beginning of the result of interpreting > > data:application/xhtml+xml,<html xmlns='http://www.w3.org/1999/xhtml'><a href='#target'>go green</a> [etc] I would agree with you if you replaced "#" with "%23". But in the URI as you provided it, the "#" technically terminates the main portion of the URI and indicates the beginning of the fragment-identifier. So your URI just contains this (invalid) XML: > <html xmlns='http://www.w3.org/1999/xhtml'><a href=' and that's it. This is how both Opera and Firefox handle that URI, when they load it directly. I agree that this strict behavior is a bit unintuitive (see the whatwg thread I started about trying to make it better[1]), but the general consensus in that thread was that the strict behavior (that Firefox implements) is correct. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-September/033162.html We've been discussing this on the Chromium bug tracker: http://crbug.com/123004 Does anyone have any new thoughts on this? Blink Intent to Deprecate and Remove: Support for '#' in data URI body https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/_g_HnAHkMPo FYI (without any comment on what anyone should or shouldn't do!): Blink's (second) attempt at removing this landed in our M72 milestone, and has now survived an entire milestone in the field (as M73 has started rolling out). Chromium is now aligned with the spec and Firefox; '#' in data URIs mark the end of the content and the start of the fragment. *** Bug 215820 has been marked as a duplicate of this bug. *** Created attachment 409982 [details]
Patch
Created attachment 409983 [details]
Patch
Created attachment 409989 [details]
Patch
Created attachment 409995 [details]
Patch
Created attachment 410000 [details]
Patch
Created attachment 410013 [details]
Patch
Comment on attachment 410013 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410013&action=review > LayoutTests/ChangeLog:8 > + Adjusts tests thats use data URIs to escape the # character. Seems likely that some tested-only-with-WebKit content, like content embedded in old iOS apps, could accidentally depend on this behavior the way these many tests do. We might find we need a quirk. Keep in mind that when Blink decides what to do in their engine, they do not have this consideration. (In reply to Darin Adler from comment #41) > Comment on attachment 410013 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=410013&action=review > > > LayoutTests/ChangeLog:8 > > + Adjusts tests thats use data URIs to escape the # character. > > Seems likely that some tested-only-with-WebKit content, like content > embedded in old iOS apps, could accidentally depend on this behavior the way > these many tests do. We might find we need a quirk. Good point, let's see if that will happen. (In reply to Rob Buis from comment #43) > Good point, let's see if that will happen. The trick here is to figure out how that testing occurs; it’s not something the WebKit project directly manages, but rather something that various people inside Apple arrange for. For example, often we learn about such things from Apple’s public beta releases of iOS. We probably need a runtime flag for this, and preferably internal debug flag. (In reply to Ryosuke Niwa from comment #45) > We probably need a runtime flag for this, and preferably internal debug flag. Maybe this is something we should document. "What to do when you are making a web-exposed change for interoperability" so I don’t have to write "this could affect content in iOS apps" over and over again, and so people understand why there might need to be a flag for something even if other engines like Blink have just made an unconditional change. > Keep in mind that when Blink decides what to do in their engine, they do not have this consideration. Super irrelevant, but we do :). (Sorry, I was on the cc-list and couldn't help myself). And yeah, this change for us led to a non-trivial breakage in Android (because I missed the potential impact this could have on Android Apps, see https://crbug.com/123004#c106 and https://crbug.com/929083), and ultimately we landed a quirk delaying the change in Android until a major release (https://chromium-review.googlesource.com/c/chromium/src/+/1456638). Created attachment 410028 [details]
Patch
(In reply to Stephen McGruer from comment #47) > Super irrelevant, but we do :). (Sorry, I was on the cc-list and couldn't > help myself). > > And yeah, this change for us led to a non-trivial breakage in Android > (because I missed the potential impact this could have on Android Apps, see > https://crbug.com/123004#c106 and https://crbug.com/929083), and ultimately > we landed a quirk delaying the change in Android until a major release > (https://chromium-review.googlesource.com/c/chromium/src/+/1456638). Thanks for chiming in; good to know! Comment on attachment 410028 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410028&action=review >>> Source/WebCore/platform/network/DataURLDecoder.cpp:121 >>> + RuntimeEnabledFeatures::sharedFeatures().stripFragmentIdentifierFromDataURL() ? url.stringWithoutFragmentIdentifier().toString() : url.string(), >> >> Since we're trying to remain compatible with existing WebKit-only content, wouldn't linkedOnOrAfter be better than a runtime enabled feature that starts as always on? > > I was thinking Ryosuke wanted a runtime enabled flag to easily disable this feature if we do run into WebKit-only content that relies on the old behavior? Seems linkedOnOrAfter is very static once in place. Anyway this is why I put up the review flag :) I'm ok with this in the hopes that we don't find any compatibility issues. (In reply to Alex Christensen from comment #52) > Comment on attachment 410028 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=410028&action=review > > >>> Source/WebCore/platform/network/DataURLDecoder.cpp:121 > >>> + RuntimeEnabledFeatures::sharedFeatures().stripFragmentIdentifierFromDataURL() ? url.stringWithoutFragmentIdentifier().toString() : url.string(), > >> > >> Since we're trying to remain compatible with existing WebKit-only content, wouldn't linkedOnOrAfter be better than a runtime enabled feature that starts as always on? > > > > I was thinking Ryosuke wanted a runtime enabled flag to easily disable this feature if we do run into WebKit-only content that relies on the old behavior? Seems linkedOnOrAfter is very static once in place. Anyway this is why I put up the review flag :) > > I'm ok with this in the hopes that we don't find any compatibility issues. Yeah, I think the ideal end result is that we can remove the old behavior from our codebase completely so I think this is a good start. If we find any compatibility issues, we'd go use linked on or after check instead. Comment on attachment 410028 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410028&action=review > Source/WebCore/page/RuntimeEnabledFeatures.h:269 > + void setStripFragmentIdentifierFromDataURL(bool isEnabled) { m_stripFragmentIdentifierFromDataURL = isEnabled; } > + bool stripFragmentIdentifierFromDataURL() const { return m_stripFragmentIdentifierFromDataURL; } Can we use Settings instead? Created attachment 410102 [details]
Patch
(In reply to Darin Adler from comment #54) > Comment on attachment 410028 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=410028&action=review > > > Source/WebCore/page/RuntimeEnabledFeatures.h:269 > > + void setStripFragmentIdentifierFromDataURL(bool isEnabled) { m_stripFragmentIdentifierFromDataURL = isEnabled; } > > + bool stripFragmentIdentifierFromDataURL() const { return m_stripFragmentIdentifierFromDataURL; } > > Can we use Settings instead? I think so, I made that change. So here’s the stage this patch is at: 1) It’s not in unconditionally, which is good because we may find compatibility issues that require turning it on and off. 2) It’s a clean, easy to understand, code change. 3) It’s using Settings, not the deprecated RuntimeEnabledFeatures. 4) There’s no API or SPI exposed for tweaking the setting, so if we put it in a release of iOS or macOS, the new behavior will be unconditionally on for all clients, and the fact that there's a Setting inside WebCore won't affect that one way or the other. (1), (2), and (3) are all excellent! (4) is probably not OK for Apple, and because of that, checking this in would probably create a sort of "P1 bug" for WebKit at Apple. To deal with (4) we probably need to come up with a strategy like: A) "We're just going to try this and 'watch for breakage' and come back to it later before we release this in an iOS update." B) Like A on Mac, but on iOS use a "linked on or after" check to keep old apps working indefinitely. This would be a check, not a setting. C) Same as B, but do the same on macOS. Or maybe something involving adding API to control this. That seems unlikely. For Apple it’s not great to land this as is. Maybe we can make some small change so that it doesn’t yet do (4) above? Another possibility is: D) On Apple platforms, find way to change this only for "web browsers" and not other uses of WebKit. Or a mix of D with those A-C options. I think D is a bad idea. I think we want to minimize the differences in behavior between "web browsing" and "non-web-browsing" WebKit. Ryosuke seemed to be in favor of A. I'm in favor of B and C because I think there is a high chance that if we choose A we will find something and be forced to switch to B and C. It's conservative, but it's also not a very complex place to have a linkedOnOrAfter check, so I think it would be OK to put here for 5 or so years (until most apps have updated) without causing much of a maintenance burden. Like Alex, I think (C) is appealing. (In reply to Darin Adler from comment #60) > Like Alex, I think (C) is appealing. Yeah, now that we've learned that Android faced app compatibility issues, it would be a fool's errands to try to ship this outright. Created attachment 410279 [details]
Patch
Created attachment 410297 [details]
Patch
Created attachment 410304 [details]
Patch
Comment on attachment 410304 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410304&action=review > Source/WebKit/UIProcess/Cocoa/WebProcessPoolCocoa.mm:309 > +static bool linkedOnOrAfterEverything() Why are we doing this now? We’ve had linked-on-or-after checks for a long time. What changed to make this be needed? Comment on attachment 410304 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410304&action=review > Source/WebKit/UIProcess/Cocoa/WebProcessPoolCocoa.mm:318 > +#if PLATFORM(MAC) > + value = WebCore::MacApplication::isSafari(); > +#else > + value = WebCore::IOSApplication::isMobileSafari(); > +#endif Seems likely this is wrong for Safari View Controller, to cite one example. Comment on attachment 410304 [details]
Patch
WebKitTestRunner and DumpRenderTree use NSUserDefaults instead of SPI to test behavior of an app that is linked on or after everything, and Safari (even on older macOS) behaves as if it were linked on the most recent SDK. This is not an issue for Safari View Controller, which is always actually linked with the most recent SDK.
The linkedOnOrAfter implementation is basically duplicate code in WebKitLegacy and WebKit that we need to use here in WebCore. I was trying to make a minimal change to get that working, but I think it would be even better if I unified linkedOnOrAfter, used SPI instead of NSUserDefaults, and moved it to WebCore in a separate patch before this change.
(In reply to Alex Christensen from comment #67) > I was trying > to make a minimal change to get that working, but I think it would be even > better if I unified linkedOnOrAfter, used SPI instead of NSUserDefaults, and > moved it to WebCore in a separate patch before this change. Yes. I think the function and method names involved should mention testing, because otherwise it’s such a bizarre feature. Created attachment 410526 [details]
Patch
(In reply to Alex Christensen from comment #69) > Created attachment 410526 [details] > Patch Thanks for your work on this Alex! FWIW, this change broke playback on Starz.com in Safari, as they use a data URI which looks like: `data:application/x-mpegURL,#EXTM3U%0A#EXT-X-INDEPENDENT-SEGMENTS%0A...`. |