Bug 14776 - poorly escaped URLs are handled differently, umlaut in path breaks URLs
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.4
Importance: P1 Normal
Assignee: Nobody
URL: http://ninja.caboo.se/
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2007-07-26 23:51 PDT by Courtenay
Modified: 2023-01-02 12:27 PST

Description Courtenay 2007-07-26 23:51:36 PDT
I have a URL that looks like

  http://ninja.caboo.se/image_assets/8/Circuit_Nürburgring_tiny.png

That's a "u" with an umlaut.  This works as expected on 419.2.1, but fails on 522 (nightly)

The webserver reports this fetch on 522 (it 404s)
  GET /image_assets/8/Circuit_N%C3%BCrburgring_tiny.png HTTP/1.1 404

The webserver reports success on 419 with a different encoding -- it uses the "u" followed by the combining diaeresis
  GET /image_assets/8/Circuit_Nu%CC%88rburgring_tiny.png HTTP/1.1 200

%C3%BC is the UTF-8 encoding of ü as a single precomposed character (U+00FC).
u%CC%88 is a plain "u" followed by the combining diaeresis (U+0308, UTF-8 bytes CC 88), i.e. the decomposed form.
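
To make the two encodings concrete, here is a minimal Python sketch (not part of the original report) that percent-encodes the same name in precomposed (NFC) and decomposed (NFD) form. The variable names are illustrative only.

```python
import unicodedata
from urllib.parse import quote

# The file name from the report, written with a precomposed ü (U+00FC).
name = "N\u00fcrburgring"

# NFC keeps ü as one code point; NFD splits it into "u" + U+0308.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# quote() percent-encodes the UTF-8 bytes of each form.
encoded_nfc = quote(nfc)
encoded_nfd = quote(nfd)

print(encoded_nfc)  # N%C3%BCrburgring   (what 522 requests)
print(encoded_nfd)  # Nu%CC%88rburgring  (what 419 requests)
```

The two strings look identical on screen but are different byte sequences, so a server that compares paths byte for byte treats them as two different files.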

When you type Nu%CC%88 in the URL and hit enter, it converts the "u" to the umlauted u automatically, and the file can be found by the web server.  The request shows Nu%CC%88.

If I type it manually -- hit alt-u to get the dots, then hit "u" -- it requests the %C3%BC version which fails on 522.

It seems the default character normalization has changed between versions.  The new behavior also differs from Firefox on the Mac, where the URL works.
Comment 1 Courtenay 2007-07-26 23:55:28 PDT
Of course this is very difficult to see because of the way the url handling is done.

You can check a sample page at http://ninja.caboo.se/tasks/981c84d3 -- the left-most image will be a question-mark on 522 and an image on 419.2.1
Comment 2 Courtenay 2007-07-27 00:10:44 PDT
Mea culpa, sorta.

I found that I was, in fact, neglecting to escape my URLs.  Once I explicitly escape my URLs (FYI, this is with Ruby/Rails), it all works fine from 419 to 522.  So -- this bug is somewhat invalid, but may come back to bite some other lazy coders in the future.
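
For anyone hitting the same thing, a rough sketch of the idea in Python (the app in this report is Ruby/Rails; this is not its code). It assumes the file is stored under the decomposed name, as the successful request above suggests, and `escape_path` is a hypothetical helper name.

```python
from urllib.parse import quote

def escape_path(path: str) -> str:
    """Percent-encode a URL path, keeping "/" as the segment separator."""
    return quote(path, safe="/")

# Assumption: the stored file name uses "u" + combining diaeresis (U+0308).
stored_path = "/image_assets/8/Circuit_Nu\u0308rburgring_tiny.png"

href = escape_path(stored_path)
print(href)  # /image_assets/8/Circuit_Nu%CC%88rburgring_tiny.png
```

Because the page now emits the exact bytes the server expects, the browser no longer has to choose a normalization form for the raw character.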
Comment 3 Courtenay 2007-07-27 00:11:54 PDT
Oh yeah, one final thing. The unescaped version of my URL crashes MobileSafari on the iPhone. Even after restoring and clearing everything.
Comment 4 Alexey Proskuryakov 2007-07-27 06:53:03 PDT
(In reply to comment #3)
> Oh yeah, one final thing. The unescaped version of my URL crashes MobileSafari
> on the iPhone. Even after restoring and clearing everything.

It also crashes shipping 10.4.10 Safari/WebKit (open <http://ninja.caboo.se/tasks/981c84d3>, open the left image in new tab, copy its URL, open a new tab, paste the URL into address bar). The crash is in Safari itself, not in WebKit. I couldn't find a way to reproduce it with 3.0.2 beta.

As for the original bug - I think we are supposed to convert URLs to precomposed Unicode (%C3%BC, not u%CC%88), and we now do. In this particular case, it makes us incompatible with other browsers, but I still think that it's the right thing to do, because preferring decomposed Unicode is a Mac thing, and converting an URL to precomposed form makes it more Windows-like.
Comment 5 Darin Adler 2007-07-27 07:51:51 PDT
(In reply to comment #4)
> (In reply to comment #3)
> As for the original bug - I think we are supposed to convert URLs to
> precomposed Unicode (%C3%BC, not u%CC%88), and we now do. In this particular
> case, it makes us incompatible with other browsers, but I still think that it's
> the right thing to do, because preferring decomposed Unicode is a Mac thing,
> and converting an URL to precomposed form makes it more Windows-like.

I don't think there's a lot of value in being "Mac-like" in cases like this. I'd prefer to be consistent with the other browsers, even if that is "Windows-like".
Comment 6 David Kilzer (:ddkilzer) 2007-07-27 09:30:46 PDT
(In reply to comment #4)
> It also crashes shipping 10.4.10 Safari/WebKit (open
> <http://ninja.caboo.se/tasks/981c84d3>, open the left image in new tab, copy
> its URL, open a new tab, paste the URL into address bar). The crash is in
> Safari itself, not in WebKit. I couldn't find a way to reproduce it with 3.0.2
> beta.

I can't reproduce this using Safari 2.0.4 (419.3) with its original WebKit on Mac OS X 10.4.10 (8R218).  Alexey, was this the configuration you were using?  Safari 2 opened the image for me when I followed these steps.  Could you post a stack trace?

Comment 7 Alexey Proskuryakov 2007-07-27 11:02:40 PDT
(In reply to comment #5)
> I don't think there's a lot of value in being "Mac-like" in cases like this.
> I'd prefer to be consistent with the other browsers, even if that is
> "Windows-like".

What I meant was that being Windows-like might actually make us compatible in more cases. As most servers expect precomposed URLs, because they were tested with Windows browsers, we'd better normalize them to NFC.

I don't have a browser-specific example ready, but a failure to normalize URLs to NFC causes numerous problems in other OS X parts (such as WebDAV, rdar://2861039).
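
As an illustration only (this is not WebKit's implementation), normalizing a URL path to NFC before percent-encoding could be sketched in Python as follows. `nfc_url` is a hypothetical helper, and the sketch ignores the query string, the host name, and escaped reserved characters such as %2F.

```python
import unicodedata
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def nfc_url(url: str) -> str:
    """Return url with its path normalized to NFC and percent-encoded."""
    parts = urlsplit(url)
    # Decode any existing escapes so both raw and escaped input are handled.
    decoded = unquote(parts.path)
    # Fold "u" + U+0308 into the precomposed U+00FC, and so on.
    normalized = unicodedata.normalize("NFC", decoded)
    return urlunsplit(parts._replace(path=quote(normalized, safe="/")))

decomposed = "http://ninja.caboo.se/image_assets/8/Circuit_Nu%CC%88rburgring_tiny.png"
print(nfc_url(decomposed))
# http://ninja.caboo.se/image_assets/8/Circuit_N%C3%BCrburgring_tiny.png
```

With this approach both spellings of the URL end up as the same request, which is the "absolutely identical" behavior one would want regardless of how the character was entered.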
Comment 8 Alexey Proskuryakov 2007-07-27 11:07:10 PDT
(In reply to comment #6)
> I can't reproduce this using Safari 2.0.4 (419.3) with its original WebKit on
> Mac OS X 10.4.10 (8R218).  Alexey, was this the configuration you were using? 

I was using the same configuration on an Intel Mac, and the crash was 100% reproducible. I cannot reproduce this at home, so I'll e-mail you a stack trace next week - maybe it depends on some particular state of the browser history.
Comment 9 Courtenay 2007-07-28 15:34:25 PDT
I've updated my app so the URLs are escaped properly, so you won't be able to use the above URLs to reproduce/test.

I've saved a static version of the page which you can use at http://ninja.caboo.se/nurburgring.html
Comment 10 Alexey Proskuryakov 2007-07-28 22:18:42 PDT
It's definitely a WebKit bug that the two versions work differently - they should be absolutely identical from any point of view. Both should either work, or not.

Raising to P1, as this is a change in behavior from shipping Safari that should be either rolled back, or completed IMO.
Comment 11 Alexey Proskuryakov 2007-07-30 01:31:24 PDT
One more data point: the URL doesn't work when typed in the address bar manually (as typing Opt+u, u with the U.S. keyboard produces a precomposed character).
Comment 12 David Kilzer (:ddkilzer) 2007-07-30 08:52:49 PDT
<rdar://problem/5369847>