Bug 17105 - Not all of KURL's constructors handle IDN.
Summary: Not all of KURL's constructors handle IDN.
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 37641
  Show dependency treegraph
 
Reported: 2008-01-30 17:12 PST by Brett Wilson (Google)
Modified: 2023-05-22 03:40 PDT (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Brett Wilson (Google) 2008-01-30 17:12:20 PST
KURL(const char*) and KURL(DeprecatedString&) convert to UTF-8 and pass their data directly to KURL::parse. This is incorrect, since it skips the stuff done by encodeRelativeString for IDN. There might be some other handling that this code path misses as well.

These constructors should probably be the same as KURL(KURL(), arg).
Comment 1 Alexey Proskuryakov 2009-04-09 10:25:02 PDT
Is this bug still valid? DeprecatedString no longer exists, and I'm not sure if that const char* constructor has this problem now.
Comment 2 Brett Wilson (Google) 2009-04-10 08:40:06 PDT
This is still a big. IDN is handled by encodeHostnames, which is called only by encodeRelativeString, which is called only by init.

Any of the constructors that call parse() instead of init() don't do IDN. These are KURL::KURL(const char* url) and KURL::KURL(const String& url).
Comment 3 Alexey Proskuryakov 2009-04-10 09:32:50 PDT
Why is that an issue? The single argument constructors are only defined to work with what KURL::string() returns, not with arbitrary input. In no case can a non-ASCII string be passed to them.
Comment 4 Brett Wilson (Google) 2009-04-10 12:47:43 PDT
The first time the behavior was "defined" was by me in a patch that Darin adapted to remove DeprecatedString.

I did this because it was so confusing. Say I have a random URL in a WebCore String which holds unicode characters. If I use the String constructor, it will do the wrong thing. Instead I have to call KURL(KURL(), myString). It seems like an accident waiting to happen and I can't see why the current behavior would be desirable.
Comment 5 Brett Wilson (Google) 2009-04-10 12:49:51 PDT
There are assertions to verify this. (I don't know if I wrote them or not.) With these, I won't complain if you don't fix this bug, but I still think it's wrong behavior.
Comment 6 Alexey Proskuryakov 2009-04-10 22:41:17 PDT
I agree that this is very confusing. Not quite sure what the best approach is though - one of possibilities that were discussed was adding a new URLString class for KURL::string() result, and to remove the String single argument constructor.
Comment 7 Brett Wilson (Google) 2009-04-12 23:44:17 PDT
I think the best behavior would be to not convert URLs to strings in any place (possibly with URLString as a replacement or just using KURL everywhere), and then making all KURL constructors do everything. I assume these constructors are so simple is that it converts strings to KURLs and back with some frequency.
Comment 8 Alexey Proskuryakov 2009-04-12 23:55:53 PDT
Unfortunately, KURL is rather large, so using it everywhere would likely increase memory consumption noticeably. I don't have numbers though.
Comment 9 Brett Wilson (Google) 2009-04-13 10:34:46 PDT
The biggest user of space in KURL is that it stores the ASCII string in UTF-16. The Google version of KURL stores the canonical string in 8-bit (since this is what the canonicalization library wants), but there are so many requests for the string out of it (which is then often used to create another KURL later), that we have to keep a cache, making the object larger in practice. If everybody used some common string type or KURL directly, this could be replaced and the object would be much smaller.
Comment 10 Anne van Kesteren 2023-05-22 03:40:13 PDT
We no longer use KURL.