WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED INVALID
17105
Not all of KURL's constructors handle IDN.
https://bugs.webkit.org/show_bug.cgi?id=17105
Summary
Not all of KURL's constructors handle IDN.
Brett Wilson (Google)
Reported
2008-01-30 17:12:20 PST
KURL(const char*) and KURL(DeprecatedString&) convert to UTF-8 and pass their data directly to KURL::parse. This is incorrect, since it skips the stuff done by encodeRelativeString for IDN. There might be some other handling that this code path misses as well. These constructors should probably be the same as KURL(KURL(), arg).
Attachments
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2009-04-09 10:25:02 PDT
Is this bug still valid? DeprecatedString no longer exists, and I'm not sure if that const char* constructor has this problem now.
Brett Wilson (Google)
Comment 2
2009-04-10 08:40:06 PDT
This is still a big. IDN is handled by encodeHostnames, which is called only by encodeRelativeString, which is called only by init. Any of the constructors that call parse() instead of init() don't do IDN. These are KURL::KURL(const char* url) and KURL::KURL(const String& url).
Alexey Proskuryakov
Comment 3
2009-04-10 09:32:50 PDT
Why is that an issue? The single argument constructors are only defined to work with what KURL::string() returns, not with arbitrary input. In no case can a non-ASCII string be passed to them.
Brett Wilson (Google)
Comment 4
2009-04-10 12:47:43 PDT
The first time the behavior was "defined" was by me in a patch that Darin adapted to remove DeprecatedString. I did this because it was so confusing. Say I have a random URL in a WebCore String which holds unicode characters. If I use the String constructor, it will do the wrong thing. Instead I have to call KURL(KURL(), myString). It seems like an accident waiting to happen and I can't see why the current behavior would be desirable.
Brett Wilson (Google)
Comment 5
2009-04-10 12:49:51 PDT
There are assertions to verify this. (I don't know if I wrote them or not.) With these, I won't complain if you don't fix this bug, but I still think it's wrong behavior.
Alexey Proskuryakov
Comment 6
2009-04-10 22:41:17 PDT
I agree that this is very confusing. Not quite sure what the best approach is though - one of possibilities that were discussed was adding a new URLString class for KURL::string() result, and to remove the String single argument constructor.
Brett Wilson (Google)
Comment 7
2009-04-12 23:44:17 PDT
I think the best behavior would be to not convert URLs to strings in any place (possibly with URLString as a replacement or just using KURL everywhere), and then making all KURL constructors do everything. I assume these constructors are so simple is that it converts strings to KURLs and back with some frequency.
Alexey Proskuryakov
Comment 8
2009-04-12 23:55:53 PDT
Unfortunately, KURL is rather large, so using it everywhere would likely increase memory consumption noticeably. I don't have numbers though.
Brett Wilson (Google)
Comment 9
2009-04-13 10:34:46 PDT
The biggest user of space in KURL is that it stores the ASCII string in UTF-16. The Google version of KURL stores the canonical string in 8-bit (since this is what the canonicalization library wants), but there are so many requests for the string out of it (which is then often used to create another KURL later), that we have to keep a cache, making the object larger in practice. If everybody used some common string type or KURL directly, this could be replaced and the object would be much smaller.
Anne van Kesteren
Comment 10
2023-05-22 03:40:13 PDT
We no longer use KURL.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug