WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
16538
KURL should use String instead of DeprecatedString
https://bugs.webkit.org/show_bug.cgi?id=16538
Summary
KURL should use String instead of DeprecatedString
Brett Wilson (Google)
Reported
2007-12-20 11:14:18 PST
KURL should use String instead of DeprecatedString
Attachments
Patch
(67.66 KB, patch)
2008-01-29 17:42 PST
,
Brett Wilson (Google)
darin
: review-
Details
Formatted Diff
Diff
Patch for KURL to use string internally
(77.12 KB, patch)
2008-02-01 11:28 PST
,
Brett Wilson (Google)
no flags
Details
Formatted Diff
Diff
New patch addressing most comments
(88.08 KB, patch)
2008-02-05 20:01 PST
,
Brett Wilson (Google)
no flags
Details
Formatted Diff
Diff
patch rebased and with backwards protocolIs fixed
(88.49 KB, patch)
2008-02-10 08:07 PST
,
Darin Adler
no flags
Details
Formatted Diff
Diff
more ambitious approach, eliminating more DeprecatedString
(163.95 KB, patch)
2008-02-10 10:54 PST
,
Darin Adler
no flags
Details
Formatted Diff
Diff
patch
(403.46 KB, patch)
2008-02-12 13:15 PST
,
Darin Adler
eric
: review+
Details
Formatted Diff
Diff
patch, now with fewer broken platforms
(430.01 KB, patch)
2008-02-14 18:15 PST
,
Darin Adler
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Brett Wilson (Google)
Comment 1
2008-01-29 17:42:28 PST
Created
attachment 18785
[details]
Patch This replaces String with DeprecatedString in KURL's interface. The internal representation is still a DeprecatedString. I tried two different timer to do the change the internals, but it is challenging because the parsing code relies on the utf-16/utf-8 duality of the string. This might be more doable with a fresh look. MacDome was also expressing interest in doing this task. This interface patch is designed to be the first step in fixing KURL; clearly the internals will need to be fixed, but these are both complicated patches which I think are better separated. There are more conversions which will make the code slower, although I am pretty sure it will not be measurable: About half the callers already had a String, and this patch does not change the number of conversions in those cases. The other half (which had a DeprecatedString) will need an extra string conversion. This also breaks the optimization where the string inside KURL would just reference the argument without a malloc in some cases. I left this in with some FIXMEs for the arguments, so it will be easy to fix when the internals are also converted to String. ----- I made a number of changes to KURL aside from just String conversions: Added a lot of comments to the header file I made the constructor explicit. I kept finding places where there was a String, and it was being converted to a KURL when being passed to a function without the code realizing that this was a potentially slow operation. This highlights some really stupid usage, often involving resolution of a relative URL with a string base. For example, in CSSParser::parseBorderImage: context.commitImage(new CSSImageValue( KURL(KURL(styleElement->baseURL()), uri).string(), styleElement)); Now the fact that baseURL should probably return a KURL is more obvious, or, if not, at least tells you how expensive the operation you are doing is. I would say everybody should use KURL and the above code should be: context.commitImage(new CSSImageValue( KURL(styleElement->baseURL(), uri), styleElement)); A lot of the diff relates to this change. I added a new function, protocolIs(const char*). A lot of users were doing this pattern: if (url.protocol() == "http"); This is bad for several reasons. First, it does a case-sensitive compare, which is incorrect (KURL does not canonicalize the scheme). To work around this, some callers do something like this: if (url.protocol().lower() == "http") This makes the second problem even worse, which is that lots of string temporaries are getting created for an operation which doesn't need it. My new function handles different cases and does not create new objects, while making the usage more clear. I fixed some signed/unsigned types in findHostnamesInMailToURL and other places which are suspicious and generate warnings. ----- I made two changes outside of KURL relating to
bug 16538
, which this patch fixes. Some code expects the string in a KURL to be non-NULL, despite there being many ways to make a NULL KURL. This code is not tripped because URLs oftem go through string conversions, i.e. KURL->String->KURL, and this process will lose the isNull-ness of the string and it will report empty instead. I made a fix in IconDatabase.cpp, and one in DocLoader.cpp (which I previously filed
bug 16485
for). In loader.cpp I removed some dead code.
Eric Seidel (no email)
Comment 2
2008-01-30 03:06:26 PST
Comment on
attachment 18785
[details]
Patch This looks great. I think all additions of String( are probably bogus. String does not have any explicit constructors (I don't think). I think this patch clearly points out that certain things like: HTMLAnchorElement::href() Document::url() Element::baseURL() We can convert those to use KURL as soon as we teach the JS bindings how to silently convert to a UString in the getters. Perhaps we already have a StringImp* toJS(KURL) function... isAllASCII could move onto StringImpl if there are any other callers of DeprecatedString::isAllASCII in the sourcebase It's very sad that: char *strBuffer; uses manual memory management. However you didn't change that, so we can fix that later in a separate patch. I understand that you are not adding ifs here, but any ifs that you touch where you're changing: if (foo) { return DeprecatedString(); } you could update them to match our style guidelines and remove the { }. That's not required for this patch, but it's good to know for the future at least. Why not use your new protocolIs function here? return protocol().lower() == "file"; This seems incorrect: unsigned separator = findFirstOf(s, 0, "://"); // WILL include the "?". you use // Will on the next line. All * and & should be next to the argument names. Per our style guidelines. I think there is a String version of completeURL: KURL link(element->document()->completeURL(href.deprecatedString())); I'm going to leave this as ? since I expect darin may have further comments.
Eric Seidel (no email)
Comment 3
2008-01-30 03:07:15 PST
It's unfortunate that we don't have more URL handling tests. :sigh:
Darin Adler
Comment 4
2008-01-30 06:06:52 PST
Comment on
attachment 18785
[details]
Patch This patch looks great! I particularly like protocolIs().
> but it is challenging because the parsing code relies on the utf-16/utf-8 duality of the string
I'm not sure what you mean here. DeprecatedString does not have UTF-16/UTF-8 duality. It does have a feature where you can just call ascii() on it and see the characters as an 8-bit string but: 1) it doesn't actually do anything really efficient -- it pretty much just allocates a new 8-bit buffer 2) it chops characters and does not do a UTF-16 to UTF-8 conversion 3) there's very little code that actually takes advantage of this Would you be willing to take another pass at that aspect of the patch? I ask because I think it would be better to not land this with the comments "this optimization is broken for now". I really don't think we need to stage this so that there's a stage with the broken optimization. String is inefficient if you do thinks like concatenate; any modifications to string length require reallocating the buffer. The same is not true about DeprecatedString. So I'm concerned about replacing functions that build up a DeprecatedString by appending with functions that do the same thing with String. This is the most challenging issue to be considered when replacing DeprecatedString with String. encodeHostnames and substituteBackslashes are examples of this. To do this efficiently with String you need to build up the new string as a Vector<UChar> and then use adopt() to turn it into a string once done modifying it. There many not be the appropriate helper functions yet for String and Vector<UChar>, and we've even considered adding a new StringBuilder class. This is one of the reasons I personally haven't converted these functions yet yet. Just converting to String and hoping that performance is acceptable may not produce acceptable results. Please consider Vector<UChar> for functions that build strings by appending. It's pretty straightforward, and anything inconvenient can probably be papered over with a few helper functions. The mix of signed and unsigned integers is starting to get a little awkward. Since our existing string functions mostly use int, I'd prefer that you continue that to avoid ugly casts in new code. Or we could move in the other direction and try to move away from using int for string character indices. I'm not pleased with the mix. When checking for the 0 character this new code has a mix of styles. Sometimes it's "!c", other times "c == 0", other times "c == '\0'". I'd prefer more consistency. - ASSERT(!pageURLOriginal.isNull()); Was this assertion not actually guaranteed? If so, then I understand the bug fix. If not, then I'm puzzled by why you made the change here. +static bool isAllASCII(const String& str) +{ + for (unsigned i = 0; i < str.length(); i++) { + if (str[i] > 0x7f) + return false; + } + return true; +} This function already exists in RenderText.cpp, although it works on a StringImpl rather than a String. I would prefer that we unify and share a single one rather than adding another. The RenderText.cpp one uses an algorithm that's considerably faster for long strings where characters actually are all ASCII, and since all-ASCII is the normal case I'd prefer to do it that way. +// ASCII. The detination buffer must be large enough, and it is not NULL +// terminated unless the source has a NULL. Typo. Missing the "s" in destination. Normally NULL refers the pointer value and not the 0 character. The 0 character is usually called NUL, with one L. Or you can say "null-terminated" and "null character" and not SHOUT about the null character. There are many other places you call this character NULL -- those should all be fixed. + dest[i] = static_cast<char>(src[i]); There's no need for an explicit cast here. I don't know of any compiler WebKit supports that gives a warning for narrowing conversions like this one. And while it would be nice to be explicit that there's a type conversion going on, I think the general guideline to avoid casts when possible is probably slightly more important. +// Returns the index of the first index in string |s| of any of the characters +// in |toFind|. |toFind| should be a NULL-terminated string, all characters up +// to the NULL will be searched. Returns UINT_MAX if not found. +unsigned findFirstOf(const String& s, unsigned startPos, const char* toFind) I would prefer that if you need something like this that you add it to the String class rather than put in a local function inside KURL. Or if you really think it should be local, it should be marked static so it gets internal linkage. If you went to add it to the String class you'd see the mismatch, where you're using unsigned and the other functions use int. It would be best if we were consistent about this. If I was starting a new class I'd probably use unsigned, but I'd prefer to not have the single new function go in a different direction from all the existing ones. We could change them all in the future if you think unsigned is better for this sort of thing. + // Note that we don't pass anything as the second argument. When the + // innards of KURL are converted from DeprecatedString to String, this + // should be put back. It is an optimization that prevents allocating + // another string in the common case that nothing changes. Are you sure it's OK to lose this optimization? I'm pretty sure it's a real issue. We added this optimization for a reason, and I don't think we should take it out lightly. + for (i = 0; i < schemeEndPos; i++) { + if (!proto[i] || toASCIILower(urlString[i].unicode()) != proto[i]) + return false; + } + return proto[i] == 0; // We should have consumed all characters in the argument. Why use ! inside the loop and == 0 outside the loop? I'd prefer that you be consistent. I think we should add an assert here that the passed-in protocol does not contain any ASCII uppercase characters or non-ASCII characters. I also think that the name "proto" isn't so great, since it could easily mean "prototype" instead. Why not use the entire word protocol? - return protocol() == "file"; + return protocol().lower() == "file"; This should be using protocolIs(). + // DANGER! Thos does not do things like IDN, and the resulting host name Typo "this" not "thos". + // Returns true if this URL hsa a path. Note that "
http://foo.com/
" has a Typo "has" not "hsa". + // Returns true of the current URL's protocol is the same as the NULL- Typo "if" not "of". + void setProtocol(const String &); + void setHost(const String &); There should not be spaces before these "&" symbols. In general it would be good if you fixed the positions of * and & characters even in existing code you touch. Our WebKit coding style is Type* and Type& rather than Type * and Type &. For the functions where you're adding comments before the function declaration, please also add blank lines to group the comments with the functions they are describing. I find this hard to read: void f(); // Pass a full path. void g(const char*). void h(); I'd prefer that there be a blank line after f and before h in a case like this. + m_frame->loader()->begin(KURL("
http://localhost/placeholder.svg
")); // create the empty document Why is http a good idea here? Can we come up with something better? I think it would be bad if something decided to do I/O and talk to the local HTTP server! That could well be a valid URL to a different SVG. There are enough comments here that I'm going to say review- for now, but this seems like great work. Thanks for taking this on.
Brett Wilson (Google)
Comment 5
2008-02-01 11:28:45 PST
Created
attachment 18850
[details]
Patch for KURL to use string internally I tried to do as Darin suggested, changing the entire KURL innards to use String. This took me more than 1.5 days of solid work, and mostly works, but has some bugs. For example, GMail takes forever to load and some bogus looking URLs flash by in the URL bar in Safari. I have also seen some stack corruption that I have not been able to reproduce, and I broke dom/xhtml/level3/core/nodegetbaseuri04.xhtml in a way that I don't understand. I do not think that fixing this is technically worthwhile. Even if we fix the above bugs, I am certain that I have introduced some subtle bugs, incorrect behavior, and crashes in this patch that I do not know about. Lacking any tests, and given the poor state of this code, doing this change would be risky and take a very long time. You mentioned that the concern was the broken optimization for storing Strings without copies when they are unchanged, but I do not think that this is a particularly important operation. It is less important than fixing all the String<->DeprecatedString conversions in the way that KURL is called. I think that checking in the interface fix, and then fixing
bug 17098
is the best way to go, since that library contains a lot of tests that we can run to validate the results. With the interface patch in, the integration for that bug will be easy and much lower risk.
Brett Wilson (Google)
Comment 6
2008-02-05 20:01:37 PST
Created
attachment 18947
[details]
New patch addressing most comments MacDome's review: I think all additions of String( are probably bogus. String does not have any explicit constructors (I don't think). You are correct, I removed these.
> you could update them to match our style guidelines and remove the { }. That's > not required for this patch, but it's good to know for the future at least.
I fixed a bunch of these, although I don't pretend to have fixed all cases.
> All * and & should be next to the argument names. Per our style guidelines.
...as well as these when I noticed them. I fixed your other suggestions. ----------- Darin's review:
> The mix of signed and unsigned integers is starting to get a little awkward.
It's unfortunate that String uses both unsigned and int to refer to string indices. With this design, it is impossible for any code using strings to be very consistent. I changed some things when I got unsigneds out of string to use int. I have changed some of these back and used static cast on string.length instead. The code is a little more consistent now.
> encodeHostnames and substituteBackslashes are examples of this.
I replaced these functions with the versions from the patch in the middle I gave up on. These worked well, and were not the cause of the problems I reported (the problems were in KURL::parse). The rest of the functions still use DeprecatedString when concatenating, with the exception of substituteBackslashes. This function isn't very commonly used, it only does one string concat, and to replace, I would have to write my own version of replace() which seemed like a waste of time to me. I can change this too if you really want.
> - ASSERT(!pageURLOriginal.isNull());
>
> Was this assertion not actually guaranteed? If so, then I understand the bug > fix. If not, then I'm puzzled by why you made the change here.
That's correct. The code that calls it is closed source, but I think it gets the URL from some frame or loader related class, and blindly passes it to this function. Because old KURL would lose the NULLness of strings, it would be an empty but non-NULL URL in some cases when now it is NULL when it was constructed as a NULL KURL (not sure exactly what cases those are). Was this assertion not actually guaranteed? If so, then I understand the bug fix. If not, then I'm puzzled by why you made the change here.
> +static bool isAllASCII(const String& str) > ... > This function already exists in RenderText.cpp, although it works on a > StringImpl rather than a String. I would prefer that we unify and share a > single one rather than adding another. The RenderText.cpp one uses an algorithm > that's considerably faster for long strings where characters actually are all > ASCII, and since all-ASCII is the normal case I'd prefer to do it that way.
FIXME FIGURE THIS OUT
> + dest[i] = static_cast<char>(src[i]);
>
> There's no need for an explicit cast here. I don't know of any compiler WebKit > supports that gives a warning for narrowing conversions like this one. And > while it would be nice to be explicit that there's a type conversion going on, > I think the general guideline to avoid casts when possible is probably slightly > more important.
You are mistaken, Visual C++ gives: warning C4244: '=' : conversion from 'unsigned short' to 'char', possible loss of data. I think it's very important when you are losing data to know that you meant to lose it when you read the code, even if the compiler issued no warning.
> +unsigned findFirstOf(const String& s, unsigned startPos, const char* toFind)
>
> I would prefer that if you need something like this that you add it to the > String class rather than put in a local function inside KURL. Or if you really > think it should be local, it should be marked static so it gets internal > linkage.
I forgot the static here, I have added it. This is used on Vectors as well, so I can't add it to the String class. I think a general problem here with this function, isAllASCII, and substituteBackslashes is that you are requiring that String not be used for this type of manipulation (which I understand), while WebKit lacks the infrastructure to do anything beyond stdlib outside of the String/StringImpl class. I think there are four options: move all StringImpl's guts out to a file that takes lengths and offsets, make StringImpl able to use a buffer it doesn't own, have a nice place to put this type of function outside of String[Impl], or accumulate a whole lot of custom-written string manipulation functions in each file. Fixing this is decidedly outside the scope of this patch, but should be considered soon.
> + // Note that we don't pass anything as the second argument. When the > + // innards of KURL are converted from DeprecatedString to String, this > + // should be put back. It is an optimization that prevents allocating > + // another string in the common case that nothing changes.
>
> Are you sure it's OK to lose this optimization? I'm pretty sure it's a real > issue. We added this optimization for a reason, and I don't think we should > take it out lightly.
I think this is a nice optimization. See my previous comment about why I don't think this is practical in this patch. Plus, since probably half of the callers of the string class have a DeprecatedString to start out with, you already lost the optimization for those callers. The cost of not doing this is only the cost of doing a String<->DeprecatedString conversion, which is also done all over. I think an even better way of improving this would be to reduce the number of KURL->String->KURL operations. A large number of the cases where the string doesn't change and this optimization could apply are a result of this sequence of events. Making more things use KURL instead of String would make it even faster than this optimization. I have filed a few bugs on this and am hoping to do some work in this area once this patch is in. After talking to MacDome, I think fixing
bug 17098
would be a faster and, surprisingly, a less risky way of fixing this optimization. I would consider my KURL-to-use-String-internally patch pretty risky without any tests.
> + for (i = 0; i < schemeEndPos; i++) { > + if (!proto[i] || toASCIILower(urlString[i].unicode()) != proto[i]) > + return false; > + } > + return proto[i] == 0; // We should have consumed all characters in the > argument.
>
> Why use ! inside the loop and == 0 outside the loop? I'd prefer that you be > consistent. I think we should add an assert here that the passed-in protocol > does not contain any ASCII uppercase characters or non-ASCII characters. I also > think that the name "proto" isn't so great, since it could easily mean > "prototype" instead. Why not use the entire word protocol?
I fixed the name and 0 comparisons, and added the assertion you requested. I fixed the grammar and spelling errors, and placement of '&' in function prototypes.
> + m_frame->loader()->begin(KURL("
http://localhost/placeholder.svg
")); // > create the empty document
> Why is http a good idea here? Can we come up with something better? I think it > would be bad if something decided to do I/O and talk to the local HTTP server! > That could well be a valid URL to a different SVG.
Do you think KURL() would be OK here? I changed it to this. --------- The main thing I didn't address is where isAllASCII should live. I would like some specific guidance on this given that I don't always have a String to call it on.
Darin Adler
Comment 7
2008-02-06 06:57:37 PST
(In reply to
comment #6
)
> I think it's very important when you are losing data to know that you meant to > lose it when you read the code, even if the compiler issued no warning.
Not to be pedantic, but I don't agree with this principle. We are often losing data when working with integers in C. For example. short x; short y; x = y + 1; The above loses data. You might need a larger integer type, perhaps int, to hold the value "y + 1". I think the cast in this case is only useful because the warning exists. Unnecessary casts have costs -- they can do unnecessary or incorrect type conversions or extra work and suppress useful warnings -- so I prefer to include them only when they have a clear benefit.
> I think a general problem here with this function, isAllASCII, and > substituteBackslashes is that you are requiring that String not be used for > this type of manipulation (which I understand), while WebKit lacks the > infrastructure to do anything beyond stdlib outside of the String/StringImpl > class. I think there are four options: move all StringImpl's guts out to a file > that takes lengths and offsets, make StringImpl able to use a buffer it doesn't > own, have a nice place to put this type of function outside of String[Impl], or > accumulate a whole lot of custom-written string manipulation functions in each > file.
I think that any StringImpl function that can run on characters outside a string that we need outside the string classes should be moved into a separately accessible free function. Because StringImpl is such an ugly name, I'd suggest putting those functions into String.h (sadly, its real name is PlatformString.h) like this: bool isAllASCII(const UChar*, int length); They can be called from StringImpl.cpp and String.cpp as needed, or in some cases we can have a shared inline version if function call overhead is too much. There's no need to do this all at once. If you need 2 or 3 functions, you can move them. We'll get there eventually as long as we are clear on what the direction is.
> > + // Note that we don't pass anything as the second argument. When the > > + // innards of KURL are converted from DeprecatedString to String, this > > + // should be put back. It is an optimization that prevents allocating > > + // another string in the common case that nothing changes. > > > > Are you sure it's OK to lose this optimization? I'm pretty sure it's a real > > issue. We added this optimization for a reason, and I don't think we should > > take it out lightly. > > I think this is a nice optimization. See my previous comment about why I don't > think this is practical in this patch. Plus, since probably half of the callers > of the string class have a DeprecatedString to start out with, you already lost > the optimization for those callers. The cost of not doing this is only the cost > of doing a String<->DeprecatedString conversion, which is also done all over.
I really worry about arguments like this unless we can couple them with performance measurements. I don't want to end up in an intermediate state with a slowdown. Too many times I have believed my change was OK and it actually resulted in a measurable slowdown. I don't know how to mitigate that possibility for this patch.
> I think an even better way of improving this would be to reduce the number of > KURL->String->KURL operations.
Agreed. My concern is not about getting maximum possible future improvement, but rather about landing this patch without being in a slower state.
> After talking to MacDome, I think fixing
bug 17098
would be a faster and, > surprisingly, a less risky way of fixing this optimization. I would consider my > KURL-to-use-String-internally patch pretty risky without any tests.
I'm pretty skeptical about adding the new library layer and getting faster. I believe you did performance measurements and things were slower, not faster. Patches for WebKit need to make the project faster or keep them the same speed except under very unusual circumstances.
> > + m_frame->loader()->begin(KURL("
http://localhost/placeholder.svg
")); // > > create the empty document > > > Why is http a good idea here? Can we come up with something better? I think it > > would be bad if something decided to do I/O and talk to the local HTTP server! > > That could well be a valid URL to a different SVG. > > Do you think KURL() would be OK here? I changed it to this.
Probably.
> The main thing I didn't address is where isAllASCII should live. I would like > some specific guidance on this given that I don't always have a String to call > it on.
As I said above, PlatformString.h seems a fine place for now. It's not hard to move it later.
Darin Adler
Comment 8
2008-02-06 07:08:54 PST
(In reply to
comment #7
)
> > > + // Note that we don't pass anything as the second argument. When the > > > + // innards of KURL are converted from DeprecatedString to String, this > > > + // should be put back. It is an optimization that prevents allocating > > > + // another string in the common case that nothing changes. > > > > > > Are you sure it's OK to lose this optimization? I'm pretty sure it's a real > > > issue. We added this optimization for a reason, and I don't think we should > > > take it out lightly. > > > > I think this is a nice optimization. See my previous comment about why I don't > > think this is practical in this patch. Plus, since probably half of the callers > > of the string class have a DeprecatedString to start out with, you already lost > > the optimization for those callers. The cost of not doing this is only the cost > > of doing a String<->DeprecatedString conversion, which is also done all over. > > I really worry about arguments like this unless we can couple them with > performance measurements. I don't want to end up in an intermediate state with > a slowdown.
Let me be more clear here. It may be that this optimization only matters at exactly one call site, or a very small number of call sites. To prove to ourselves that we're OK, we need some kind of measurement. Perhaps a count of the number of allocations in some browsing session with and without this change or something along those lines would work. We need some kind of testing, not just "cross our fingers" mode. You're probably right that almost all callers don't depend on this optimization, but there may be a single heavily used callsite that currently does. If you can disprove this then I have no objection to getting rid of it. And I suggest removing it entirely rather than leaving it half in place. Another way to test the impact would be to set up the i-Bench HTML test suite -- it's challenging but possible to do this. (At Apple we have an internal page load speed test that we sometimes use to check changes like this to see if they slow things down, but we haven't been able to find one that doesn't rely on material that we don't hold the copyright to. The WebKit project doesn't have a shared open source speed test for web browsing yet.) I'll review your new patch as soon as I get a chance. PS: I also don't know which previous comment you're referring to when you say "why I don't think this is practical". But maybe that will become more clear when I re-review the patch and re-read the comments.
Darin Adler
Comment 9
2008-02-06 07:10:46 PST
(In reply to
comment #8
)
> PS: I also don't know which previous comment you're referring to when you say > "why I don't think this is practical". But maybe that will become more clear > when I re-review the patch and re-read the comments.
I see now. You tried again and it didn't work at all well.
Eric Seidel (no email)
Comment 10
2008-02-08 14:16:23 PST
Comment on
attachment 18947
[details]
New patch addressing most comments I think the parse() optimization hack should probably be pushed down into parse (so that it's in one place, instead of at all the call sites). the parse() signature would become: String parse(char*, StringImpl*), and the StringImpl* would just be ignored until parse() is converted to use Stirng/UChar internally. I wonder if it wouldn't be helpful to convert document->url() to KURL before this patch. I could potentially do that for you if you need. In general the patch looks fine. Darin should take a peak.
Eric Seidel (no email)
Comment 11
2008-02-08 14:18:40 PST
Comment on
attachment 18947
[details]
New patch addressing most comments Regarding the perf concern. I'm not sure you can do much about it. We can't be scared of this code. There is no externally-available Page Load Test (similar to moz's page cycler test), so one of the apple folks will just have to validate if it's faster/slower. I expect it won't be any slower. This patch gets rid of string conversions if anything.
Darin Adler
Comment 12
2008-02-09 23:03:46 PST
(In reply to
comment #11
)
> (From update of
attachment 18947
[details]
[edit]) > There is no externally-available Page Load Test > (similar to moz's page cycler test), so one of the apple folks will just have > to validate if it's faster/slower.
Eric, is it really true that there's no way for anyone to do any performance tests outside Apple? What about my suggestion above of doing some web browsing and counting the number of calls to the memory allocator?
Eric Seidel (no email)
Comment 13
2008-02-09 23:49:49 PST
(In reply to
comment #12
)
> (In reply to
comment #11
) > > (From update of
attachment 18947
[details]
[edit] [edit]) > > There is no externally-available Page Load Test > > (similar to moz's page cycler test), so one of the apple folks will just have > > to validate if it's faster/slower. > > Eric, is it really true that there's no way for anyone to do any performance > tests outside Apple?
My interpretation of the status quo: Passing the Apple PLT w/o regression would be sufficient for performance testing. Passing some other test of performance but not passing the Apple PLT would not be sufficient. If Marvin is capable of writing such a patch, I expect he's probably capable of coming up with some way to test it too. But since we have the PLT and it (has at least historically been) "the final say", we might as well just run it against the patch.
Darin Adler
Comment 14
2008-02-10 07:27:03 PST
I've applied the patch, and I'm doing some performance testing now. I don't understand the status of this relative to the two bugs blocking this. Do we need fixes for those? Are they included in this patch?
Darin Adler
Comment 15
2008-02-10 07:34:09 PST
My performance test indicates a 0.7% slowdown in page loading from this change. But I need to do some debugging too. I see this message: 2008-02-10 07:32:00.195 Safari[98730:10b] *** -[NSObject URL]: unrecognized selector sent to instance 0x1cc7480 2008-02-10 07:32:00.196 Safari[98730:10b] *** WebKit discarded an uncaught exception in the webView:resource:didFailLoadingWithError:fromDataSource: delegate: <NSInvalidArgumentException> *** -[NSObject URL]: unrecognized selector sent to instance 0x1cc7480 I'll figure out next what's causing that.
Darin Adler
Comment 16
2008-02-10 07:40:08 PST
With this patch applied, and a debug version of Safari, I immediately hit an assertion failure inside Safari. NSURLConnection is returning error -1002 "unsupported URL" so a load is failing, and then Safari's activity window decided that there's an inconsistency of some sort. I'll do some debugging to learn more.
Darin Adler
Comment 17
2008-02-10 07:46:30 PST
The problem is that the string for iconURL() is returning ":". I'll figure out why.
Darin Adler
Comment 18
2008-02-10 07:52:06 PST
The bug was here: - if (m_URL.protocol() != "http" && m_URL.protocol() != "https") - return ""; + if (m_URL.protocolIs("http") && m_URL.protocolIs("https")) + return KURL(); The above change reverses the sense of the protocol check.
Darin Adler
Comment 19
2008-02-10 08:07:09 PST
Created
attachment 19035
[details]
patch rebased and with backwards protocolIs fixed I tested this version of the patch carefully. I still see 0.7% slowdown. We need to come up with a version that's a speedup rather than a slowdown.
Darin Adler
Comment 20
2008-02-10 10:52:25 PST
I've been working on a version of the patch that *does* change KURL to use String internally. It going well. I measured a 1.3% speedup with this one. It's not done, but I'll attach my work in progress.
Darin Adler
Comment 21
2008-02-10 10:54:12 PST
Created
attachment 19037
[details]
more ambitious approach, eliminating more DeprecatedString Assigning this to myself for now, since this is nearly ready to review I think.
Brett Wilson (Google)
Comment 22
2008-02-11 08:52:35 PST
(In reply to
comment #21
)
> Created an attachment (id=19037) [edit] > more ambitious approach, eliminating more DeprecatedString > Assigning this to myself for now, since this is nearly ready to review I think.
OK! :) I hope my work was at least moderately helpful.
Darin Adler
Comment 23
2008-02-11 09:22:04 PST
(In reply to
comment #22
)
> I hope my work was at least moderately helpful.
Sure, I started with your work and left most of it as-is.
Darin Adler
Comment 24
2008-02-12 13:15:32 PST
Created
attachment 19095
[details]
patch
Eric Seidel (no email)
Comment 25
2008-02-12 14:02:06 PST
Comment on
attachment 19095
[details]
patch You're going for the "largest patch of the year" award it would seem. Sad to see all these style changes mixed in. :(
Darin Adler
Comment 26
2008-02-12 16:25:08 PST
Comment on
attachment 19095
[details]
patch It was pretty hard to whip the patch into shape, so I fell a little short of my usual standards for single-purpose patches. One big challenge was making sure the patch was a speedup, not a slowdown. It would be practical to break off various pieces (KJS namespace change, style tweaks, additions to PlatformString.h, specific cases of transition away from DeprecatedString that aren't involved with KURL). When someone reviews this I'll look for specific advice about that. The meaty bit of the patch is still pretty big though.
Eric Seidel (no email)
Comment 27
2008-02-12 16:55:22 PST
(In reply to
comment #26
)
> (From update of
attachment 19095
[details]
[edit]) > It was pretty hard to whip the patch into shape, so I fell a little short of my > usual standards for single-purpose patches.
It's not a big deal. I look forward to reviewing it... I just will have to find time this week to set aside the 1-2 hours required.
Eric Seidel (no email)
Comment 28
2008-02-13 23:53:17 PST
Comment on
attachment 19095
[details]
patch I survived! Seems odd to have this outside of setPort: case Port: { 180 int port = str.toInt(); 181 if (port < 0 || port > 0xFFFF) 182 port = 0; 183 url.setPort(port); 184 break; 185 } I think you should s/string/url/ on these comments: KJS::JSValue* jsStringOrNull(const KURL&); // null if the string is null These could be one line ifs: 102 if (HTMLAnchorElement* anchor = [self anchorElement]) { 1103 NSURL *href = anchor->href(); 1104 return href; 1105 } } else if (m_renderer->isImage() && m_renderer->element() && m_renderer->element()->hasTagName(imgTag)) { 1107 NSURL *src = static_cast<HTMLImageElement*>(m_renderer->element())->src(); 1108 return src; 1109 } Better if you could file a bug: 2185 // FIXME: What's the point of changing the value of a local variable 2186 // (title) that's not looked at afterward? Obviously there's no test 2187 // case for this! 2188 if (!n->isHTMLElement()) { 2189 String modifiedTitle = title.domString(); 2190 title = modifiedTitle.replace('&', "&&"); 2191 } 21872192 #endif Sigh. Sadly KURL Node::baseURI() const is yet-another recursive walk through your parent chain using virtual methods. :( We might want to add mailtoProtocol, httpProtocol, ftpProtocol AtomicStrings some day. Missing space before ':' page->chrome()->addMessageToConsole(HTMLMessageSource, 1481 isWarning(errorCode) ? WarningMessageLevel: ErrorMessageLevel, 1482 message, lineNumber, document->url().string()); Those would be different depending on how it's exposed to the DOM bindings: if (m_frame->tree() && m_frame->tree()->parent()) - return ""; + return KURL(); Seems like something we should test: KURL r = dl->doc()->url(); - if (r.protocol().startsWith("http") && r.path().isEmpty()) + if ((r.protocolIs("http") || r.protocolIs("https")) && r.path().isEmpty()) r.setPath("/"); Why? - DeprecatedString domain = r.host(); - if (dl->doc()->isHTMLDocument()) - domain = static_cast<HTMLDocument*>(dl->doc())->domain().deprecatedString(); Looks good to me!
Brett Wilson (Google)
Comment 29
2008-02-14 08:17:20 PST
> Seems like something we should test: > KURL r = dl->doc()->url(); > - if (r.protocol().startsWith("http") && r.path().isEmpty()) > + if ((r.protocolIs("http") || r.protocolIs("https")) && > r.path().isEmpty()) > r.setPath("/");
This seems a little suspicious to me. I think KURL should canonicalize empty schemes to "/" for all "hierarchical" URLs in parse(). If you can get here with an empty path for http, it seems like you could get here with an empty path for ftp, etc.
Darin Adler
Comment 30
2008-02-14 09:35:44 PST
(In reply to
comment #29
)
> > Seems like something we should test: > > KURL r = dl->doc()->url(); > > - if (r.protocol().startsWith("http") && r.path().isEmpty()) > > + if ((r.protocolIs("http") || r.protocolIs("https")) && > > r.path().isEmpty()) > > r.setPath("/"); > > This seems a little suspicious to me. I think KURL should canonicalize empty > schemes to "/" for all "hierarchical" URLs in parse(). If you can get here > with an empty path for http, it seems like you could get here with an empty > path for ftp, etc.
It's worth looking into. Note that the fact that there is no such thing as an empty path is a property of the HTTP protocol; I'm not certain that it's necessarily true of all hierarchical schemes. I believe it's possible that you could have an FTP server that could handle "/" different from no path at all. That's not possible with HTTP, though, since the path has to be sent in the GET or POST, and there's no syntax for an empty path. Or maybe it's true of FTP too, but not necessarily of every single hierarchical scheme. I'm not saying that what you're suggesting is wrong, but rather that it needs to be considered carefully. Moving the logic into KURL would be great! I don't like where it currently is. I'd be happy to see this simplified and resolved further, although that seems a separate issue from this patch. The attempt with this patch is to make a more efficient implementation *without* changing semantics. We can also fix bugs having to do with incorrect semantics, and we should do it!
Darin Adler
Comment 31
2008-02-14 09:41:56 PST
(In reply to
comment #28
)
> Seems odd to have this outside of setPort: > case Port: { > 180 int port = str.toInt(); > 181 if (port < 0 || port > 0xFFFF) > 182 port = 0; > 183 url.setPort(port); > 184 break; > 185 }
The setPort function takes an unsigned short. That's why the caller needs to check that the integer value fits into an unsigned short. This is simply "type conversion" code that knows that out of range values need to change to zeroes. There are probably many more-elegant ways of structuring this. If you have a suggestion, I'd be happy to improve it!
> I think you should s/string/url/ on these comments: > KJS::JSValue* jsStringOrNull(const KURL&); // null if the string is null
Will do.
> These could be one line ifs: > 102 if (HTMLAnchorElement* anchor = [self anchorElement]) { > 1103 NSURL *href = anchor->href(); > 1104 return href; > 1105 } > } else if (m_renderer->isImage() && m_renderer->element() && > m_renderer->element()->hasTagName(imgTag)) { > 1107 NSURL *src = > static_cast<HTMLImageElement*>(m_renderer->element())->src(); > 1108 return src; > 1109 }
Actually, they could not. We need the local variable to convert the type from KURL to NSURL *. The return type is "id", which is not specific enough. I originally wrote these as single lines. I decided that a local variable was superior to an explicit static_cast or a function call of some sort.
> Better if you could file a bug: > 2185 // FIXME: What's the point of changing the value of a > local variable > 2186 // (title) that's not looked at afterward? Obviously > there's no test > 2187 // case for this! > 2188 if (!n->isHTMLElement()) { > 2189 String modifiedTitle = title.domString(); > 2190 title = modifiedTitle.replace('&', "&&"); > 2191 } > 21872192 #endif
Will do.
> Sigh. Sadly KURL Node::baseURI() const is yet-another recursive walk through > your parent chain using virtual methods. :(
Very sad, yes.
> We might want to add mailtoProtocol, httpProtocol, ftpProtocol AtomicStrings > some day.
Could be. But the protocolIs function wants a const char*; an AtomicSting would not be helpful for those callsites. Perhaps some constants in the KURL header of type const char* would be good.
> Missing space before ':' > page->chrome()->addMessageToConsole(HTMLMessageSource, > 1481 isWarning(errorCode) ? WarningMessageLevel: ErrorMessageLevel, > 1482 message, lineNumber, document->url().string());
Will fix.
> Those would be different depending on how it's exposed to the DOM bindings: > if (m_frame->tree() && m_frame->tree()->parent()) > - return ""; > + return KURL();
Good point. The function is iconURL() and it's an internal function, not exposed to the DOM at all. I looked at the callers when I made this change.
> Seems like something we should test: > KURL r = dl->doc()->url(); > - if (r.protocol().startsWith("http") && r.path().isEmpty()) > + if ((r.protocolIs("http") || r.protocolIs("https")) && > r.path().isEmpty()) > r.setPath("/");
I'm not sure what you mean exactly. The old code checked for both "http" and "https" using startsWith. It would also handle "httpasdfads", which was silly. I don't think this merits a test case. I do think it's not really very good code. But I preserved its old semantics pretty closely.
> Why? > - DeprecatedString domain = r.host(); > - if (dl->doc()->isHTMLDocument()) > - domain = > static_cast<HTMLDocument*>(dl->doc())->domain().deprecatedString();
I mentioned this in the change log. The domain string was computed, but never used.
Darin Adler
Comment 32
2008-02-14 18:15:36 PST
Created
attachment 19132
[details]
patch, now with fewer broken platforms
Darin Adler
Comment 33
2008-02-16 08:18:24 PST
http://trac.webkit.org/projects/webkit/changeset/30243
Brett Wilson (Google)
Comment 34
2008-02-28 16:45:53 PST
Thanks for all the work, Darin!
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug