RESOLVED FIXED 228122
Add functionalities for parsing URL query string
https://bugs.webkit.org/show_bug.cgi?id=228122
Summary Add functionalities for parsing URL query string
Risul Islam
Reported 2021-07-20 13:28:43 PDT
It would be useful to have more ways to get information about URL query parameters.
Attachments
Patch (17.32 KB, patch)
2021-07-26 10:40 PDT, Risul Islam
no flags
Patch (19.96 KB, patch)
2021-07-29 10:23 PDT, Risul Islam
no flags
Patch (17.13 KB, patch)
2021-07-30 10:06 PDT, Risul Islam
no flags
Patch (17.27 KB, patch)
2021-07-30 16:05 PDT, Risul Islam
no flags
Patch (20.07 KB, patch)
2021-08-01 21:44 PDT, Risul Islam
no flags
Patch (16.82 KB, patch)
2021-08-03 12:05 PDT, Risul Islam
no flags
Patch (17.43 KB, patch)
2021-08-03 14:53 PDT, Risul Islam
no flags
Patch (17.43 KB, patch)
2021-08-03 15:23 PDT, Risul Islam
ews-feeder: commit-queue-
Risul Islam
Comment 1 2021-07-26 10:40:20 PDT
John Wilander
Comment 2 2021-07-26 11:14:49 PDT
Comment on attachment 434214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434214&action=review Good work. I'll start with comments on the meat of your patch. > Source/WTF/wtf/URL.cpp:1171 > + // Assumption 1: Parameters are seperated by '&' although ';' is also possible and ';' is not handled here. You don't seem to have further assumptions so the number is superfluous. I'd go with "Assumes that parameters are …" if we're keeping this comment. I would prefer a reference to a spec though. https://datatracker.ietf.org/doc/html/rfc3986/#section-3.4 is a good starting point. You'd have to find some reference to the use of semicolons. > Source/WTF/wtf/URL.cpp:1172 > + Vector<QueryParameter> queryParametersKeyValues; This can be declared after the early return. > Source/WTF/wtf/URL.cpp:1178 > + return queryParametersKeyValues; I think this can return an { } instead and you can push the declaration of queryParametersKeyValues even further down. > Source/WTF/wtf/URL.cpp:1185 > + tempKeyValues.setValue(""); This may able too use emptyString. > Source/WTF/wtf/URL.cpp:1199 > +std::optional<Vector<URL::QueryParameter>> URL::differenceInQueryParameters(URL matchingURL) const I would call this function differingQueryParameters() since that sounds more like the getter it is. > Source/WTF/wtf/URL.cpp:1207 > + if (!thisQueryParameters || !matchingQueryParameters) I would write these as two early returns to not have to do all the work upfront. Like this: if (!thisQueryParameters = queryParameters()) return std::nullopt; if (!matchingQueryParameters = matchingURL.queryParameters()) return std::nullopt; > Source/WTF/wtf/URL.cpp:1212 > + return diffResult; You can probably return { } here and push the declaration of diffResult further down. > Source/WTF/wtf/URL.cpp:1215 > + return matchingQueryParameters; Ditto. > Source/WTF/wtf/URL.cpp:1218 > + return thisQueryParameters; Ditto. > Source/WTF/wtf/URL.cpp:1220 > + HashMap<URL::QueryParameter, int> frequencyOfParameters; The purpose of this code is unclear. I wonder if we can find a more verbose variable name? > Source/WTF/wtf/URL.cpp:1227 > + Here's where you should declare diffResult. > Source/WTF/wtf/URL.cpp:1249 > +std::optional<bool> URL::isEqualQuery(URL matchingURL) const I'd go with isQueryEqual(). > Source/WTF/wtf/URL.cpp:1254 > + std::optional<Vector<QueryParameter>> difference = differenceInQueryParameters(matchingURL); Use auto. > Source/WTF/wtf/URL.cpp:1255 > + if (!difference.value().size()) I think this may fail if you get a nullopt. Regardless, you can just do: return !difference || !difference->size(); > Source/WTF/wtf/URL.cpp:1267 > + unsigned thisPathLength = m_string[m_pathEnd-1] == '/' ? m_pathEnd-1 : m_pathEnd; Use auto. > Source/WTF/wtf/URL.cpp:1268 > + unsigned matchingPathLength = matchingURL.m_string[matchingURL.m_pathEnd-1] == '/' ? matchingURL.m_pathEnd-1 : matchingURL.m_pathEnd; Ditto. > Source/WTF/wtf/URL.cpp:1269 > + String thisURLWithoutQuery = m_string.substring(0, thisPathLength); Ditto. > Source/WTF/wtf/URL.cpp:1270 > + String matchingURLWithoutQuery = matchingURL.m_string.substring(0, matchingPathLength); Ditto. > Source/WTF/wtf/URL.cpp:1272 > + return thisURLWithoutQuery == matchingURLWithoutQuery; Can we do this without creating two new String objects?
Kate Cheney
Comment 3 2021-07-26 12:06:04 PDT
Comment on attachment 434214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434214&action=review > Source/WTF/wtf/URL.cpp:1169 > +std::optional<Vector<URL::QueryParameter>> URL::queryParameters() const I wonder if this could return an empty vector for an invalid URL instead of being optional. >> Source/WTF/wtf/URL.cpp:1171 >> + // Assumption 1: Parameters are seperated by '&' although ';' is also possible and ';' is not handled here. > > You don't seem to have further assumptions so the number is superfluous. I'd go with "Assumes that parameters are …" if we're keeping this comment. > > I would prefer a reference to a spec though. https://datatracker.ietf.org/doc/html/rfc3986/#section-3.4 is a good starting point. You'd have to find some reference to the use of semicolons. I agree with John about removing the number. I don't really think a comment is necessary, it is quite clear in the code that parameters are being split by &. > Source/WTF/wtf/URL.cpp:1176 > + String queryPart = m_queryEnd == m_pathEnd ? "" : m_string.substring(m_pathEnd + 1, m_queryEnd - (m_pathEnd + 1)); Can you call query() here to avoid code duplication? > Source/WTF/wtf/URL.cpp:1180 > + for (String keyValuePair : queryPart.splitAllowingEmptyEntries('&')) { You can change String to be auto, it is shorter. >> Source/WTF/wtf/URL.cpp:1199 >> +std::optional<Vector<URL::QueryParameter>> URL::differenceInQueryParameters(URL matchingURL) const > > I would call this function differingQueryParameters() since that sounds more like the getter it is. matchingURL sounds like the URLs are the same. I would also change it to be something else, like otherURL. > Source/WTF/wtf/URL.cpp:1205 > + std::optional<Vector<QueryParameter>> matchingQueryParameters = matchingURL.queryParameters(); Ditto, I would change this to be otherQueryParameters. You can also use auto for both of these. > Source/WTF/wtf/URL.cpp:1210 > + Vector<QueryParameter> diffResult; I would avoid abbreviations as much as possible (that is a part of our style guidelines), so I would recommend changing this vector name. > Source/WTF/wtf/URL.cpp:1211 > + if (!thisQueryParameters.value().size() && !matchingQueryParameters.value().size()) Vector has an isEmpty() function that you can use here. > Source/WTF/wtf/URL.cpp:1214 > + if (!thisQueryParameters.value().size()) Ditto, using isEmpty() here would be easier to read. > Source/WTF/wtf/URL.cpp:1217 > + if (!matchingQueryParameters.value().size()) Ditto. > Source/WTF/wtf/URL.cpp:1221 > + for (QueryParameter keyValue : thisQueryParameters.value()) { auto& instead of QueryParameter. Also, the name "keyValue" gets confusing when you later are trying to set values in the map. I would stick to key or queryParameter instead. > Source/WTF/wtf/URL.cpp:1222 > + if (!frequencyOfParameters.contains(keyValue)) I *think* you can replace this if-else statement with the HashMap's ensure() function if you want, which creates a new entry if it doesn't exist, something like: auto& frequency = frequencyOfParameters.ensure(keyValue, []() { return 0; }).iterator->value; ++frequency; > Source/WTF/wtf/URL.cpp:1228 > + for (QueryParameter keyValue : matchingQueryParameters.value()) { Ditto about auto& and the keyValue naming. > Source/WTF/wtf/URL.cpp:1239 > + for (QueryParameter keyvalue : frequencyOfParameters.keys()) { auto&, and keyValue should have a name change or at least be camel case. > Source/WTF/wtf/URL.cpp:1240 > + if (frequencyOfParameters.get(keyvalue) > 0) { No need for > 0, you can just check if (frequencyOfParameters.get(keyvalue) > Source/WTF/wtf/URL.cpp:1246 > + return diffResult; This does not address query parameter ordering. should we consider ?key1=value1&key2=value2 the same as ?key2=value2&key1=value1 ? > Source/WTF/wtf/URL.cpp:1252 > + return std::nullopt; I don't think this should return an optional. I think if either or both URLs are invalid return false. >> Source/WTF/wtf/URL.cpp:1255 >> + if (!difference.value().size()) > > I think this may fail if you get a nullopt. Regardless, you can just do: > return !difference || !difference->size(); You can use isEmpty() here. > Source/WTF/wtf/URL.cpp:1265 > + return std::nullopt; I don't think this should return an optional. I think if either or both URLs are invalid return false.
Alex Christensen
Comment 4 2021-07-26 12:18:30 PDT
Comment on attachment 434214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434214&action=review >> Source/WTF/wtf/URL.cpp:1269 >> + String thisURLWithoutQuery = m_string.substring(0, thisPathLength); > > Ditto. You don't need to allocate and copy the substrings just to check if they are equal. Use StringView. > Source/WTF/wtf/URL.h:229 > + WTF_EXPORT_PRIVATE std::optional<Vector<QueryParameter>> queryParameters() const; This should probably just use URLParser::parseURLEncodedForm instead of adding this. That also returns a URLEncodedForm, which basically makes your QueryParameter class unnecessary.
Radar WebKit Bug Importer
Comment 5 2021-07-27 13:29:18 PDT
Risul Islam
Comment 6 2021-07-28 16:08:14 PDT
Comment on attachment 434214 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434214&action=review >> Source/WTF/wtf/URL.cpp:1169 >> +std::optional<Vector<URL::QueryParameter>> URL::queryParameters() const > > I wonder if this could return an empty vector for an invalid URL instead of being optional. That would be good. Done. >>> Source/WTF/wtf/URL.cpp:1171 >>> + // Assumption 1: Parameters are seperated by '&' although ';' is also possible and ';' is not handled here. >> >> You don't seem to have further assumptions so the number is superfluous. I'd go with "Assumes that parameters are …" if we're keeping this comment. >> >> I would prefer a reference to a spec though. https://datatracker.ietf.org/doc/html/rfc3986/#section-3.4 is a good starting point. You'd have to find some reference to the use of semicolons. > > I agree with John about removing the number. I don't really think a comment is necessary, it is quite clear in the code that parameters are being split by &. on it. >> Source/WTF/wtf/URL.cpp:1172 >> + Vector<QueryParameter> queryParametersKeyValues; > > This can be declared after the early return. Agreed. Done. >> Source/WTF/wtf/URL.cpp:1176 >> + String queryPart = m_queryEnd == m_pathEnd ? "" : m_string.substring(m_pathEnd + 1, m_queryEnd - (m_pathEnd + 1)); > > Can you call query() here to avoid code duplication? yes. on it. Done. >> Source/WTF/wtf/URL.cpp:1178 >> + return queryParametersKeyValues; > > I think this can return an { } instead and you can push the declaration of queryParametersKeyValues even further down. yes. Done. >> Source/WTF/wtf/URL.cpp:1180 >> + for (String keyValuePair : queryPart.splitAllowingEmptyEntries('&')) { > > You can change String to be auto, it is shorter. Done using "auto" whenever applicable. >> Source/WTF/wtf/URL.cpp:1185 >> + tempKeyValues.setValue(""); > > This may able too use emptyString. Used another function (URLParser::parseURLEncodedForm) suggested by Alex. >>> Source/WTF/wtf/URL.cpp:1199 >>> +std::optional<Vector<URL::QueryParameter>> URL::differenceInQueryParameters(URL matchingURL) const >> >> I would call this function differingQueryParameters() since that sounds more like the getter it is. > > matchingURL sounds like the URLs are the same. I would also change it to be something else, like otherURL. Done using other and differingQueryParameters. >> Source/WTF/wtf/URL.cpp:1205 >> + std::optional<Vector<QueryParameter>> matchingQueryParameters = matchingURL.queryParameters(); > > Ditto, I would change this to be otherQueryParameters. You can also use auto for both of these. yes. >> Source/WTF/wtf/URL.cpp:1207 >> + if (!thisQueryParameters || !matchingQueryParameters) > > I would write these as two early returns to not have to do all the work upfront. Like this: > if (!thisQueryParameters = queryParameters()) > return std::nullopt; > if (!matchingQueryParameters = matchingURL.queryParameters()) > return std::nullopt; sure. btw, I really like this code style. >> Source/WTF/wtf/URL.cpp:1210 >> + Vector<QueryParameter> diffResult; > > I would avoid abbreviations as much as possible (that is a part of our style guidelines), so I would recommend changing this vector name. it would be good. Done. >> Source/WTF/wtf/URL.cpp:1211 >> + if (!thisQueryParameters.value().size() && !matchingQueryParameters.value().size()) > > Vector has an isEmpty() function that you can use here. Done using isEmpty() whenever possible. >> Source/WTF/wtf/URL.cpp:1220 >> + HashMap<URL::QueryParameter, int> frequencyOfParameters; > > The purpose of this code is unclear. I wonder if we can find a more verbose variable name? Used frequencyOfDistinctParameter >> Source/WTF/wtf/URL.cpp:1228 >> + for (QueryParameter keyValue : matchingQueryParameters.value()) { > > Ditto about auto& and the keyValue naming. Used singleQueryParameter instead of keyValue. >> Source/WTF/wtf/URL.cpp:1240 >> + if (frequencyOfParameters.get(keyvalue) > 0) { > > No need for > 0, you can just check if (frequencyOfParameters.get(keyvalue) Here, frequencyOfParameters.get(keyvalue) can also be negative. Rearranging the code so that it stays non-negative. >> Source/WTF/wtf/URL.cpp:1246 >> + return diffResult; > > This does not address query parameter ordering. should we consider ?key1=value1&key2=value2 the same as ?key2=value2&key1=value1 ? Addressed this issue. Now ensures ordering. >> Source/WTF/wtf/URL.cpp:1252 >> + return std::nullopt; > > I don't think this should return an optional. I think if either or both URLs are invalid return false. Done. >>> Source/WTF/wtf/URL.cpp:1269 >>> + String thisURLWithoutQuery = m_string.substring(0, thisPathLength); >> >> Ditto. > > You don't need to allocate and copy the substrings just to check if they are equal. Use StringView. on it. >> Source/WTF/wtf/URL.cpp:1272 >> + return thisURLWithoutQuery == matchingURLWithoutQuery; > > Can we do this without creating two new String objects? Done. >> Source/WTF/wtf/URL.h:229 >> + WTF_EXPORT_PRIVATE std::optional<Vector<QueryParameter>> queryParameters() const; > > This should probably just use URLParser::parseURLEncodedForm instead of adding this. That also returns a URLEncodedForm, which basically makes your QueryParameter class unnecessary. I am onto it. Done. Used the URLParser::parseURLEncodedForm. Works just the same and fine.
Risul Islam
Comment 7 2021-07-29 10:23:52 PDT
Alex Christensen
Comment 8 2021-07-29 10:39:44 PDT
Comment on attachment 434531 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434531&action=review > Source/WTF/wtf/URL.cpp:1177 > + if (queryPart.isEmpty()) > + return { }; Is this needed? > Source/WTF/wtf/URL.cpp:1231 > +bool URL::isEqualIgnoringQuery(URL otherURL) const Do we want to compare the fragments after the query? > Source/WTF/wtf/URL.cpp:1237 > + auto thisPathLength = m_string[m_pathEnd-1] == '/' ? m_pathEnd-1 : m_pathEnd; > + auto otherPathLength = otherURL.m_string[otherURL.m_pathEnd-1] == '/' ? otherURL.m_pathEnd-1 : otherURL.m_pathEnd; WebKit code style needs a space before and after the minus > Source/WTF/wtf/URL.cpp:1239 > + return m_string.substring(0, thisPathLength) == otherURL.m_string.substring(0, otherPathLength); Use StringViews. There's no need to allocate and copy just to compare. > Source/WTF/wtf/URL.cpp:1242 > +bool URL::isQueryEqual(URL otherURL) const This should probably be a const URL& to avoid an unnecessary copy every time this is called. Also, this should probably be renamed to containsSameQueryParameters or something like that to reflect the fact that query parameter order doesn't matter here. You should also add a test showing that http://example.com/?a=b&c=d and http://example.com/?c=d&a=b are equal here. > Source/WTF/wtf/URL.cpp:1245 > + return { }; This should probably be true or false instead of { }, which is less clear for booleans. > Source/WTF/wtf/URL.cpp:1255 > +void URL::removeQueryParameters(HashSet<String> keysToRemove) ditto. const reference. > Source/WTF/wtf/URL.cpp:1267 > + queryWithoutRemovalKey = queryWithoutRemovalKey + singleQueryParameter.key + "=" + singleQueryParameter.value + "&"; This should use makeString instead of operator+ to avoid all the intermediaries. > Source/WTF/wtf/URL.cpp:1272 > + queryWithoutRemovalKey = queryWithoutRemovalKey.substring(0, queryWithoutRemovalKey.length() - 1); You need to check if queryWithoutRemovalKey.length() is zero.
Risul Islam
Comment 9 2021-07-29 12:14:13 PDT
Comment on attachment 434531 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434531&action=review >> Source/WTF/wtf/URL.cpp:1177 >> + return { }; > > Is this needed? We can achieve the same functionality using query() and URLParser::parseURLEncodedForm() function. So, we can discard it. >> Source/WTF/wtf/URL.cpp:1231 >> +bool URL::isEqualIgnoringQuery(URL otherURL) const > > Do we want to compare the fragments after the query? For now, we are only interested in anything before query. I guess we can rename the function name to 'isEqualIgnoringQueryAndFragment'? >> Source/WTF/wtf/URL.cpp:1237 >> + auto otherPathLength = otherURL.m_string[otherURL.m_pathEnd-1] == '/' ? otherURL.m_pathEnd-1 : otherURL.m_pathEnd; > > WebKit code style needs a space before and after the minus On it. >> Source/WTF/wtf/URL.cpp:1239 >> + return m_string.substring(0, thisPathLength) == otherURL.m_string.substring(0, otherPathLength); > > Use StringViews. There's no need to allocate and copy just to compare. On it. >> Source/WTF/wtf/URL.cpp:1242 >> +bool URL::isQueryEqual(URL otherURL) const > > This should probably be a const URL& to avoid an unnecessary copy every time this is called. > Also, this should probably be renamed to containsSameQueryParameters or something like that to reflect the fact that query parameter order doesn't matter here. You should also add a test showing that http://example.com/?a=b&c=d and http://example.com/?c=d&a=b are equal here. Awesome idea, adding a test case. >> Source/WTF/wtf/URL.cpp:1245 >> + return { }; > > This should probably be true or false instead of { }, which is less clear for booleans. Right. >> Source/WTF/wtf/URL.cpp:1267 >> + queryWithoutRemovalKey = queryWithoutRemovalKey + singleQueryParameter.key + "=" + singleQueryParameter.value + "&"; > > This should use makeString instead of operator+ to avoid all the intermediaries. Great. That would make code clear. >> Source/WTF/wtf/URL.cpp:1272 >> + queryWithoutRemovalKey = queryWithoutRemovalKey.substring(0, queryWithoutRemovalKey.length() - 1); > > You need to check if queryWithoutRemovalKey.length() is zero. On it.
Risul Islam
Comment 10 2021-07-30 10:06:47 PDT
Alex Christensen
Comment 11 2021-07-30 11:31:48 PDT
Comment on attachment 434642 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434642&action=review Needs a bit more refinement. Also, your tests are missing queries with keys but no values, like http://example.com/?a=b&c&d=e > Source/WTF/wtf/KeyValuePair.h:62 > + We don't want to add empty whitespace here. > Source/WTF/wtf/URL.cpp:1172 > + Extra space. > Source/WTF/wtf/URL.cpp:1179 > + if (thisQueryParameters.isEmpty() && otherQueryParameters.isEmpty()) > + return { }; This is completely unnecessary, doesn't change any behavior, but does increase complexity of the code. Please remove. > Source/WTF/wtf/URL.cpp:1182 > + if (thisQueryParameters.isEmpty()) > + return otherQueryParameters; Probably also true with these, although these might be slight performance optimizations. > Source/WTF/wtf/URL.cpp:1188 > + HashMap<KeyValuePair<String, String>, int> frequencyOfDistinctParameter; Does the value of this map need to be signed? Could int be changed to unsigned or size_t? > Source/WTF/wtf/URL.cpp:1193 > + if (!frequencyOfDistinctParameter.contains(keyValue)) > + frequencyOfDistinctParameter.add(keyValue, 1); > + else > + frequencyOfDistinctParameter.set(keyValue, frequencyOfDistinctParameter.get(keyValue) + 1); This does up to 3 hash lookups when only one is necessary. Use HashMap::add(keyValue, 1) and look at the return value. If addResult.isNewEntry is true, then you're done. Otherwise, use addResult.iterator to increment the value. > Source/WTF/wtf/URL.cpp:1203 > + frequencyOfDistinctParameter.set(singleQueryParameter, frequencyOfDistinctParameter.get(singleQueryParameter) - 1); Ditto with the duplicate hash lookups. > Source/WTF/wtf/URL.cpp:1209 > + for (int i = 0; i < frequencyOfDistinctParameter.get(singleQueryParameter); i++) This does a get every loop iteration, when it should always return the same value, and I'm pretty sure the compiler optimizer doesn't realize it. Store the value outside the for loop. > Source/WTF/wtf/URL.cpp:1212 > + frequencyOfDistinctParameter.set(singleQueryParameter, 0); This is also a duplicate hash lookup. Use HashMap::take. > Source/WTF/wtf/URL.cpp:1224 > + auto thisPathLength = m_string[m_pathEnd - 1] == '/' ? m_pathEnd - 1 : m_pathEnd; I don't think this does what you think it does. You have a test with "http://www.webkit.org?" but when parsed and canonicalized it becomes "http://www.webkit.org/?". Add a test with a zero length path, like custom-scheme://host?a=b and http://example.com/?a=b > Source/WTF/wtf/URL.cpp:1225 > + auto otherPathLength = otherURL.m_string[otherURL.m_pathEnd - 1] == '/' ? otherURL.m_pathEnd - 1 : otherURL.m_pathEnd; Extra space. > Source/WTF/wtf/URL.cpp:1230 > +bool URL::containsSameQueryParameters(const URL& otherURL) const Do we want http://example.com/ and http://example.com/? to be considered to have the same query parameters? Add that test. > Source/WTF/wtf/URL.cpp:1239 > + if (queryDifference.isEmpty()) > + return true; > + > + return false; return queryDifference.isEmpty(); > Source/WTF/wtf/URL.cpp:1249 > + if (queryParametersList.isEmpty()) > + return; This is almost certainly unnecessary. > Source/WTF/wtf/URL.cpp:1254 > + queryWithoutRemovalKeys = makeString(queryWithoutRemovalKeys, singleQueryParameter.key, '=', singleQueryParameter.value, '&'); This. should actually be a StringBuilder instead. It's more efficient than this, which is currently an O(n^2) algorithm because of all the recopying of the current String. > Source/WTF/wtf/URL.cpp:1255 > + Extra space. > Source/WTF/wtf/URL.cpp:1259 > + queryWithoutRemovalKeys = queryWithoutRemovalKeys.substring(0, queryWithoutRemovalKeys.length() - 1); This is also a wasteful String allocation and copy. This could either be done with a StringView substring, which is just an integer subtraction, or by checking if queryWithoutRemovalKeys is empty in the loop and adding '&' before the new values if not.
Risul Islam
Comment 12 2021-07-30 12:14:44 PDT
Comment on attachment 434642 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434642&action=review >> Source/WTF/wtf/URL.cpp:1259 >> + queryWithoutRemovalKeys = queryWithoutRemovalKeys.substring(0, queryWithoutRemovalKeys.length() - 1); > > This is also a wasteful String allocation and copy. This could either be done with a StringView substring, which is just an integer subtraction, or by checking if queryWithoutRemovalKeys is empty in the loop and adding '&' before the new values if not. On all the above things.
Risul Islam
Comment 13 2021-07-30 14:05:50 PDT
Comment on attachment 434642 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434642&action=review >> Source/WTF/wtf/URL.cpp:1224 >> + auto thisPathLength = m_string[m_pathEnd - 1] == '/' ? m_pathEnd - 1 : m_pathEnd; > > I don't think this does what you think it does. You have a test with "http://www.webkit.org?" but when parsed and canonicalized it becomes "http://www.webkit.org/?". Add a test with a zero length path, like custom-scheme://host?a=b and http://example.com/?a=b Sorry for the confusion. I am changing thisPathLength to thisLengthOfURLIgnoringQueryAndFragments. >> Source/WTF/wtf/URL.cpp:1230 >> +bool URL::containsSameQueryParameters(const URL& otherURL) const > > Do we want http://example.com/ and http://example.com/? to be considered to have the same query parameters? Add that test. Yes, we want to consider them equal. Adding test. >> Source/WTF/wtf/URL.cpp:1254 >> + queryWithoutRemovalKeys = makeString(queryWithoutRemovalKeys, singleQueryParameter.key, '=', singleQueryParameter.value, '&'); > > This. should actually be a StringBuilder instead. It's more efficient than this, which is currently an O(n^2) algorithm because of all the recopying of the current String. Need a little bit more clarification.
Risul Islam
Comment 14 2021-07-30 16:05:10 PDT
Darin Adler
Comment 15 2021-07-30 16:13:51 PDT
Comment on attachment 434669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434669&action=review > Source/WTF/ChangeLog:8 > + Added more functionalities for parsing URL query parameters. I don’t fully understand why adding more functions that are not yet used is important for WebKIt, but I trust there is some reason. > Source/WTF/wtf/KeyValuePair.h:67 > + template <typename K, typename V> > + bool operator==(const KeyValuePair<K, V>& other) const > + { > + return key == other.key && value == other.value; > + } This doesn’t need to be a member. Making it a free function that takes two arguments is a little better in terms of supporting conversions on the left side of the "==" so I think we might prefer that. > Source/WTF/wtf/URL.cpp:1176 > + auto thisQueryParameters = URLParser::parseURLEncodedForm(query()); > + auto otherQueryParameters = URLParser::parseURLEncodedForm(otherURL.query()); It seems to me would could implement this operation significantly more efficiently by sorting both vectors and walking through them, rather than using a hash table. > Source/WTF/wtf/URL.cpp:1185 > + HashMap<KeyValuePair<String, String>, unsigned> frequencyOfDistinctParameters; We have a data structure for this called HashCountedSet. Using that would simplify the code below. > Source/WTF/wtf/URL.cpp:1205 > + auto frequencyOfsingleQueryParameter = frequencyOfDistinctParameters.take(singleQueryParameter); Lowercase "s" here. > Source/WTF/wtf/URL.cpp:1222 > + auto thisLengthOfURLIgnoringQueryAndFragments = m_string[m_pathEnd - 1] == '/' ? m_pathEnd - 1 : m_pathEnd; > + auto otherLengthOfURLIgnoringQueryAndFragments = otherURL.m_string[otherURL.m_pathEnd - 1] == '/' ? otherURL.m_pathEnd - 1 : otherURL.m_pathEnd; > + return StringView(m_string).substring(0, thisLengthOfURLIgnoringQueryAndFragments) == StringView(otherURL.m_string).substring(0, otherLengthOfURLIgnoringQueryAndFragments); Seems like we should use a helper function rather than repeating this logic twice. Could be private or local to this file. > Source/WTF/wtf/URL.cpp:1231 > + auto queryDifference = differingQueryParameters(otherURL); > + return queryDifference.isEmpty(); No need for a local variable here. This seems like an inefficient algorithm; if we were using this somewhere performance critical we’d want to do this in a way that doesn’t allocate so much memory. > Source/WTF/wtf/URL.cpp:1242 > + if (keysToRemove.find(singleQueryParameter.key) == keysToRemove.end()) { The best idiom here is to call contains, rather than find == end. > Source/WTF/wtf/URL.cpp:1244 > + queryWithoutRemovalKeys = makeString(queryWithoutRemovalKeys, '&', singleQueryParameter.key, '=', singleQueryParameter.value); This is not an efficient way to build up a string. Concatenating each time means lots of memory allocation. We have a class named StringBuilder that implements this kind of idiom efficiently. > Source/WTF/wtf/URL.cpp:1250 > + setQuery(StringView(queryWithoutRemovalKeys)); There should not be a need to call StringView() here explicitly. > Source/WTF/wtf/URL.h:207 > + WTF_EXPORT_PRIVATE Vector<KeyValuePair<String, String>> differingQueryParameters(const URL&) const; > + WTF_EXPORT_PRIVATE bool isEqualIgnoringQueryAndFragments(const URL&) const; > + WTF_EXPORT_PRIVATE bool containsSameQueryParameters(const URL&) const; > + WTF_EXPORT_PRIVATE void removeQueryParameters(const HashSet<String>&); How did you decide that these should be members of the URL class rather than functions that take a URL? In particular, the ones that take two URLs seem like they should be functions that take two URLs.
Risul Islam
Comment 16 2021-07-30 17:03:44 PDT
Comment on attachment 434669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434669&action=review >> Source/WTF/ChangeLog:8 >> + Added more functionalities for parsing URL query parameters. > > I don’t fully understand why adding more functions that are not yet used is important for WebKIt, but I trust there is some reason. Thank you Darin for your valuable comments and insights. We are trying to come up with a feature that restricts tracking parameters from the URL while loading or sharing by copying. Thats why we are adding the functionalities in this patch. >> Source/WTF/wtf/URL.cpp:1176 >> + auto otherQueryParameters = URLParser::parseURLEncodedForm(otherURL.query()); > > It seems to me would could implement this operation significantly more efficiently by sorting both vectors and walking through them, rather than using a hash table. Wouldn't sorting be costly in terms of time? We preferred to sacrifice some memory. Although the length of the URL and the number of the parameter is not that huge, still we can discuss about any kind of optimization. >> Source/WTF/wtf/URL.cpp:1185 >> + HashMap<KeyValuePair<String, String>, unsigned> frequencyOfDistinctParameters; > > We have a data structure for this called HashCountedSet. Using that would simplify the code below. Great idea, on to it. >> Source/WTF/wtf/URL.cpp:1205 >> + auto frequencyOfsingleQueryParameter = frequencyOfDistinctParameters.take(singleQueryParameter); > > Lowercase "s" here. Sorry for the mistake. Fixing it. >> Source/WTF/wtf/URL.cpp:1222 >> + return StringView(m_string).substring(0, thisLengthOfURLIgnoringQueryAndFragments) == StringView(otherURL.m_string).substring(0, otherLengthOfURLIgnoringQueryAndFragments); > > Seems like we should use a helper function rather than repeating this logic twice. Could be private or local to this file. We can write a static function in this file. >> Source/WTF/wtf/URL.cpp:1231 >> + return queryDifference.isEmpty(); > > No need for a local variable here. > > This seems like an inefficient algorithm; if we were using this somewhere performance critical we’d want to do this in a way that doesn’t allocate so much memory. I agree. we can directly use: return differingQueryParameters(otherURL).isEmpty(); Even discard the function. >> Source/WTF/wtf/URL.cpp:1242 >> + if (keysToRemove.find(singleQueryParameter.key) == keysToRemove.end()) { > > The best idiom here is to call contains, rather than find == end. On to it. >> Source/WTF/wtf/URL.cpp:1244 >> + queryWithoutRemovalKeys = makeString(queryWithoutRemovalKeys, '&', singleQueryParameter.key, '=', singleQueryParameter.value); > > This is not an efficient way to build up a string. Concatenating each time means lots of memory allocation. We have a class named StringBuilder that implements this kind of idiom efficiently. Using StringBuilder now. >> Source/WTF/wtf/URL.cpp:1250 >> + setQuery(StringView(queryWithoutRemovalKeys)); > > There should not be a need to call StringView() here explicitly. setQuery(StringView) takes StringView as argument. Will it type cast to StringView automatically if we provide String? >> Source/WTF/wtf/URL.h:207 >> + WTF_EXPORT_PRIVATE void removeQueryParameters(const HashSet<String>&); > > How did you decide that these should be members of the URL class rather than functions that take a URL? In particular, the ones that take two URLs seem like they should be functions that take two URLs. I am sorry I am a little bit unclear here. It would be very helpful if I could be a bit clear.
Darin Adler
Comment 17 2021-07-30 17:06:04 PDT
Comment on attachment 434669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434669&action=review >>> Source/WTF/wtf/URL.cpp:1176 >>> + auto otherQueryParameters = URLParser::parseURLEncodedForm(otherURL.query()); >> >> It seems to me would could implement this operation significantly more efficiently by sorting both vectors and walking through them, rather than using a hash table. > > Wouldn't sorting be costly in terms of time? We preferred to sacrifice some memory. Although the length of the URL and the number of the parameter is not that huge, still we can discuss about any kind of optimization. No, I don’t think so. >>> Source/WTF/wtf/URL.cpp:1250 >>> + setQuery(StringView(queryWithoutRemovalKeys)); >> >> There should not be a need to call StringView() here explicitly. > > setQuery(StringView) takes StringView as argument. Will it type cast to StringView automatically if we provide String? Yes. >>> Source/WTF/wtf/URL.h:207 >>> + WTF_EXPORT_PRIVATE void removeQueryParameters(const HashSet<String>&); >> >> How did you decide that these should be members of the URL class rather than functions that take a URL? In particular, the ones that take two URLs seem like they should be functions that take two URLs. > > I am sorry I am a little bit unclear here. It would be very helpful if I could be a bit clear. Example:   bool areEqualIgnoringQueryAndFragments(const URL&, const URL&);
John Wilander
Comment 18 2021-07-30 17:06:56 PDT
Comment on attachment 434669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434669&action=review Good suggestions from Darin. >> Source/WTF/ChangeLog:8 >> + Added more functionalities for parsing URL query parameters. > > I don’t fully understand why adding more functions that are not yet used is important for WebKIt, but I trust there is some reason. I'd say "Added parsing of URL query strings." > Tools/TestWebKitAPI/Tests/WTF/URL.cpp:522 > + EXPECT_EQ(url9.string(), url12.string()); You need to have a section of negative tests, for instance named URLMalformedQueryStrings. There your test things like: · http://www.webkit.org/?? · http://www.webkit.org/?/?test=test · http://www.webkit.org/?=test · http://www.webkit.org/?== · http://www.webkit.org/?=? · http://www.webkit.org/=? · http://www.webkit.org?? · http://www.webkit.org?/?test=test · http://www.webkit.org?=test · http://www.webkit.org?== · http://www.webkit.org?=? · http://www.webkit.org=?
Darin Adler
Comment 19 2021-07-30 17:10:42 PDT
Comment on attachment 434669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434669&action=review >>>> Source/WTF/wtf/URL.cpp:1176 >>>> + auto otherQueryParameters = URLParser::parseURLEncodedForm(otherURL.query()); >>> >>> It seems to me would could implement this operation significantly more efficiently by sorting both vectors and walking through them, rather than using a hash table. >> >> Wouldn't sorting be costly in terms of time? We preferred to sacrifice some memory. Although the length of the URL and the number of the parameter is not that huge, still we can discuss about any kind of optimization. > > No, I don’t think so. We would be sorting a vector of pairs of pointers. It should not be more costly than hash tables. Both should be O(n log n). Including the speed of the memory allocation, I expect the sorting algorithm would be both faster and use less memory.
Risul Islam
Comment 20 2021-07-30 17:43:15 PDT
Comment on attachment 434669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434669&action=review >>>> Source/WTF/ChangeLog:8 >>>> + Added more functionalities for parsing URL query parameters. >>> >>> I don’t fully understand why adding more functions that are not yet used is important for WebKIt, but I trust there is some reason. >> >> Thank you Darin for your valuable comments and insights. We are trying to come up with a feature that restricts tracking parameters from the URL while loading or sharing by copying. Thats why we are adding the functionalities in this patch. > > I'd say "Added parsing of URL query strings." Great. On it. >> Tools/TestWebKitAPI/Tests/WTF/URL.cpp:522 >> + EXPECT_EQ(url9.string(), url12.string()); > > You need to have a section of negative tests, for instance named URLMalformedQueryStrings. There your test things like: > · http://www.webkit.org/?? > · http://www.webkit.org/?/?test=test > · http://www.webkit.org/?=test > · http://www.webkit.org/?== > · http://www.webkit.org/?=? > · http://www.webkit.org/=? > · http://www.webkit.org?? > · http://www.webkit.org?/?test=test > · http://www.webkit.org?=test > · http://www.webkit.org?== > · http://www.webkit.org?=? > · http://www.webkit.org=? We assumed that the first '?' is the start of the query string. Then the query parameters are separated by '&' and finally parameter key and value is separated by '=' . Since there is no standard we got this assumption from https://www.freeformatter.com/url-parser-query-string-splitter.html There are some test cases added for the above kinds of URLs. Do we want more tests?
Risul Islam
Comment 21 2021-07-30 17:46:12 PDT
Comment on attachment 434669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434669&action=review >>>>> Source/WTF/wtf/URL.cpp:1176 >>>>> + auto otherQueryParameters = URLParser::parseURLEncodedForm(otherURL.query()); >>>> >>>> It seems to me would could implement this operation significantly more efficiently by sorting both vectors and walking through them, rather than using a hash table. >>> >>> Wouldn't sorting be costly in terms of time? We preferred to sacrifice some memory. Although the length of the URL and the number of the parameter is not that huge, still we can discuss about any kind of optimization. >> >> No, I don’t think so. > > We would be sorting a vector of pairs of pointers. It should not be more costly than hash tables. Both should be O(n log n). Including the speed of the memory allocation, I expect the sorting algorithm would be both faster and use less memory. Clear now. Thanks Darin. >>>> Source/WTF/wtf/URL.h:207 >>>> + WTF_EXPORT_PRIVATE void removeQueryParameters(const HashSet<String>&); >>> >>> How did you decide that these should be members of the URL class rather than functions that take a URL? In particular, the ones that take two URLs seem like they should be functions that take two URLs. >> >> I am sorry I am a little bit unclear here. It would be very helpful if I could be a bit clear. > > Example: > >   bool areEqualIgnoringQueryAndFragments(const URL&, const URL&); Got it. Thank you for the clarification.
John Wilander
Comment 22 2021-07-30 18:58:39 PDT
(In reply to Risul Islam from comment #20) > Comment on attachment 434669 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=434669&action=review > > >>>> Source/WTF/ChangeLog:8 > >>>> + Added more functionalities for parsing URL query parameters. > >>> > >>> I don’t fully understand why adding more functions that are not yet used is important for WebKIt, but I trust there is some reason. > >> > >> Thank you Darin for your valuable comments and insights. We are trying to come up with a feature that restricts tracking parameters from the URL while loading or sharing by copying. Thats why we are adding the functionalities in this patch. > > > > I'd say "Added parsing of URL query strings." > > Great. On it. > > >> Tools/TestWebKitAPI/Tests/WTF/URL.cpp:522 > >> + EXPECT_EQ(url9.string(), url12.string()); > > > > You need to have a section of negative tests, for instance named URLMalformedQueryStrings. There your test things like: > > · http://www.webkit.org/?? > > · http://www.webkit.org/?/?test=test > > · http://www.webkit.org/?=test > > · http://www.webkit.org/?== > > · http://www.webkit.org/?=? > > · http://www.webkit.org/=? > > · http://www.webkit.org?? > > · http://www.webkit.org?/?test=test > > · http://www.webkit.org?=test > > · http://www.webkit.org?== > > · http://www.webkit.org?=? > > · http://www.webkit.org=? > > We assumed that the first '?' is the start of the query string. Then the > query parameters are separated by '&' and finally parameter key and value is > separated by '=' . Since there is no standard we got this assumption from > https://www.freeformatter.com/url-parser-query-string-splitter.html > > There are some test cases added for the above kinds of URLs. Do we want more > tests? Yes. Your tests are not just there to confirm intended functionality but also to defend against future mistakes. You want to explore a bunch of corner cases and make sure your code handles them gracefully. Then two years later, someone may change your code and make a simple mistake. That’s when your fleshed out test suite will pay off.
Risul Islam
Comment 23 2021-08-01 21:44:49 PDT
Darin Adler
Comment 24 2021-08-02 11:06:04 PDT
Comment on attachment 434736 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434736&action=review I appreciate the work to do a sorting-based algorithm instead of a hash table. > Source/WTF/wtf/HashFunctions.h:182 > + template<typename T, typename U> struct KeyValuePairHash { > + static unsigned hash(const KeyValuePair<T, U>& p) > + { > + return pairIntHash(DefaultHash<T>::hash(p.key), DefaultHash<U>::hash(p.value)); > + } > + static bool equal(const KeyValuePair<T, U>& a, const KeyValuePair<T, U>& b) > + { > + return DefaultHash<T>::equal(a.key, b.key) && DefaultHash<U>::equal(a.value, b.value); > + } > + static constexpr bool safeToCompareToEmptyOrDeleted = DefaultHash<T>::safeToCompareToEmptyOrDeleted && DefaultHash<U>::safeToCompareToEmptyOrDeleted; > + }; We aren’t using a KeyValuePair as a hash table key. Can we either not do this at all at this time, or do it in a separate patch? > Source/WTF/wtf/HashFunctions.h:282 > + template<typename T, typename U> struct DefaultHash<KeyValuePair<T, U>> : KeyValuePairHash<T, U> { }; We aren’t using a KeyValuePair as a hash table key. Can we either not do this at all at this time, or do it in a separate patch? > Source/WTF/wtf/HashTraits.h:431 > +using WTF::KeyValuePairHashTraits; We aren’t using a KeyValuePair as a hash table key. Can we either not do this at all at this time, or do it in a separate patch? > Source/WTF/wtf/KeyValuePair.h:67 > + template <typename K, typename V> > + bool operator==(const KeyValuePair<K, V>& other) const > + { > + return key == other.key && value == other.value; > + } I’d like to see this be done as a free function, after the class definition: template<typename KeyType, typename ValueType> constexpr bool operator=(const KeyValuePair<KeyType, ValueType>& a, const KeyValuePair<KeyType, ValueType>& b) { return a.key == b.key && a.value == b.value; } But also, this change is no longer needed in this patch. We aren’t using a KeyValuePair as a hash table key. Can we either not do this at all at this time, or do it in a separate patch? > Source/WTF/wtf/URL.cpp:1189 > + StringBuilder firstKeyValuePairAppended; > + StringBuilder secondKeyValuePairAppended; > + firstKeyValuePairAppended.append(firstKeyValuePair.key, firstKeyValuePair.value); > + secondKeyValuePairAppended.append(secondKeyValuePair.key, secondKeyValuePair.value); > + return firstKeyValuePairAppended.toString().utf8() < secondKeyValuePairAppended.toString().utf8(); 1) There is no reason to repeat this lambda twice! We can put it in a local variable and use it twice. Also, we repeat the comparison below, and should reuse for that too. 2) Concatenating strings using StringBuilder, then converting to UTF-8 is a super-expensive way to compare and makes this unnecessarily inefficient. Seems unlikely the sorting order matters at all: Code should be more like this: auto compare = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { if (int result = codePointCompare(a.key, b.key)) return result; return codePointCompare(a.value, b.value); }; auto comparesLessThan = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { return compare(a, b) < 0; }; std::sort(firstQueryParameters.begin(), firstQueryParameters.end(), comparesLessThan); std::sort(secondQueryParameters.begin(), secondQueryParameters.end(), comparesLessThan); > Source/WTF/wtf/URL.cpp:1219 > + if (tempParameterFromFirst.toString().utf8() < tempParameterFromSecond.toString().utf8()) { > + differingQueryParameters.append(firstQueryParameters[indexInFirstQueryParameter]); > + indexInFirstQueryParameter++; > + } else if (tempParameterFromSecond.toString().utf8() < tempParameterFromFirst.toString().utf8()) { > + differingQueryParameters.append(secondQueryParameters[indexInSecondQueryParameter]); > + indexInSecondQueryParameter++; > + } else { > + indexInFirstQueryParameter++; > + indexInSecondQueryParameter++; > + } Same issue here, should not be building temporary strings and converting to UTF-8 just compare. Also, since it’s important to use the same comparison function so we do it consistently, this makes the refactoring above more valuable. int comparison = compare(firstQueryParameters[indexInFirstQueryParameter], secondQueryParameters[indexInSecondQueryParameter]); if (comparison < 0) { ... } else if (comparison > 0) { ... } else { ... } > Source/WTF/wtf/URL.cpp:1240 > + return url.string()[url.pathEnd() - 1] == '/' ? url.pathEnd() - 1 : url.pathEnd(); I don’t understand why this "/" removal is needed. Which test case falls if we remove this logic? > Source/WTF/wtf/URL.cpp:1251 > +bool isEqualIgnoringQueryAndFragments(const URL& firstURL, const URL& secondURL) > +{ > + if (!firstURL.isValid() || !secondURL.isValid()) > + return false; > + > + auto lengthOfFirstURLIgnoringQueryAndFragments = lengthOfURLIgnoringQueryAndFragments(firstURL); > + auto secondLengthOfSecondURLIgnoringQueryAndFragments = lengthOfURLIgnoringQueryAndFragments(secondURL); > + return StringView(firstURL.string()).substring(0, lengthOfFirstURLIgnoringQueryAndFragments) == StringView(secondURL.string()).substring(0, secondLengthOfSecondURLIgnoringQueryAndFragments); > +} Seems a little peculiar that two invalid URLs always compare as unequal even if they are identical. Is that really the behavior we want? From a factoring point of view, one more function and shorter argument names can make this code much clearer and easier to read: static StringView substringIgnoringQueryAndFragments(const URL& url) { return StringView { url.string() }.left(lengthOfURLIgnoringQueryAndFragments(url)); } bool isEqualIgnoringQueryAndFragments(const URL& a, const URL& b) { return a.isValid() && b.isValid() && substringIgnoringQueryAndFragments(a) == substringIgnoringQueryAndFragments(b); }
Darin Adler
Comment 25 2021-08-02 11:15:44 PDT
Comment on attachment 434736 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434736&action=review >> Source/WTF/wtf/URL.cpp:1189 >> + return firstKeyValuePairAppended.toString().utf8() < secondKeyValuePairAppended.toString().utf8(); > > 1) There is no reason to repeat this lambda twice! We can put it in a local variable and use it twice. Also, we repeat the comparison below, and should reuse for that too. > > 2) Concatenating strings using StringBuilder, then converting to UTF-8 is a super-expensive way to compare and makes this unnecessarily inefficient. Seems unlikely the sorting order matters at all: > > Code should be more like this: > > auto compare = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { > if (int result = codePointCompare(a.key, b.key)) > return result; > return codePointCompare(a.value, b.value); > }; > auto comparesLessThan = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { > return compare(a, b) < 0; > }; > > std::sort(firstQueryParameters.begin(), firstQueryParameters.end(), comparesLessThan); > std::sort(secondQueryParameters.begin(), secondQueryParameters.end(), comparesLessThan); By the way, the capture here is wrong. This would be better: auto compare = [] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { if (int result = codePointCompare(a.key, b.key)) return result; return codePointCompare(a.value, b.value); }; auto comparesLessThan = [] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { return compare(a, b) < 0; };
Risul Islam
Comment 26 2021-08-02 18:23:04 PDT
Comment on attachment 434736 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434736&action=review >> Source/WTF/wtf/KeyValuePair.h:67 >> + } > > I’d like to see this be done as a free function, after the class definition: > > template<typename KeyType, typename ValueType> constexpr bool operator=(const KeyValuePair<KeyType, ValueType>& a, const KeyValuePair<KeyType, ValueType>& b) > { > return a.key == b.key && a.value == b.value; > } > > But also, this change is no longer needed in this patch. We aren’t using a KeyValuePair as a hash table key. Can we either not do this at all at this time, or do it in a separate patch? Thank you Darin for your valuable comments. All of your suggestions are great learning for me. I have removed the hash related codes. But removing this == operator overloading causes error in our test functions, specifically in EXPECT_EQ(differingQueryParameters(url1, url2), resultVector); Any workaround suggestion? >>> Source/WTF/wtf/URL.cpp:1189 >>> + return firstKeyValuePairAppended.toString().utf8() < secondKeyValuePairAppended.toString().utf8(); >> >> 1) There is no reason to repeat this lambda twice! We can put it in a local variable and use it twice. Also, we repeat the comparison below, and should reuse for that too. >> >> 2) Concatenating strings using StringBuilder, then converting to UTF-8 is a super-expensive way to compare and makes this unnecessarily inefficient. Seems unlikely the sorting order matters at all: >> >> Code should be more like this: >> >> auto compare = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { >> if (int result = codePointCompare(a.key, b.key)) >> return result; >> return codePointCompare(a.value, b.value); >> }; >> auto comparesLessThan = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { >> return compare(a, b) < 0; >> }; >> >> std::sort(firstQueryParameters.begin(), firstQueryParameters.end(), comparesLessThan); >> std::sort(secondQueryParameters.begin(), secondQueryParameters.end(), comparesLessThan); > > By the way, the capture here is wrong. This would be better: > > auto compare = [] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) > { > if (int result = codePointCompare(a.key, b.key)) > return result; > return codePointCompare(a.value, b.value); > }; > auto comparesLessThan = [] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) > { > return compare(a, b) < 0; > }; Wow, elegant code. >> Source/WTF/wtf/URL.cpp:1219 >> + } > > Same issue here, should not be building temporary strings and converting to UTF-8 just compare. Also, since it’s important to use the same comparison function so we do it consistently, this makes the refactoring above more valuable. > > int comparison = compare(firstQueryParameters[indexInFirstQueryParameter], secondQueryParameters[indexInSecondQueryParameter]); > if (comparison < 0) { > ... > } else if (comparison > 0) { > ... > } else { > ... > } Great idea. Code reuse at its best. >> Source/WTF/wtf/URL.cpp:1240 >> + return url.string()[url.pathEnd() - 1] == '/' ? url.pathEnd() - 1 : url.pathEnd(); > > I don’t understand why this "/" removal is needed. Which test case falls if we remove this logic? Thank you and Alex for this suggestion. Canonicalization removes the requirement of this log. Discarding it. >> Source/WTF/wtf/URL.cpp:1251 >> +} > > Seems a little peculiar that two invalid URLs always compare as unequal even if they are identical. Is that really the behavior we want? > > From a factoring point of view, one more function and shorter argument names can make this code much clearer and easier to read: > > static StringView substringIgnoringQueryAndFragments(const URL& url) > { > return StringView { url.string() }.left(lengthOfURLIgnoringQueryAndFragments(url)); > } > > bool isEqualIgnoringQueryAndFragments(const URL& a, const URL& b) > { > return a.isValid() && b.isValid() && substringIgnoringQueryAndFragments(a) == substringIgnoringQueryAndFragments(b); > } Done.
Risul Islam
Comment 27 2021-08-02 18:29:45 PDT
Comment on attachment 434736 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434736&action=review >>>> Source/WTF/wtf/URL.cpp:1189 >>>> + return firstKeyValuePairAppended.toString().utf8() < secondKeyValuePairAppended.toString().utf8(); >>> >>> 1) There is no reason to repeat this lambda twice! We can put it in a local variable and use it twice. Also, we repeat the comparison below, and should reuse for that too. >>> >>> 2) Concatenating strings using StringBuilder, then converting to UTF-8 is a super-expensive way to compare and makes this unnecessarily inefficient. Seems unlikely the sorting order matters at all: >>> >>> Code should be more like this: >>> >>> auto compare = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { >>> if (int result = codePointCompare(a.key, b.key)) >>> return result; >>> return codePointCompare(a.value, b.value); >>> }; >>> auto comparesLessThan = [&] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { >>> return compare(a, b) < 0; >>> }; >>> >>> std::sort(firstQueryParameters.begin(), firstQueryParameters.end(), comparesLessThan); >>> std::sort(secondQueryParameters.begin(), secondQueryParameters.end(), comparesLessThan); >> >> By the way, the capture here is wrong. This would be better: >> >> auto compare = [] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) >> { >> if (int result = codePointCompare(a.key, b.key)) >> return result; >> return codePointCompare(a.value, b.value); >> }; >> auto comparesLessThan = [] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) >> { >> return compare(a, b) < 0; >> }; > > Wow, elegant code. One issue found here. Had to write comparesLessThan like this: auto comparesLessThan = [&compare] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { return compare(a, b) < 0; }; Otherwise, it does not recognize 'compare'.
Risul Islam
Comment 28 2021-08-03 11:11:02 PDT
Comment on attachment 434736 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434736&action=review >>> Source/WTF/wtf/KeyValuePair.h:67 >>> + } >> >> I’d like to see this be done as a free function, after the class definition: >> >> template<typename KeyType, typename ValueType> constexpr bool operator=(const KeyValuePair<KeyType, ValueType>& a, const KeyValuePair<KeyType, ValueType>& b) >> { >> return a.key == b.key && a.value == b.value; >> } >> >> But also, this change is no longer needed in this patch. We aren’t using a KeyValuePair as a hash table key. Can we either not do this at all at this time, or do it in a separate patch? > > Thank you Darin for your valuable comments. All of your suggestions are great learning for me. > I have removed the hash related codes. But removing this == operator overloading causes error in our test functions, specifically in EXPECT_EQ(differingQueryParameters(url1, url2), resultVector); > Any workaround suggestion? Making the == operator a free function was a great idea. We do need this function for the purpose of testability of differingQueryParameters(url, url) function. I am submitting the new patch with it. If we find any workaround, we may discard the == operator overloaded function quickly. >>> Source/WTF/wtf/URL.cpp:1240 >>> + return url.string()[url.pathEnd() - 1] == '/' ? url.pathEnd() - 1 : url.pathEnd(); >> >> I don’t understand why this "/" removal is needed. Which test case falls if we remove this logic? > > Thank you and Alex for this suggestion. Canonicalization removes the requirement of this log. Discarding it. Thank you and Alex for this suggestion. Canonicalization removes the requirement of this *logic. Discarded it.
Risul Islam
Comment 29 2021-08-03 12:05:46 PDT
Darin Adler
Comment 30 2021-08-03 13:08:29 PDT
Comment on attachment 434848 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434848&action=review This seems good, but there is still room for improvement. > Source/WTF/ChangeLog:3 > + Add functionalities for parsing URL query string I think we can call these "functions"; don’t need to say "functionalities". > Source/WTF/wtf/URL.cpp:1173 > + if (!firstURL.isValid() || !secondURL.isValid()) > + return { }; Not 100% sure this is the best behavior if only one of the URLs is invalid. Depends on how we intend to use these functions. Tests should cover this behavior. > Source/WTF/wtf/URL.cpp:1190 > + auto comparesLessThan = [&compare] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { I suspect just [compare] would be fine, don’t necessarily need [&compare]. > Source/WTF/wtf/URL.cpp:1236 > +static StringView substringIgnoringQueryAndFragments(const URL& url) > +{ > + return StringView(url.string()).left(url.pathEnd()); > +} > + > +bool isEqualIgnoringQueryAndFragments(const URL& a, const URL& b) > +{ > + return a.isValid() && b.isValid() && substringIgnoringQueryAndFragments(a) == substringIgnoringQueryAndFragments(b); > +} This comment of mine may have been lost in the shuffle, I did not see a response: Why do we want to always return false if one of the two URLs is invalid? That means that an invalid URL would compare as not equal to itself. Is that the best behavior for this function? Maybe if one of the URLs is invalid we should simply compare the two strings, not ignoring anything? If so, we could implement that behavior by altering substringIgnoringQueryAndFragments to return the entire string when the URL is invalid. It’s hard to judge something like this when we are just adding a function without adding any uses of it. And I don’t see test cases that cover this behavior; we need to test such edge cases. > Source/WTF/wtf/URL.cpp:1243 > + auto queryParametersList = URLParser::parseURLEncodedForm(url.query()); This local variable is not needed. Could put this expression inside the for loop. for (auto& parameter : URLParser::parseURLEncodedForm(url.query())) { ... } > Source/WTF/wtf/URL.cpp:1245 > + for (auto& singleQueryParameter : queryParametersList) { Given the context, I think this could be called parameter instead of having a 3-word name. > Source/WTF/wtf/URL.cpp:1250 > + if (!queryWithoutRemovalKeys.isEmpty()) > + queryWithoutRemovalKeys.append('&', singleQueryParameter.key, '=', singleQueryParameter.value); > + else > + queryWithoutRemovalKeys.append(singleQueryParameter.key, '=', singleQueryParameter.value); Another way to write this: queryWithoutRemovalKeys.append(queryWithoutRemovalKeys.isEmpty() ? "" : "&", singleQueryParameter.key, '=', singleQueryParameter.value); Or: auto separator = queryWithoutRemovalKeys.isEmpty() ? "" : "&"; queryWithoutRemovalKeys.append(separator singleQueryParameter.key, '=', singleQueryParameter.value); Not sure that either of my ways are better, but they are less repetitive. > Source/WTF/wtf/URL.cpp:1254 > + url.setQuery(queryWithoutRemovalKeys); I am surprised this compiles. I would have thought we’d have to call one of the toString functions on the StringBuilder. > Tools/TestWebKitAPI/Tests/WTF/URL.cpp:362 > +TEST_F(WTF_URL, URLDifferingQueryParameters) I don’t think we have enough test cases. For example, all the keys are sorted in our tests; we aren’t testing the cases where the order isn’t right. So if our sort algorithm was broken we’d never know.
Darin Adler
Comment 31 2021-08-03 13:11:49 PDT
Comment on attachment 434848 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434848&action=review >> Source/WTF/wtf/URL.cpp:1254 >> + url.setQuery(queryWithoutRemovalKeys); > > I am surprised this compiles. I would have thought we’d have to call one of the toString functions on the StringBuilder. I guess setQuery takes a StringView and StringBuilder can produce one of those without a function call, so: great, looks good.
Alex Christensen
Comment 32 2021-08-03 13:17:24 PDT
Comment on attachment 434848 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434848&action=review >> Source/WTF/wtf/URL.cpp:1173 >> + return { }; > > Not 100% sure this is the best behavior if only one of the URLs is invalid. Depends on how we intend to use these functions. > > Tests should cover this behavior. I think invalid URLs' queries are empty StringViews, so this is just a performance optimization which is probably unnecessary. >> Source/WTF/wtf/URL.cpp:1190 >> + auto comparesLessThan = [&compare] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { > > I suspect just [compare] would be fine, don’t necessarily need [&compare]. or [&]
Risul Islam
Comment 33 2021-08-03 13:48:51 PDT
Comment on attachment 434848 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434848&action=review >> Source/WTF/ChangeLog:3 >> + Add functionalities for parsing URL query string > > I think we can call these "functions"; don’t need to say "functionalities". On it. >> Source/WTF/wtf/URL.cpp:1173 >> + return { }; > > Not 100% sure this is the best behavior if only one of the URLs is invalid. Depends on how we intend to use these functions. > > Tests should cover this behavior. Earlier we returned optional null if any of the url is null. Following recommendation, we changed that to returning empty vector. URL url1 = createURL("www.webkit.org/?"); URL url2 = createURL("http://www.webkit.org/?key1=val1"); Vector<KeyValuePair<String, String>> testVector { }; EXPECT_EQ(differingQueryParameters(url1, url2), testVector); This test is covering that behavior. We can add more. >> Source/WTF/wtf/URL.cpp:1190 >> + auto comparesLessThan = [&compare] (const KeyValuePair<String, String>& a, const KeyValuePair<String, String>& b) { > > I suspect just [compare] would be fine, don’t necessarily need [&compare]. on it. >> Source/WTF/wtf/URL.cpp:1236 >> +} > > This comment of mine may have been lost in the shuffle, I did not see a response: > > Why do we want to always return false if one of the two URLs is invalid? > > That means that an invalid URL would compare as not equal to itself. Is that the best behavior for this function? Maybe if one of the URLs is invalid we should simply compare the two strings, not ignoring anything? If so, we could implement that behavior by altering substringIgnoringQueryAndFragments to return the entire string when the URL is invalid. > > It’s hard to judge something like this when we are just adding a function without adding any uses of it. > > And I don’t see test cases that cover this behavior; we need to test such edge cases. good idea, on it. Sorry, the comment got skipped in shuffling. >> Source/WTF/wtf/URL.cpp:1243 >> + auto queryParametersList = URLParser::parseURLEncodedForm(url.query()); > > This local variable is not needed. Could put this expression inside the for loop. > > for (auto& parameter : URLParser::parseURLEncodedForm(url.query())) { > ... > } on it. >> Source/WTF/wtf/URL.cpp:1245 >> + for (auto& singleQueryParameter : queryParametersList) { > > Given the context, I think this could be called parameter instead of having a 3-word name. sure. on it. >> Source/WTF/wtf/URL.cpp:1250 >> + queryWithoutRemovalKeys.append(singleQueryParameter.key, '=', singleQueryParameter.value); > > Another way to write this: > > queryWithoutRemovalKeys.append(queryWithoutRemovalKeys.isEmpty() ? "" : "&", singleQueryParameter.key, '=', singleQueryParameter.value); > > Or: > > auto separator = queryWithoutRemovalKeys.isEmpty() ? "" : "&"; > queryWithoutRemovalKeys.append(separator singleQueryParameter.key, '=', singleQueryParameter.value); > > Not sure that either of my ways are better, but they are less repetitive. on it. >> Source/WTF/wtf/URL.cpp:1254 >> + url.setQuery(queryWithoutRemovalKeys); > > I am surprised this compiles. I would have thought we’d have to call one of the toString functions on the StringBuilder. Surprisingly, it complies (checked). >> Tools/TestWebKitAPI/Tests/WTF/URL.cpp:362 >> +TEST_F(WTF_URL, URLDifferingQueryParameters) > > I don’t think we have enough test cases. For example, all the keys are sorted in our tests; we aren’t testing the cases where the order isn’t right. So if our sort algorithm was broken we’d never know. great idea, on it.
Risul Islam
Comment 34 2021-08-03 14:03:48 PDT
Comment on attachment 434848 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434848&action=review >>>> Source/WTF/wtf/URL.cpp:1173 >>>> + return { }; >>> >>> Not 100% sure this is the best behavior if only one of the URLs is invalid. Depends on how we intend to use these functions. >>> >>> Tests should cover this behavior. >> >> I think invalid URLs' queries are empty StringViews, so this is just a performance optimization which is probably unnecessary. > > Earlier we returned optional null if any of the url is null. Following recommendation, we changed that to returning empty vector. > URL url1 = createURL("www.webkit.org/?"); > URL url2 = createURL("http://www.webkit.org/?key1=val1"); > Vector<KeyValuePair<String, String>> testVector { }; > EXPECT_EQ(differingQueryParameters(url1, url2), testVector); > > This test is covering that behavior. We can add more. Alex: if both urls are invalid, then the code will automatically return empty vector. Thats fine. But if one is invalid and other is not, then discarding the initial isValid() condition will cause returning the parameters of valid url. But we agreed to return empty vector if one of the url is invalid. Do you suggest that we should return the valid url's parameter if another is invalid?
Risul Islam
Comment 35 2021-08-03 14:27:13 PDT
Comment on attachment 434848 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434848&action=review >>>>> Source/WTF/wtf/URL.cpp:1173 >>>>> + return { }; >>>> >>>> Not 100% sure this is the best behavior if only one of the URLs is invalid. Depends on how we intend to use these functions. >>>> >>>> Tests should cover this behavior. >>> >>> I think invalid URLs' queries are empty StringViews, so this is just a performance optimization which is probably unnecessary. >> >> Earlier we returned optional null if any of the url is null. Following recommendation, we changed that to returning empty vector. >> URL url1 = createURL("www.webkit.org/?"); >> URL url2 = createURL("http://www.webkit.org/?key1=val1"); >> Vector<KeyValuePair<String, String>> testVector { }; >> EXPECT_EQ(differingQueryParameters(url1, url2), testVector); >> >> This test is covering that behavior. We can add more. > > Alex: if both urls are invalid, then the code will automatically return empty vector. Thats fine. But if one is invalid and other is not, then discarding the initial isValid() condition will cause returning the parameters of valid url. But we agreed to return empty vector if one of the url is invalid. Do you suggest that we should return the valid url's parameter if another is invalid? I think we can discard the validity check in this function and get the query parameter difference results ignoring the validity. This will give us the freedom to use the returned results as per our need when called. When calling this function, we can check the urls validity and take our decision based on our use case.
Risul Islam
Comment 36 2021-08-03 14:53:10 PDT
Risul Islam
Comment 37 2021-08-03 15:23:21 PDT
EWS
Comment 38 2021-08-03 18:37:31 PDT
Committed r280626 (240239@main): <https://commits.webkit.org/240239@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 434864 [details].
Alex Christensen
Comment 39 2021-08-03 20:13:19 PDT
Comment on attachment 434864 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434864&action=review Some post-commit feedback: > Source/WTF/wtf/URL.cpp:1225 > +static StringView substringIgnoringQueryAndFragments(const URL& url) Could you move this to be a lambda inside isEqualIgnoringQueryAndFragments? It's only used in that scope. > Source/WTF/wtf/URL.h:248 > +WTF_EXPORT_PRIVATE bool isEqualIgnoringQueryAndFragments(const URL&, const URL&); I think this should either be renamed to areEqualIgnoringQueriesAndFragments or it should be made a member of the URL class. bool equalsIgnoringQueryAndFragment(const URL&) const; > Source/WTF/wtf/URL.h:249 > +WTF_EXPORT_PRIVATE void removeQueryParameters(URL&, const HashSet<String>&); Could you add another important test case to verify that we don't regress behavior in the future? Make a URL from "https://example.com/?%C3%A4=value" See how removing "%C3%A4" changes the URL. See how removing "ä" changes the URL. I think rather than passing a non-const URL as a parameter, this should be a member of the URL class. void removeQueryParametersWithKeys(const HashSet<String>& keys);
John Wilander
Comment 40 2021-08-23 15:19:15 PDT
(In reply to Alex Christensen from comment #39) > Comment on attachment 434864 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=434864&action=review > > Some post-commit feedback: > > > Source/WTF/wtf/URL.cpp:1225 > > +static StringView substringIgnoringQueryAndFragments(const URL& url) > > Could you move this to be a lambda inside isEqualIgnoringQueryAndFragments? > It's only used in that scope. > > > Source/WTF/wtf/URL.h:248 > > +WTF_EXPORT_PRIVATE bool isEqualIgnoringQueryAndFragments(const URL&, const URL&); > > I think this should either be renamed to areEqualIgnoringQueriesAndFragments > or it should be made a member of the URL class. > bool equalsIgnoringQueryAndFragment(const URL&) const; > > > Source/WTF/wtf/URL.h:249 > > +WTF_EXPORT_PRIVATE void removeQueryParameters(URL&, const HashSet<String>&); > > Could you add another important test case to verify that we don't regress > behavior in the future? > Make a URL from "https://example.com/?%C3%A4=value" > See how removing "%C3%A4" changes the URL. > See how removing "ä" changes the URL. > > I think rather than passing a non-const URL as a parameter, this should be a > member of the URL class. > void removeQueryParametersWithKeys(const HashSet<String>& keys); Could you file follow-up Bugzillas, please? I can do it too if you're short on time. I just don't want us to lose these valuable enhancements.
John Wilander
Comment 41 2021-08-23 15:19:56 PDT
(In reply to John Wilander from comment #40) > Could you file follow-up Bugzillas, please? I can do it too if you're short > on time. I just don't want us to lose these valuable enhancements. You can assign them to me.
Anne van Kesteren
Comment 42 2021-09-22 01:29:51 PDT
FWIW: removeQueryParameters looks buggy as instead of using a serializer for the resulting query you append the raw keys and values to a string. Meaning that %3D would become = and %26 becomes &, etc. unless I'm missing something. You probably want to use the internal equivalent of URLSearchParams instead.
John Wilander
Comment 43 2021-09-22 08:16:42 PDT
(In reply to Anne van Kesteren from comment #42) > FWIW: removeQueryParameters looks buggy as instead of using a serializer for > the resulting query you append the raw keys and values to a string. Meaning > that %3D would become = and %26 becomes &, etc. unless I'm missing > something. You probably want to use the internal equivalent of > URLSearchParams instead. Thank you, Anne! This needs a separate bug so I filed it here: https://bugs.webkit.org/show_bug.cgi?id=230628
Note You need to log in before you can comment on or make changes to this bug.