RESOLVED LATER 92872
A smart util class that grabs latin1 string from a WTF::String
https://bugs.webkit.org/show_bug.cgi?id=92872
Summary A smart util class that grabs latin1 string from a WTF::String
Yong Li
Reported 2012-08-01 07:30:16 PDT
In many cases, we know a WTF::String is 99% a 8bit string internally, but to avoid crash in the 1%, we have to do this for best performance: str.is8Bit() ? str.characters8() : static_cast<LChar>(str.latin1.data()) Not convenient at all. And most developer will just use str.latin1.data() when they need to get a const char*. So how about adding a util class that either has a valid CString object or uses String's 8bit data? class Latin1StringHolder { public: explicit Latin1StringHolder (String string) { if (string.is8Bit()) m_string = string; else m_latin1 = string.latin1(); } const LChar* data() const { return m_string.isNull() ? static_cast<LChar>(m_latin1.data()) : m_string.characters8; } size_t length() const { return m_string.isNull() ? m_latin1.length() : m_string.lenght(); } private: CString m_latin1; String m_string; };
Attachments
Alexey Proskuryakov
Comment 1 2012-08-01 10:56:09 PDT
Can CString be changed to use String's internal 8-bit buffer when possible without copying? Introducing a new class seems a little heavy-handed.
Michael Saboff
Comment 2 2012-08-01 11:06:39 PDT
It seems to me that the only case you always want a const char* that is Latin 1 is for debugging or logging. If the string is a 16 bit string, then there is loss of data calling String::latin1(). For the performant cases of handling strings, you need to do the is8Bit() check and then handle the 8 and 16 bit cases separately. Therefore use str.latin1.data() for those debug / logging cases.
Yong Li
Comment 3 2012-08-01 11:35:20 PDT
(In reply to comment #2) > It seems to me that the only case you always want a const char* that is Latin 1 is for debugging or logging. If the string is a 16 bit string, then there is loss of data calling String::latin1(). > > For the performant cases of handling strings, you need to do the is8Bit() check and then handle the 8 and 16 bit cases separately. > > Therefore use str.latin1.data() for those debug / logging cases. We also want to convert WTF::String to const char* when communicating with 3rd party libraries or port-specific API. For example, when forwarding a network request to libcurl, we have to convert a url from KURL to const char*. In most cases KURL has a 8-bit string internally, but it is not safe to assume that's always true. (In reply to comment #1) > Can CString be changed to use String's internal 8-bit buffer when possible without copying? Introducing a new class seems a little heavy-handed. I think it is possible: let CString have a RefPtr<StringImpl> as a member, and it also has to make sure the 8-bit buffer has a null terminator... Another idea: Add a non-const method ensure8Bit() to String, so developer can get a copy of the string, call this method, and then directly use characters8() without worrying about crashes.
Yong Li
Comment 4 2012-08-01 12:00:59 PDT
> > Another idea: Add a non-const method ensure8Bit() to String, so developer can get a copy of the string, call this method, and then directly use characters8() without worrying about crashes. Actually it can be just one call: characters8MustBeThere(). the same way as we use String::charactersWithNullTermination().
Alexey Proskuryakov
Comment 5 2012-08-01 12:41:51 PDT
URLs are one of the very few cases where you have strings that are guaranteed to fit in Latin-1 without data loss. I agree with Michael that there doesn't seem to be a problem to address here. It's just one line of code in URL loading system, no need to over-engineer it.
Yong Li
Comment 6 2012-08-01 12:44:24 PDT
OK. closing it for now until there is more reason to do so.
Yong Li
Comment 7 2012-08-03 07:07:35 PDT
(In reply to comment #5) > URLs are one of the very few cases where you have strings that are guaranteed to fit in Latin-1 without data loss. > > I agree with Michael that there doesn't seem to be a problem to address here. It's just one line of code in URL loading system, no need to over-engineer it. How about making KURL always use CString internally?
Alexey Proskuryakov
Comment 8 2012-08-03 10:31:37 PDT
A CString cannot hold invalid URLs, because those are not necessarily ASCII or even valid UTF-16. Not sure exactly how much fidelity we need for those, but it's tricky.
Note You need to log in before you can comment on or make changes to this bug.