WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED LATER
92872
A smart util class that grabs latin1 string from a WTF::String
https://bugs.webkit.org/show_bug.cgi?id=92872
Summary
A smart util class that grabs latin1 string from a WTF::String
Yong Li
Reported
2012-08-01 07:30:16 PDT
In many cases, we know a WTF::String is 99% a 8bit string internally, but to avoid crash in the 1%, we have to do this for best performance: str.is8Bit() ? str.characters8() : static_cast<LChar>(str.latin1.data()) Not convenient at all. And most developer will just use str.latin1.data() when they need to get a const char*. So how about adding a util class that either has a valid CString object or uses String's 8bit data? class Latin1StringHolder { public: explicit Latin1StringHolder (String string) { if (string.is8Bit()) m_string = string; else m_latin1 = string.latin1(); } const LChar* data() const { return m_string.isNull() ? static_cast<LChar>(m_latin1.data()) : m_string.characters8; } size_t length() const { return m_string.isNull() ? m_latin1.length() : m_string.lenght(); } private: CString m_latin1; String m_string; };
Attachments
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2012-08-01 10:56:09 PDT
Can CString be changed to use String's internal 8-bit buffer when possible without copying? Introducing a new class seems a little heavy-handed.
Michael Saboff
Comment 2
2012-08-01 11:06:39 PDT
It seems to me that the only case you always want a const char* that is Latin 1 is for debugging or logging. If the string is a 16 bit string, then there is loss of data calling String::latin1(). For the performant cases of handling strings, you need to do the is8Bit() check and then handle the 8 and 16 bit cases separately. Therefore use str.latin1.data() for those debug / logging cases.
Yong Li
Comment 3
2012-08-01 11:35:20 PDT
(In reply to
comment #2
)
> It seems to me that the only case you always want a const char* that is Latin 1 is for debugging or logging. If the string is a 16 bit string, then there is loss of data calling String::latin1(). > > For the performant cases of handling strings, you need to do the is8Bit() check and then handle the 8 and 16 bit cases separately. > > Therefore use str.latin1.data() for those debug / logging cases.
We also want to convert WTF::String to const char* when communicating with 3rd party libraries or port-specific API. For example, when forwarding a network request to libcurl, we have to convert a url from KURL to const char*. In most cases KURL has a 8-bit string internally, but it is not safe to assume that's always true. (In reply to
comment #1
)
> Can CString be changed to use String's internal 8-bit buffer when possible without copying? Introducing a new class seems a little heavy-handed.
I think it is possible: let CString have a RefPtr<StringImpl> as a member, and it also has to make sure the 8-bit buffer has a null terminator... Another idea: Add a non-const method ensure8Bit() to String, so developer can get a copy of the string, call this method, and then directly use characters8() without worrying about crashes.
Yong Li
Comment 4
2012-08-01 12:00:59 PDT
> > Another idea: Add a non-const method ensure8Bit() to String, so developer can get a copy of the string, call this method, and then directly use characters8() without worrying about crashes.
Actually it can be just one call: characters8MustBeThere(). the same way as we use String::charactersWithNullTermination().
Alexey Proskuryakov
Comment 5
2012-08-01 12:41:51 PDT
URLs are one of the very few cases where you have strings that are guaranteed to fit in Latin-1 without data loss. I agree with Michael that there doesn't seem to be a problem to address here. It's just one line of code in URL loading system, no need to over-engineer it.
Yong Li
Comment 6
2012-08-01 12:44:24 PDT
OK. closing it for now until there is more reason to do so.
Yong Li
Comment 7
2012-08-03 07:07:35 PDT
(In reply to
comment #5
)
> URLs are one of the very few cases where you have strings that are guaranteed to fit in Latin-1 without data loss. > > I agree with Michael that there doesn't seem to be a problem to address here. It's just one line of code in URL loading system, no need to over-engineer it.
How about making KURL always use CString internally?
Alexey Proskuryakov
Comment 8
2012-08-03 10:31:37 PDT
A CString cannot hold invalid URLs, because those are not necessarily ASCII or even valid UTF-16. Not sure exactly how much fidelity we need for those, but it's tricky.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug