Use UTF-8 internally from Strings inside SQLiteStatement. Our SQLite databases use UTF-8 internally for text encoding (see we open the database with sqlite3_open_v2() and not sqlite3_open16). Previously, we were providing UTF-16 strings to SQLite, which meant it had to convert it internally to UTF-8. Also, whenever we were requesting a string from SQLite, it had to convert it from UTF-8 to UTF-16 before returning it to us. Our String constructed from returned from SQLiteStatement would also be 16-bit encoded, even if all ASCII so we were potentially wasting memory too. We now provide UTF-8 to SQLite by using StringView::utf8(). We also request UTF-8 strings from SQLite and rely on String::fromUTF8() to convert it to a WTF::String. For all ASCII characters, this means we'll end up with a 8-bit String and save memory.
Created attachment 428876 [details] Patch
Created attachment 428878 [details] Patch
Created attachment 428883 [details] Patch
Created attachment 428892 [details] Patch
Created attachment 428904 [details] Patch
Is this compatible with existing databases in files?
(In reply to Alex Christensen from comment #6) > Is this compatible with existing databases in files? Yes, there is no compatibility issue here. Our databases already use UTF-8 encoding. Now we also use UTF-8 in SQLiteStatement to reduce the amount of conversion.
Comment on attachment 428904 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=428904&action=review > Source/WebCore/platform/sql/SQLiteStatement.cpp:120 > + auto utf8Text = text.utf8(); Seems like we should consider an idiom that uses the characters in place for all-ASCII strings. No need to make a copy on the heap in that case, and that may be the common case, and the scan through the characters to check they are all ASCII is probably efficient enough that it’s an optimization on balance. We could make a function that returns a structure that sometimes contains a CString but always contains a const char* pointing to the UTF-8 characters, and a length. Or we could make a function that takes a lambda and calls it with the UTF-8 characters and length. Or we can just write it out here: CString buffer; const char* characters; unsigned length; if (text.is8Bit() && text.isAllASCII()) { characters = reinterpret_cast<const char*>(text.characters8()); length = text.length(); } else { buffer = text.utf8(); characters = utf8Buffer.data(); length = utf8Buffer.length(); } return sqlite3_bind_text(m_statement, index, characters, length, SQLITE_TRANSIENT);
(In reply to Darin Adler from comment #8) > Comment on attachment 428904 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=428904&action=review > > > Source/WebCore/platform/sql/SQLiteStatement.cpp:120 > > + auto utf8Text = text.utf8(); > > Seems like we should consider an idiom that uses the characters in place for > all-ASCII strings. No need to make a copy on the heap in that case, and that > may be the common case, and the scan through the characters to check they > are all ASCII is probably efficient enough that it’s an optimization on > balance. We could make a function that returns a structure that sometimes > contains a CString but always contains a const char* pointing to the UTF-8 > characters, and a length. Or we could make a function that takes a lambda > and calls it with the UTF-8 characters and length. Or we can just write it > out here: > > CString buffer; > const char* characters; > unsigned length; > if (text.is8Bit() && text.isAllASCII()) { > characters = reinterpret_cast<const char*>(text.characters8()); > length = text.length(); > } else { > buffer = text.utf8(); > characters = utf8Buffer.data(); > length = utf8Buffer.length(); > } > return sqlite3_bind_text(m_statement, index, characters, length, > SQLITE_TRANSIENT); Yes, I agree all ASCII is likely a common case. I'll make this optimization.
Created attachment 428936 [details] Patch
Comment on attachment 428936 [details] Patch This last minute optimization caused a regression, I must have messed up something. I'll investigate.
Created attachment 428950 [details] Patch
Committed r277667 (237869@main): <https://commits.webkit.org/237869@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 428950 [details].
<rdar://problem/78166478>