RESOLVED FIXED 225890
Use UTF-8 internally from Strings inside SQLiteStatement
https://bugs.webkit.org/show_bug.cgi?id=225890
Summary Use UTF-8 internally from Strings inside SQLiteStatement
Chris Dumez
Reported 2021-05-17 15:00:31 PDT
Use UTF-8 internally from Strings inside SQLiteStatement. Our SQLite databases use UTF-8 internally for text encoding (see we open the database with sqlite3_open_v2() and not sqlite3_open16). Previously, we were providing UTF-16 strings to SQLite, which meant it had to convert it internally to UTF-8. Also, whenever we were requesting a string from SQLite, it had to convert it from UTF-8 to UTF-16 before returning it to us. Our String constructed from returned from SQLiteStatement would also be 16-bit encoded, even if all ASCII so we were potentially wasting memory too. We now provide UTF-8 to SQLite by using StringView::utf8(). We also request UTF-8 strings from SQLite and rely on String::fromUTF8() to convert it to a WTF::String. For all ASCII characters, this means we'll end up with a 8-bit String and save memory.
Attachments
Patch (4.63 KB, patch)
2021-05-17 15:02 PDT, Chris Dumez
no flags
Patch (4.63 KB, patch)
2021-05-17 15:32 PDT, Chris Dumez
no flags
Patch (4.71 KB, patch)
2021-05-17 15:56 PDT, Chris Dumez
ews-feeder: commit-queue-
Patch (5.88 KB, patch)
2021-05-17 16:52 PDT, Chris Dumez
no flags
Patch (6.99 KB, patch)
2021-05-17 18:41 PDT, Chris Dumez
no flags
Patch (7.26 KB, patch)
2021-05-18 07:42 PDT, Chris Dumez
no flags
Patch (7.25 KB, patch)
2021-05-18 10:09 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2021-05-17 15:02:21 PDT
Chris Dumez
Comment 2 2021-05-17 15:32:22 PDT
Chris Dumez
Comment 3 2021-05-17 15:56:59 PDT
Chris Dumez
Comment 4 2021-05-17 16:52:20 PDT
Chris Dumez
Comment 5 2021-05-17 18:41:01 PDT
Alex Christensen
Comment 6 2021-05-17 20:13:37 PDT
Is this compatible with existing databases in files?
Chris Dumez
Comment 7 2021-05-17 20:15:36 PDT
(In reply to Alex Christensen from comment #6) > Is this compatible with existing databases in files? Yes, there is no compatibility issue here. Our databases already use UTF-8 encoding. Now we also use UTF-8 in SQLiteStatement to reduce the amount of conversion.
Darin Adler
Comment 8 2021-05-17 23:18:25 PDT
Comment on attachment 428904 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=428904&action=review > Source/WebCore/platform/sql/SQLiteStatement.cpp:120 > + auto utf8Text = text.utf8(); Seems like we should consider an idiom that uses the characters in place for all-ASCII strings. No need to make a copy on the heap in that case, and that may be the common case, and the scan through the characters to check they are all ASCII is probably efficient enough that it’s an optimization on balance. We could make a function that returns a structure that sometimes contains a CString but always contains a const char* pointing to the UTF-8 characters, and a length. Or we could make a function that takes a lambda and calls it with the UTF-8 characters and length. Or we can just write it out here: CString buffer; const char* characters; unsigned length; if (text.is8Bit() && text.isAllASCII()) { characters = reinterpret_cast<const char*>(text.characters8()); length = text.length(); } else { buffer = text.utf8(); characters = utf8Buffer.data(); length = utf8Buffer.length(); } return sqlite3_bind_text(m_statement, index, characters, length, SQLITE_TRANSIENT);
Chris Dumez
Comment 9 2021-05-18 07:38:32 PDT
(In reply to Darin Adler from comment #8) > Comment on attachment 428904 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=428904&action=review > > > Source/WebCore/platform/sql/SQLiteStatement.cpp:120 > > + auto utf8Text = text.utf8(); > > Seems like we should consider an idiom that uses the characters in place for > all-ASCII strings. No need to make a copy on the heap in that case, and that > may be the common case, and the scan through the characters to check they > are all ASCII is probably efficient enough that it’s an optimization on > balance. We could make a function that returns a structure that sometimes > contains a CString but always contains a const char* pointing to the UTF-8 > characters, and a length. Or we could make a function that takes a lambda > and calls it with the UTF-8 characters and length. Or we can just write it > out here: > > CString buffer; > const char* characters; > unsigned length; > if (text.is8Bit() && text.isAllASCII()) { > characters = reinterpret_cast<const char*>(text.characters8()); > length = text.length(); > } else { > buffer = text.utf8(); > characters = utf8Buffer.data(); > length = utf8Buffer.length(); > } > return sqlite3_bind_text(m_statement, index, characters, length, > SQLITE_TRANSIENT); Yes, I agree all ASCII is likely a common case. I'll make this optimization.
Chris Dumez
Comment 10 2021-05-18 07:42:20 PDT
Chris Dumez
Comment 11 2021-05-18 08:56:49 PDT
Comment on attachment 428936 [details] Patch This last minute optimization caused a regression, I must have messed up something. I'll investigate.
Chris Dumez
Comment 12 2021-05-18 10:09:54 PDT
EWS
Comment 13 2021-05-18 11:59:37 PDT
Committed r277667 (237869@main): <https://commits.webkit.org/237869@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 428950 [details].
Radar WebKit Bug Importer
Comment 14 2021-05-18 12:00:23 PDT
Note You need to log in before you can comment on or make changes to this bug.