WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
225890
Use UTF-8 internally from Strings inside SQLiteStatement
https://bugs.webkit.org/show_bug.cgi?id=225890
Summary
Use UTF-8 internally from Strings inside SQLiteStatement
Chris Dumez
Reported
2021-05-17 15:00:31 PDT
Use UTF-8 internally from Strings inside SQLiteStatement. Our SQLite databases use UTF-8 internally for text encoding (see we open the database with sqlite3_open_v2() and not sqlite3_open16). Previously, we were providing UTF-16 strings to SQLite, which meant it had to convert it internally to UTF-8. Also, whenever we were requesting a string from SQLite, it had to convert it from UTF-8 to UTF-16 before returning it to us. Our String constructed from returned from SQLiteStatement would also be 16-bit encoded, even if all ASCII so we were potentially wasting memory too. We now provide UTF-8 to SQLite by using StringView::utf8(). We also request UTF-8 strings from SQLite and rely on String::fromUTF8() to convert it to a WTF::String. For all ASCII characters, this means we'll end up with a 8-bit String and save memory.
Attachments
Patch
(4.63 KB, patch)
2021-05-17 15:02 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(4.63 KB, patch)
2021-05-17 15:32 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(4.71 KB, patch)
2021-05-17 15:56 PDT
,
Chris Dumez
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(5.88 KB, patch)
2021-05-17 16:52 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(6.99 KB, patch)
2021-05-17 18:41 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(7.26 KB, patch)
2021-05-18 07:42 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(7.25 KB, patch)
2021-05-18 10:09 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2021-05-17 15:02:21 PDT
Created
attachment 428876
[details]
Patch
Chris Dumez
Comment 2
2021-05-17 15:32:22 PDT
Created
attachment 428878
[details]
Patch
Chris Dumez
Comment 3
2021-05-17 15:56:59 PDT
Created
attachment 428883
[details]
Patch
Chris Dumez
Comment 4
2021-05-17 16:52:20 PDT
Created
attachment 428892
[details]
Patch
Chris Dumez
Comment 5
2021-05-17 18:41:01 PDT
Created
attachment 428904
[details]
Patch
Alex Christensen
Comment 6
2021-05-17 20:13:37 PDT
Is this compatible with existing databases in files?
Chris Dumez
Comment 7
2021-05-17 20:15:36 PDT
(In reply to Alex Christensen from
comment #6
)
> Is this compatible with existing databases in files?
Yes, there is no compatibility issue here. Our databases already use UTF-8 encoding. Now we also use UTF-8 in SQLiteStatement to reduce the amount of conversion.
Darin Adler
Comment 8
2021-05-17 23:18:25 PDT
Comment on
attachment 428904
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=428904&action=review
> Source/WebCore/platform/sql/SQLiteStatement.cpp:120 > + auto utf8Text = text.utf8();
Seems like we should consider an idiom that uses the characters in place for all-ASCII strings. No need to make a copy on the heap in that case, and that may be the common case, and the scan through the characters to check they are all ASCII is probably efficient enough that it’s an optimization on balance. We could make a function that returns a structure that sometimes contains a CString but always contains a const char* pointing to the UTF-8 characters, and a length. Or we could make a function that takes a lambda and calls it with the UTF-8 characters and length. Or we can just write it out here: CString buffer; const char* characters; unsigned length; if (text.is8Bit() && text.isAllASCII()) { characters = reinterpret_cast<const char*>(text.characters8()); length = text.length(); } else { buffer = text.utf8(); characters = utf8Buffer.data(); length = utf8Buffer.length(); } return sqlite3_bind_text(m_statement, index, characters, length, SQLITE_TRANSIENT);
Chris Dumez
Comment 9
2021-05-18 07:38:32 PDT
(In reply to Darin Adler from
comment #8
)
> Comment on
attachment 428904
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=428904&action=review
> > > Source/WebCore/platform/sql/SQLiteStatement.cpp:120 > > + auto utf8Text = text.utf8(); > > Seems like we should consider an idiom that uses the characters in place for > all-ASCII strings. No need to make a copy on the heap in that case, and that > may be the common case, and the scan through the characters to check they > are all ASCII is probably efficient enough that it’s an optimization on > balance. We could make a function that returns a structure that sometimes > contains a CString but always contains a const char* pointing to the UTF-8 > characters, and a length. Or we could make a function that takes a lambda > and calls it with the UTF-8 characters and length. Or we can just write it > out here: > > CString buffer; > const char* characters; > unsigned length; > if (text.is8Bit() && text.isAllASCII()) { > characters = reinterpret_cast<const char*>(text.characters8()); > length = text.length(); > } else { > buffer = text.utf8(); > characters = utf8Buffer.data(); > length = utf8Buffer.length(); > } > return sqlite3_bind_text(m_statement, index, characters, length, > SQLITE_TRANSIENT);
Yes, I agree all ASCII is likely a common case. I'll make this optimization.
Chris Dumez
Comment 10
2021-05-18 07:42:20 PDT
Created
attachment 428936
[details]
Patch
Chris Dumez
Comment 11
2021-05-18 08:56:49 PDT
Comment on
attachment 428936
[details]
Patch This last minute optimization caused a regression, I must have messed up something. I'll investigate.
Chris Dumez
Comment 12
2021-05-18 10:09:54 PDT
Created
attachment 428950
[details]
Patch
EWS
Comment 13
2021-05-18 11:59:37 PDT
Committed
r277667
(
237869@main
): <
https://commits.webkit.org/237869@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 428950
[details]
.
Radar WebKit Bug Importer
Comment 14
2021-05-18 12:00:23 PDT
<
rdar://problem/78166478
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug