Bug 232380

Summary: JavaScript URL result should be treated as UTF-8 bytes
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: Page LoadingAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, beidson, darin, ews-watchlist, ggaren, japhet, sam, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 232334    
Attachments:
Description Flags
Patch
none
Patch none

Description Chris Dumez 2021-10-27 07:54:12 PDT
JavaScript URL result should be treated as UTF-8 bytes:
- https://github.com/whatwg/html/pull/6781
Comment 1 Chris Dumez 2021-10-27 08:10:08 PDT
Created attachment 442591 [details]
Patch
Comment 2 Darin Adler 2021-10-27 10:06:39 PDT
Comment on attachment 442591 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=442591&action=review

> Source/WebCore/loader/DocumentWriter.cpp:80
> +    setEncoding("UTF-8"_s, false);

Ah, false. So much better than true. (Why I like enumerations, I guess?)

> Source/WebCore/loader/DocumentWriter.cpp:96
> +        if (DocumentParser* parser = m_frame->document()->parser()) {
> +            auto utf8Source = source.utf8();
> +            parser->appendBytes(*this, reinterpret_cast<const uint8_t*>(utf8Source.data()), utf8Source.length());
> +        }

If this is performance-critical, and all-ASCII is common, and the string is likely in 8-bit, then we could consider a fast path for that case. Or even come up with an alternative to String::utf8 that doesn’t allocate and free memory.

Likely, though, it’s not important to optimize.
Comment 3 Chris Dumez 2021-10-27 10:20:39 PDT
(In reply to Darin Adler from comment #2)
> Comment on attachment 442591 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=442591&action=review
> 
> > Source/WebCore/loader/DocumentWriter.cpp:80
> > +    setEncoding("UTF-8"_s, false);
> 
> Ah, false. So much better than true. (Why I like enumerations, I guess?)

Yes, it is a pre-existing issue but I'll replace with an enum class.

> 
> > Source/WebCore/loader/DocumentWriter.cpp:96
> > +        if (DocumentParser* parser = m_frame->document()->parser()) {
> > +            auto utf8Source = source.utf8();
> > +            parser->appendBytes(*this, reinterpret_cast<const uint8_t*>(utf8Source.data()), utf8Source.length());
> > +        }
> 
> If this is performance-critical, and all-ASCII is common, and the string is
> likely in 8-bit, then we could consider a fast path for that case. Or even
> come up with an alternative to String::utf8 that doesn’t allocate and free
> memory.
> 
> Likely, though, it’s not important to optimize.

I doubt this is worth optimizing at this point.
Comment 4 Chris Dumez 2021-10-27 11:05:56 PDT
Created attachment 442608 [details]
Patch
Comment 5 EWS 2021-10-27 12:05:53 PDT
Committed r284934 (243604@main): <https://commits.webkit.org/243604@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 442608 [details].
Comment 6 Radar WebKit Bug Importer 2021-10-27 12:06:21 PDT
<rdar://problem/84721697>