Bug 232380 - JavaScript URL result should be treated as UTF-8 bytes
Summary: JavaScript URL result should be treated as UTF-8 bytes
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks: 232334
  Show dependency treegraph
 
Reported: 2021-10-27 07:54 PDT by Chris Dumez
Modified: 2021-10-27 12:06 PDT (History)
9 users (show)

See Also:


Attachments
Patch (5.18 KB, patch)
2021-10-27 08:10 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (7.35 KB, patch)
2021-10-27 11:05 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2021-10-27 07:54:12 PDT
JavaScript URL result should be treated as UTF-8 bytes:
- https://github.com/whatwg/html/pull/6781
Comment 1 Chris Dumez 2021-10-27 08:10:08 PDT
Created attachment 442591 [details]
Patch
Comment 2 Darin Adler 2021-10-27 10:06:39 PDT
Comment on attachment 442591 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=442591&action=review

> Source/WebCore/loader/DocumentWriter.cpp:80
> +    setEncoding("UTF-8"_s, false);

Ah, false. So much better than true. (Why I like enumerations, I guess?)

> Source/WebCore/loader/DocumentWriter.cpp:96
> +        if (DocumentParser* parser = m_frame->document()->parser()) {
> +            auto utf8Source = source.utf8();
> +            parser->appendBytes(*this, reinterpret_cast<const uint8_t*>(utf8Source.data()), utf8Source.length());
> +        }

If this is performance-critical, and all-ASCII is common, and the string is likely in 8-bit, then we could consider a fast path for that case. Or even come up with an alternative to String::utf8 that doesn’t allocate and free memory.

Likely, though, it’s not important to optimize.
Comment 3 Chris Dumez 2021-10-27 10:20:39 PDT
(In reply to Darin Adler from comment #2)
> Comment on attachment 442591 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=442591&action=review
> 
> > Source/WebCore/loader/DocumentWriter.cpp:80
> > +    setEncoding("UTF-8"_s, false);
> 
> Ah, false. So much better than true. (Why I like enumerations, I guess?)

Yes, it is a pre-existing issue but I'll replace with an enum class.

> 
> > Source/WebCore/loader/DocumentWriter.cpp:96
> > +        if (DocumentParser* parser = m_frame->document()->parser()) {
> > +            auto utf8Source = source.utf8();
> > +            parser->appendBytes(*this, reinterpret_cast<const uint8_t*>(utf8Source.data()), utf8Source.length());
> > +        }
> 
> If this is performance-critical, and all-ASCII is common, and the string is
> likely in 8-bit, then we could consider a fast path for that case. Or even
> come up with an alternative to String::utf8 that doesn’t allocate and free
> memory.
> 
> Likely, though, it’s not important to optimize.

I doubt this is worth optimizing at this point.
Comment 4 Chris Dumez 2021-10-27 11:05:56 PDT
Created attachment 442608 [details]
Patch
Comment 5 EWS 2021-10-27 12:05:53 PDT
Committed r284934 (243604@main): <https://commits.webkit.org/243604@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 442608 [details].
Comment 6 Radar WebKit Bug Importer 2021-10-27 12:06:21 PDT
<rdar://problem/84721697>