Threaded HTML Parser should be able to issue preload requests
Created attachment 184457 [details] incomplete approximation
The goal of this was to re-work the existing preload scanner to make it happy on either thread. This is still just a sketch (it doesn't compile yet), but I'm about to get to the beat-it-until-it-compiles stage. :)
Another approach is to scan the speculatively produced CompactHTMLTokens instead of re-tokenizing the input.
Yup. The goal here is to re-architect the current preload scanner logic into something which could run on another thread. This will not land as a single patch, and our final solution may not look much like this. :)
The preload scanner wants to keep a bunch of state (like if we're in templates or script tags), but yes, we will likely work from CompactHTMLTokens instead of HTMLToken on the tokenizer thread.
Created attachment 184599 [details] A more complete patch
Comment on attachment 184599 [details] A more complete patch View in context: https://bugs.webkit.org/attachment.cgi?id=184599&action=review > Source/WebCore/html/parser/HTMLDocumentParser.cpp:89 > + , m_preloaderWeakFactory(m_preloader.get()) Usually we put the factory on object that it is vending weak pointers for. In this case, we'd put the factory on HTMLResourcePreloader. That way the factory always gets destroyed with the object it's creating pointers for gets destroyed.
Comment on attachment 184599 [details] A more complete patch View in context: https://bugs.webkit.org/attachment.cgi?id=184599&action=review >> Source/WebCore/html/parser/HTMLDocumentParser.cpp:89 >> + , m_preloaderWeakFactory(m_preloader.get()) > > Usually we put the factory on object that it is vending weak pointers for. In this case, we'd put the factory on HTMLResourcePreloader. That way the factory always gets destroyed with the object it's creating pointers for gets destroyed. Yup, that's fine. I'm just not yet sure whether HTMLResourcePreloader even want's to be a separate object. :) But I'll move it in my next iteration.
Comment on attachment 184599 [details] A more complete patch View in context: https://bugs.webkit.org/attachment.cgi?id=184599&action=review > Source/WebCore/html/parser/CSSPreloadScanner.cpp:201 > + OwnPtr<PreloadRequest> request = PreloadRequest::create( > + cachedResourceRequestInitiators().css, url, baseElementURL, CachedResource::CSSStyleSheet); I didn't see the definition of PreloadRequest, but it looks like it contains a bunch of Strings. We'll probably need to call isolatedCopy to make sure its safe for transport across threads. > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:56 > +static bool threadSafeEqual(StringImpl* a, StringImpl* b) > +{ > + if (a->hash() != b->hash()) > + return false; > + return StringHash::equal(a, b); > +} > + > +static bool threadSafeMatch(const String& localName, const QualifiedName& qName) > +{ > + return threadSafeEqual(localName.impl(), qName.localName().impl()); > +} Maybe we should share these functions with BackgroundHTMLParser? > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:276 > + String hrefValue = StringImpl::create8BitIfPossible(iter->m_value.data(), iter->m_value.size()); I'm surprised to see "StringImpl" here. I think elsewhere we call a static function on String > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:292 > + AtomicString tagName(token.name().data(), token.name().size()); Should we have an ASSERT(isMainThread()) here as well? Given that you're using the threadsafe comparison functions, should we use String instead?
The general approach in this patch looks good. The comments above are just minor nits.
(In reply to comment #9) > (From update of attachment 184599 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=184599&action=review > > > Source/WebCore/html/parser/CSSPreloadScanner.cpp:201 > > + OwnPtr<PreloadRequest> request = PreloadRequest::create( > > + cachedResourceRequestInitiators().css, url, baseElementURL, CachedResource::CSSStyleSheet); > > I didn't see the definition of PreloadRequest, but it looks like it contains a bunch of Strings. We'll probably need to call isolatedCopy to make sure its safe for transport across threads. Must have failed to add the files. And yes, it does. :) > > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:56 > > +static bool threadSafeEqual(StringImpl* a, StringImpl* b) > > +{ > > + if (a->hash() != b->hash()) > > + return false; > > + return StringHash::equal(a, b); > > +} > > + > > +static bool threadSafeMatch(const String& localName, const QualifiedName& qName) > > +{ > > + return threadSafeEqual(localName.impl(), qName.localName().impl()); > > +} > > Maybe we should share these functions with BackgroundHTMLParser? I think we're going to move back to AtomicString eventually, or some more elegant solution. This is very hackish and difficult to maintain. (Very easy to add an AtomicString compare, have it do the wrong thing adn not notice). > > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:276 > > + String hrefValue = StringImpl::create8BitIfPossible(iter->m_value.data(), iter->m_value.size()); > > I'm surprised to see "StringImpl" here. I think elsewhere we call a static function on String K. Was existing code, will fix. > > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:292 > > + AtomicString tagName(token.name().data(), token.name().size()); > > Should we have an ASSERT(isMainThread()) here as well? Given that you're using the threadsafe comparison functions, should we use String instead? It doesn't really matter which we use. This does fewer allocs, and is safe (as each thread already has an AtomicString table) the comparison with main-thread QualifiedNames won't do what you want of course. :)
(In reply to comment #9) > (From update of attachment 184599 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=184599&action=review > > > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:276 > > + String hrefValue = StringImpl::create8BitIfPossible(iter->m_value.data(), iter->m_value.size()); > > I'm surprised to see "StringImpl" here. I think elsewhere we call a static function on String Turns out this function only exists on StringImpl.
Created attachment 184710 [details] Actually works for the main thread
I'm happy to break this down however. The next step is to make this work on the background parser, but I believe that to be trivial with this new design.
Comment on attachment 184710 [details] Actually works for the main thread Attachment 184710 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/16116274
I'm also happy to wait to land this until folks can see the next patch too. :) Oh, and I guess I need to add these files to all teh other build systems. Sigh.
Comment on attachment 184710 [details] Actually works for the main thread Attachment 184710 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/16114316
Comment on attachment 184710 [details] Actually works for the main thread Attachment 184710 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/16111536
Comment on attachment 184710 [details] Actually works for the main thread Attachment 184710 [details] did not pass qt-wk2-ews (qt): Output: http://queues.webkit.org/results/16110677
Comment on attachment 184710 [details] Actually works for the main thread Attachment 184710 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/16112480
Comment on attachment 184710 [details] Actually works for the main thread Attachment 184710 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://queues.webkit.org/results/16113420
Created attachment 184902 [details] Now with project changes
check-webkit-style seemed to hang for me, attempting to check this patch. Hopefully the style queue will not suffer the same fate. Or maybe check-webkit-style is busted on TOT?
Comment on attachment 184902 [details] Now with project changes Attachment 184902 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/16161126
Comment on attachment 184902 [details] Now with project changes Attachment 184902 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/16153146
Comment on attachment 184902 [details] Now with project changes Attachment 184902 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://queues.webkit.org/results/16153148
callOnMainThread is not properly exported for Windows WTF. Not sure how we're supposed to do that, will have to investigate. I also just failed to include the xcode proj changes.
Created attachment 184904 [details] should build on Mac and win now
Comment on attachment 184904 [details] should build on Mac and win now Attachment 184904 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/16146393
Comment on attachment 184904 [details] should build on Mac and win now View in context: https://bugs.webkit.org/attachment.cgi?id=184904&action=review This looks great. > Source/WebCore/html/parser/CSSPreloadScanner.cpp:201 > + cachedResourceRequestInitiators().css, url, baseElementURL, CachedResource::CSSStyleSheet); Is cachedResourceRequestInitiators().css safe to call on a background thread? > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:56 > +static bool threadSafeEqual(StringImpl* a, StringImpl* b) > +{ > + if (a->hash() != b->hash()) > + return false; > + return StringHash::equal(a, b); > +} > + > +static bool threadSafeMatch(const String& localName, const QualifiedName& qName) > +{ > + return threadSafeEqual(localName.impl(), qName.localName().impl()); > +} We should share this code instead of duplicating it. > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:183 > + void issuePreloadRequest(WeakPtr<HTMLResourcePreloader>& preloader, const KURL& predictedBaseURL) const reference? > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:310 > + AtomicString tagName(token.name().data(), token.name().size()); We might want to use a regular string here. It's not clear to me whether we want to boot up an atomic string table on the background thread, especially in the world where we're using a threadpool. > Source/WebCore/html/parser/HTMLResourcePreloader.cpp:44 > +static bool isStringSafeToSendToAnotherThread(const String& string) > +{ > + StringImpl* impl = string.impl(); > + if (!impl) > + return true; > + if (impl->hasOneRef()) > + return true; > + if (string.isEmpty()) > + return true; > + return false; > +} We should share this code instead of duplicating it. > Source/WebCore/html/parser/HTMLResourcePreloader.h:40 > + bool isSafeToSendToAnotherThread() const; Should this be #ifndef NDEBUG?
(In reply to comment #30) > (From update of attachment 184904 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=184904&action=review > > This looks great. Thank you for the review. > > Source/WebCore/html/parser/CSSPreloadScanner.cpp:201 > > + cachedResourceRequestInitiators().css, url, baseElementURL, CachedResource::CSSStyleSheet); > > Is cachedResourceRequestInitiators().css safe to call on a background thread? Yup: inline const CachedResourceRequestInitiators& cachedResourceRequestInitiators() { return threadGlobalData().cachedResourceRequestInitiators(); } The return value will be a per-thread AtomicString, so it may not be safe for our thread... unless we turn on AtomicStrings. :( Which I think we're eventually going to do again. :) > > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:56 > > +static bool threadSafeEqual(StringImpl* a, StringImpl* b) > > +{ > > + if (a->hash() != b->hash()) > > + return false; > > + return StringHash::equal(a, b); > > +} > > + > > +static bool threadSafeMatch(const String& localName, const QualifiedName& qName) > > +{ > > + return threadSafeEqual(localName.impl(), qName.localName().impl()); > > +} > > We should share this code instead of duplicating it. Yup. I think we're going to kill this code eventually, but I agree. > > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:183 > > + void issuePreloadRequest(WeakPtr<HTMLResourcePreloader>& preloader, const KURL& predictedBaseURL) > > const reference? > > > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:310 > > + AtomicString tagName(token.name().data(), token.name().size()); > > We might want to use a regular string here. It's not clear to me whether we want to boot up an atomic string table on the background thread, especially in the world where we're using a threadpool. That turned out to be required for turnign it on: bug 108027. > > Source/WebCore/html/parser/HTMLResourcePreloader.cpp:44 > > +static bool isStringSafeToSendToAnotherThread(const String& string) > > +{ > > + StringImpl* impl = string.impl(); > > + if (!impl) > > + return true; > > + if (impl->hasOneRef()) > > + return true; > > + if (string.isEmpty()) > > + return true; > > + return false; > > +} > > We should share this code instead of duplicating it. > > > Source/WebCore/html/parser/HTMLResourcePreloader.h:40 > > + bool isSafeToSendToAnotherThread() const; > > Should this be #ifndef NDEBUG?
(In reply to comment #31) > (In reply to comment #30) > > (From update of attachment 184904 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=184904&action=review > > > > > Source/WebCore/html/parser/CSSPreloadScanner.cpp:201 > > > + cachedResourceRequestInitiators().css, url, baseElementURL, CachedResource::CSSStyleSheet); > > > > Is cachedResourceRequestInitiators().css safe to call on a background thread? > > Yup: > inline const CachedResourceRequestInitiators& cachedResourceRequestInitiators() > { > return threadGlobalData().cachedResourceRequestInitiators(); > } > > The return value will be a per-thread AtomicString, so it may not be safe for our thread... unless we turn on AtomicStrings. :( Which I think we're eventually going to do again. :) Right, but then we send that string to another thread, which seems like bad news bears.
(In reply to comment #32) > (In reply to comment #31) > > (In reply to comment #30) > > > (From update of attachment 184904 [details] [details] [details]) > > > View in context: https://bugs.webkit.org/attachment.cgi?id=184904&action=review > > The return value will be a per-thread AtomicString, so it may not be safe for our thread... unless we turn on AtomicStrings. :( Which I think we're eventually going to do again. :) > > Right, but then we send that string to another thread, which seems like bad news bears. Yes, your implicit point that we're missing an isolatedCopy() here is correct. But it's probably also note safe to have this use AtomicString if we haven't called AtomicString::init(). I suspect one of the preload test is crashing (or would crash after we fix doc.write) and I didn't notice.
I bet we can just change cachedResourceRequestInitiators to be an enum.
(In reply to comment #34) > I bet we can just change cachedResourceRequestInitiators to be an enum. I'm sure we can. I think we're also fighting the rest of WebCore. :) Eventually I believe we are going to re-accept our AtomicString overlords.
Very likely. Certainly I don't think we should block this patch on converting to an enum.
Would you be willing to run the HTML parser benchmark on this change before landing to make sure we're not slowing it down? (I don't think we are, but it's good to double-check.)
Happily.
Looks like it might be a slowdown: Before: Running 1 tests Running Parser/html-parser.html (1 of 1) RESULT Parser: html-parser= 2032.85120001 ms median= 2017.63250004 ms, stdev= 73.7682286267 ms, min= 1942.15300004 ms, max= 2234.08299999 ms RESULT Parser: html-parser: JSHeap= 2119526.8 bytes median= 2164094.0 bytes, stdev= 133089.003711 bytes, min= 1663408.0 bytes, max= 2178108.0 bytes RESULT Parser: html-parser: Malloc= 40682212.4 bytes median= 40346100.0 bytes, stdev= 1194471.7551 bytes, min= 40178056.0 bytes, max= 45341632.0 bytes Finished: 45.511458 s After: Running 1 tests Running Parser/html-parser.html (1 of 1) RESULT Parser: html-parser= 2108.1245 ms median= 2116.98500009 ms, stdev= 63.8100291878 ms, min= 1990.56499999 ms, max= 2275.501 ms RESULT Parser: html-parser: JSHeap= 2004266.2 bytes median= 2037086.0 bytes, stdev= 95033.0246463 bytes, min= 1663408.0 bytes, max= 2049172.0 bytes RESULT Parser: html-parser: Malloc= 40585128.0 bytes median= 40359988.0 bytes, stdev= 1156623.92917 bytes, min= 39334792.0 bytes, max= 45296112.0 bytes Finished: 47.336479 s I'm not shocked, given all the avoiding of AtomicString (thus additional mallocs and less efficient compares). Will have to investigate.
Created attachment 187501 [details] Patch
The slowdown from the other patch is not present in this one, btw: Before: Running Parser/html-parser.html (1 of 1) RESULT Parser: html-parser= 1864.1045 ms median= 1863.81349999 ms, stdev= 10.56774216 ms, min= 1846.79899999 ms, max= 1884.556 ms RESULT Parser: html-parser: JSHeap= 2025204.0 bytes median= 2056024.0 bytes, stdev= 94620.1494687 bytes, min= 1687240.0 bytes, max= 2069624.0 bytes RESULT Parser: html-parser: Malloc= 41353497.2 bytes median= 41190176.0 bytes, stdev= 979648.045835 bytes, min= 40282784.0 bytes, max= 45394296.0 bytes Finished: 41.853883 s After: Running Parser/html-parser.html (1 of 1) RESULT Parser: html-parser= 1832.69735 ms median= 1839.06099999 ms, stdev= 14.0988717905 ms, min= 1804.87300002 ms, max= 1852.063 ms RESULT Parser: html-parser: JSHeap= 2157159.2 bytes median= 2191020.0 bytes, stdev= 95286.9975034 bytes, min= 1818320.0 bytes, max= 2200564.0 bytes RESULT Parser: html-parser: Malloc= 41040379.6 bytes median= 41106120.0 bytes, stdev= 1146075.08036 bytes, min= 40095656.0 bytes, max= 45390272.0 bytes Finished: 41.082746 s
Comment on attachment 187501 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=187501&action=review > Source/WebCore/ChangeLog:37 > + The preload scanners are handed a WeakPtr to the HTMLResourcePreloader and speak to > + it using callOnMainThread. Is this still true? > Source/WebCore/html/parser/CSSPreloadScanner.h:67 > + // Only non-null during scan() null -> zero > Source/WebCore/html/parser/HTMLDocumentParser.cpp:90 > + , m_preloader(adoptPtr(new HTMLResourcePreloader(document))) It seems like we could hold this as a member instead of via an OwnPtr. > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:48 > + return token.type() == HTMLTokenTypes::StartTag; // Holy verbosity, batman. This comment probably isn't necessary. > Source/WebCore/html/parser/HTMLPreloadScanner.cpp:302 > + PreloadTask task(tagName, token.attributes()); The PreloadTask seems like a historical appendage at this point. > Source/WebCore/html/parser/HTMLResourcePreloader.cpp:59 > +CachedResourceRequest PreloadRequest::resourceRequest(Document* document) resourceRequest -> createResourceRequest? > Source/WebCore/html/parser/HTMLResourcePreloader.cpp:65 > + // FIXME: It's possible CORS should work for other request types? Yes. It should work for images too. Perhaps we should fix that in a followup patch? > Source/WebCore/html/parser/HTMLResourcePreloader.cpp:71 > +void HTMLResourcePreloader::preload(PassOwnPtr<PreloadRequest> preload) It seems odd to separate PreloadRequest::resourceRequest from this function. Is there some advantage to creating the CachedResourceRequest in a separate function? > Source/WebCore/html/parser/HTMLResourcePreloader.h:34 > +class PreloadRequest { In principle, PreloadRequest should have its own file. > Source/WebCore/html/parser/HTMLResourcePreloader.h:59 > +public: It seems odd to have some of the members be private and some be public. Perhaps these should be private and have getters/setters as appropriate? > Source/WebCore/loader/cache/CachedResourceRequest.h:29 > +#include "Element.h" Rather than adding this include here, why not add the include to the file that needs it?
Comment on attachment 187501 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=187501&action=review >> Source/WebCore/ChangeLog:37 >> + it using callOnMainThread. > > Is this still true? Fixed. >> Source/WebCore/html/parser/CSSPreloadScanner.h:67 >> + // Only non-null during scan() > > null -> zero Fixed. >> Source/WebCore/html/parser/HTMLDocumentParser.cpp:90 >> + , m_preloader(adoptPtr(new HTMLResourcePreloader(document))) > > It seems like we could hold this as a member instead of via an OwnPtr. Definitely could. The rest of the class seems to use OwnPtrs for members so I matched the style. It would be slightly faster to allocate and slighltly slower to compile. (Generally a trade we like to make.) >> Source/WebCore/html/parser/HTMLPreloadScanner.cpp:48 >> + return token.type() == HTMLTokenTypes::StartTag; // Holy verbosity, batman. > > This comment probably isn't necessary. These inlines exist because the compares are so horrible. But I agree the comment is superflous. >> Source/WebCore/html/parser/HTMLPreloadScanner.cpp:302 >> + PreloadTask task(tagName, token.attributes()); > > The PreloadTask seems like a historical appendage at this point. Completely. I think we'll kill it in a second pass. It's really just mis-named. It's a "start tag processing context" or "start tag preload scanner?" >> Source/WebCore/html/parser/HTMLResourcePreloader.cpp:59 >> +CachedResourceRequest PreloadRequest::resourceRequest(Document* document) > > resourceRequest -> createResourceRequest? Normally create* functions return heap allocated objects, is my understanding. >> Source/WebCore/html/parser/HTMLResourcePreloader.cpp:65 >> + // FIXME: It's possible CORS should work for other request types? > > Yes. It should work for images too. Perhaps we should fix that in a followup patch? Totally. This should be no change in behavior as is. There were several FIXMEs like thsi which became obvious in teh refactoring. >> Source/WebCore/html/parser/HTMLResourcePreloader.cpp:71 >> +void HTMLResourcePreloader::preload(PassOwnPtr<PreloadRequest> preload) > > It seems odd to separate PreloadRequest::resourceRequest from this function. Is there some advantage to creating the CachedResourceRequest in a separate function? The CachedResourceRequest seemed like something part of the PreloadRequest (makes the code smaller as less request-> code). But issuing the preload seemed like something the Request shouldn't know how to do. The PreloadRequest is created on the background thread, and the CachedResourceRequest and preload execued on the main thread. >> Source/WebCore/html/parser/HTMLResourcePreloader.h:34 >> +class PreloadRequest { > > In principle, PreloadRequest should have its own file. Yes. I think I will ignore that for now. :) See my wishes email. >> Source/WebCore/html/parser/HTMLResourcePreloader.h:59 >> +public: > > It seems odd to have some of the members be private and some be public. Perhaps these should be private and have getters/setters as appropriate? Yeah. It really is an awkward design. The original goal was to make the constructor take all the "Expected" arguments, and let the special cases be set manually. This also started as a struct and morphed to a class. :) I think I'll add setters, to keep with the pattern. Making the constructor take "expected" argumetns was helpful in finding FIXMEs in the CSSPreloadScanner. :) >> Source/WebCore/loader/cache/CachedResourceRequest.h:29 >> +#include "Element.h" > > Rather than adding this include here, why not add the include to the file that needs it? This file actually does need Element, due to RefPtr<Element> m_initiatorElement; but was not including it. So I fixed that. :)
Created attachment 187509 [details] Patch for landing
Comment on attachment 187509 [details] Patch for landing Clearing flags on attachment: 187509 Committed r142427: <http://trac.webkit.org/changeset/142427>
All reviewed patches have been landed. Closing bug.
(In reply to comment #43) > (From update of attachment 187501 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=187501&action=review > >> Source/WebCore/html/parser/HTMLPreloadScanner.cpp:302 > >> + PreloadTask task(tagName, token.attributes()); > > > > The PreloadTask seems like a historical appendage at this point. > > Completely. I think we'll kill it in a second pass. > > It's really just mis-named. It's a "start tag processing context" or "start tag preload scanner?" I've filed bug 109406