112081 – Printing to PDF should produce internal links when HTML has internal links

David Lattimore

Reported 2013-03-11 16:56:20 PDT

Currently when outputting to PDF with Skia, WebKit treats links to elsewhere in the same HTML document the same as links to elsewhere on the Web. For example, a HTML page with a table of contents that links to headings later in the page will result in a PDF with links back to the original HTML page. A nicer user experience would be for these links to be put out as internal PDF links, so if the user clicks one, they are taken to the relevant page within the PDF. Proposed solution: 1) When any element with an ID is encountered, output a named destination to the PDF. For example <h1 id="summary">Summary</h1> would result in a destination named "summary" in the PDF. Even if there are no links within the document to this ID, having this named destination allows links elsewhere on the Web to point to this part of the document (e.g. http://example.com/document.pdf#nameddest=summary). 2) When a link is encountered that points to the current document, the URL fragment (part after the "#") is taken and a link is produced that points to the corresponding named destination. Note, I already have code to do this. APIs to support this have already been added to Skia in change: https://code.google.com/p/skia/source/detail?r=8034

David Lattimore

Comment 1 2013-03-11 17:26:48 PDT

Created attachment 192604 [details] Patch

WebKit Review Bot

Comment 2 2013-03-11 18:30:24 PDT

Comment on attachment 192604 [details] Patch Attachment 192604 [details] did not pass cr-linux-debug-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17169360

Peter Beverloo (cr-android ews)

Comment 3 2013-03-11 18:44:01 PDT

Comment on attachment 192604 [details] Patch Attachment 192604 [details] did not pass cr-android-ews (chromium-android): Output: http://webkit-commit-queue.appspot.com/results/17121466

WebKit Review Bot

Comment 4 2013-03-11 19:06:20 PDT

Comment on attachment 192604 [details] Patch Attachment 192604 [details] did not pass chromium-ews (chromium-xvfb): Output: http://webkit-commit-queue.appspot.com/results/17013944

David Lattimore

Comment 5 2013-03-11 19:30:18 PDT

Compilation failures looks like my Skia change (landed three days ago) is not present. Is updating to the latest version of Skia a manual process?

Alexey Proskuryakov

Comment 6 2013-03-11 22:10:03 PDT

Comment on attachment 192604 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=192604&action=review Looks fine in general. > Source/WebCore/ChangeLog:8 > + This change doesn't have any tests because PDF generation doesn't have any existing tests. How did you test manually? > Source/WebCore/platform/graphics/GraphicsContext.h:271 > + void setShouldOutputInternalLinks(bool); This function is never called in this patch. > Source/WebCore/platform/graphics/GraphicsContext.h:431 > void setURLForRect(const KURL&, const IntRect&); > > + void setLinkToDestination(const char*, const IntRect&); This first parameter needs a name, it's not easily deducible from type. The parameter should be a String, it's the job of GraphicsContext to convert it to a type suitable for a specific platform. It would be better to have these functions follow a common style. I'd name the new one "setDestinationForRect". > Source/WebCore/platform/graphics/GraphicsContext.h:432 > + void addDestination(const String&, IntPoint); addDestinationAtPoint? Also, we normally pass IntPoint as const reference, too. > Source/WebCore/rendering/RenderObject.cpp:1118 > + KURL url = n->document()->completeURL(href); > + CString urlString = url.string().utf8(); The URL may be invalid, which needs to be accounted for. > Source/WebCore/rendering/RenderObject.cpp:1120 > + context->setLinkToDestination(strchr(urlString.data(), '#') + 1, pixelSnappedIntRect(rect)); A better way to determine what the identifier is is would be via KURL::fragmentIdentifier(). > Source/WebCore/rendering/RenderObject.cpp:1127 > + Node* n = this->node(); The reason to say "this->node()" is to avoid conflicts with a local variable of the same name, which you don't have here. But using abbreviations is discouraged in WebKit code, so you should. > Source/WebCore/rendering/RenderObject.cpp:1139 > + this->addDestination(graphicsContext); No reason for "this->" here. > Source/WebCore/rendering/RenderObject.cpp:1142 > + while (child) { > + child->addDestinationsRecursive(graphicsContext); Wow, adding a destination for every element with an id is quite some work. You explained the reason in the bug, but I'm not convinced. If there is already an HTML document on the Web, other pages should link to it, not to a PDF. And if it's not on the Web, a real authoring tool should be used to produce the file to publish, not a browser's "Save As" command. Also, anchors link to names, not to ids. > Source/WebCore/rendering/RenderObject.h:983 > + void addDestination(GraphicsContext*); This function is never called from outside, and should be private.

Stephen White

Comment 7 2013-03-12 08:02:38 PDT

(In reply to comment #5) > Compilation failures looks like my Skia change (landed three days ago) is not present. Is updating to the latest version of Skia a manual process? Skia is rolled manually, usually once a day, into Chrome. You can check which version Chrome is using by examining the src/DEPS file in a Chrome checkout. Skia is also rolled manually into WebKit, via the Source/WebKit/chromium/DEPS file. This happens more infrequently, and you may have to ask someone to do it for you, or add it to your change. This is the version used by the EWS bots, so that will have to happen before you can satisfy the EWS bots on this patch.

Stephen White

Comment 8 2013-03-12 08:04:18 PDT

(In reply to comment #7) > Skia is also rolled manually into WebKit, via the Source/WebKit/chromium/DEPS file. This happens more infrequently, and you may have to ask someone to do it for you, or add it to your change. This is the version used by the EWS bots, so that will have to happen before you can satisfy the EWS bots on this patch. Let me amend that: *chrome* is rolled into WebKit via Source/WebKit/chromium/DEPS, which pulls with it the current version of Skia that Chrome is using (in its src/DEPS file).

David Lattimore

Comment 9 2013-03-12 21:08:30 PDT

Created attachment 192865 [details] Patch

David Lattimore

Comment 10 2013-03-12 21:13:38 PDT

Comment on attachment 192604 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=192604&action=review >> Source/WebCore/ChangeLog:8 >> + This change doesn't have any tests because PDF generation doesn't have any existing tests. > > How did you test manually? I wrote a test HTML file with various different elements (<h1>, <p>, <a>, <div>, <td>). These had IDs. One of the <a> elements had a name instead. I then had links to each of these. I also had an element with an ID but no link to it and a link to something that didn't exist. I generated PDFs from this HTML using both a headless version of WebKit and file->print->save as PDF in Chromium. I then opened these PDFs in a text editor to make sure the links and the destinations looked right (and to make sure there was no destination defined for the element with no link to it). I then opened the PDFs in Evince, Acroread and the PDF viewer on Mac OS and confirmed that clicking the links scrolled to the appropriate page (or in the case of Evince to the actual element itself). >> Source/WebCore/platform/graphics/GraphicsContext.h:271 >> + void setShouldOutputInternalLinks(bool); > > This function is never called in this patch. Was thinking of providing a way to disable it in case some users of WebKit didn't want it. Probably less needed now that I'm only outputting named destinations that are used. Removed this. >> Source/WebCore/platform/graphics/GraphicsContext.h:431 >> + void setLinkToDestination(const char*, const IntRect&); > > This first parameter needs a name, it's not easily deducible from type. The parameter should be a String, it's the job of GraphicsContext to convert it to a type suitable for a specific platform. > > It would be better to have these functions follow a common style. I'd name the new one "setDestinationForRect". Done >> Source/WebCore/platform/graphics/GraphicsContext.h:432 >> + void addDestination(const String&, IntPoint); > > addDestinationAtPoint? Also, we normally pass IntPoint as const reference, too. Done >> Source/WebCore/rendering/RenderObject.cpp:1118 >> + CString urlString = url.string().utf8(); > > The URL may be invalid, which needs to be accounted for. Done >> Source/WebCore/rendering/RenderObject.cpp:1120 >> + context->setLinkToDestination(strchr(urlString.data(), '#') + 1, pixelSnappedIntRect(rect)); > > A better way to determine what the identifier is is would be via KURL::fragmentIdentifier(). Done >> Source/WebCore/rendering/RenderObject.cpp:1127 >> + Node* n = this->node(); > > The reason to say "this->node()" is to avoid conflicts with a local variable of the same name, which you don't have here. > > But using abbreviations is discouraged in WebKit code, so you should. This code has now been removed. >> Source/WebCore/rendering/RenderObject.cpp:1139 >> + this->addDestination(graphicsContext); > > No reason for "this->" here. Code removed. >> Source/WebCore/rendering/RenderObject.cpp:1142 >> + child->addDestinationsRecursive(graphicsContext); > > Wow, adding a destination for every element with an id is quite some work. You explained the reason in the bug, but I'm not convinced. If there is already an HTML document on the Web, other pages should link to it, not to a PDF. And if it's not on the Web, a real authoring tool should be used to produce the file to publish, not a browser's "Save As" command. > > Also, anchors link to names, not to ids. IDs are checked first. Names are checked second and only for <a> elements. http://www.w3.org/html/wg/drafts/html/master/browsers.html#scroll-to-fragid Fortunately I was able to take advantage of Document::findAnchor, so this patch no longer has to worry about those details. I've changed the patch to: 1) only output links to named destinations where such a named destination exists. 2) only output named destinations if something links to it. >> Source/WebCore/rendering/RenderObject.h:983 >> + void addDestination(GraphicsContext*); > > This function is never called from outside, and should be private. Removed method.

David Lattimore

Comment 11 2013-03-12 21:17:06 PDT

Thanks for the quick review. I think I've addressed all your comments, although it was a substantial change so I expect there will likely be additional comments on the new code. I had a look at the version of Skia being pulled by Chromium and the version of Chromium being pulled by WebKit and it looks like it'll probably be OK to me - someone else updated it a few hours ago. So fingers crossed that it passes this time.

Alexey Proskuryakov

Comment 12 2013-03-12 22:31:51 PDT

Comment on attachment 192865 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=192865&action=review Thank you, this looks much better. r- for leaving a dangling pointer. > Source/WebCore/ChangeLog:9 > + This change doesn't have any tests because PDF generation doesn't > + have any existing tests. Manual testing was performed in Chromium That said, adding a way to test PDF generation would be a valuable project. > Source/WebCore/ChangeLog:17 > + (WebCore): It's just a bug of prepare-ChangeLog that this cruft is added. The purpose of generating file and name links is to simplify adding per-function comments to ChangeLogs, and unhelpful garbage should be removed (yes, a lot of people don't do that, sadly). > Source/WebCore/dom/Document.h:1200 > + HashMap<String, Element*>& usedAnchors() { return m_usedAnchors; } Looks like this could be const (on both sides). But see other comments. > Source/WebCore/dom/Document.h:1465 > + HashMap<String, Element*> m_usedAnchors; This is very surprising to have as a permanent member of Document. This causes a serious bug: if one generates PDF twice, and an element is removed in between the calls, a dangling pointer is left in the HashMap, and then dereferenced. The HashMap should at least be cleared when no longer needed, but I think that I have a better suggestion below. > Source/WebCore/page/FrameView.cpp:3434 > + p->addDestinationAtPoint(i->key, IntPoint(0, y)); Why no horizontal offset? The user can have the document zoomed in, and then scrolling to (0, y) may not reveal the element. Can the result of absoluteBoundingBoxRect() have negative coordinates? Perhaps we should clamp to 0. Ditto for being out of page bounds on the other side. > Source/WebCore/rendering/RenderObject.cpp:1124 > + document()->addUsedAnchor(name, element); Would it work if we just called addDestinationAtPoint here? The same destination could be added several times, but I don't see any reason why that would be a problem. We are just using an intermediate map instead of one in PDF context. > Source/WebCore/rendering/RenderObject.cpp:1129 > + return; > + } > + } > + context->setURLForRect(url, pixelSnappedIntRect(rect)); Unsure if it's meaningful to add a same-document URL if anchor couldn't be found. Perhaps the early return should be outside "if (element)"?

David Lattimore

Comment 13 2013-03-12 23:30:09 PDT

Comment on attachment 192865 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=192865&action=review >> Source/WebCore/rendering/RenderObject.cpp:1124 >> + document()->addUsedAnchor(name, element); > > Would it work if we just called addDestinationAtPoint here? The same destination could be added several times, but I don't see any reason why that would be a problem. We are just using an intermediate map instead of one in PDF context. Unfortunately not. GraphicsContext::addDestinationAtPoint needs to be called on the GraphicsContext for the page containing the destination. If we called it here, the link would only work if the destination was on the same page as the link. It's currently being called for all pages and then dropped when the point falls outside the bounds of the page (i.e. all but the correct page). I'll have a look and see if I can find another way of structuring it without storing stuff on Document. If you have any ideas, let me know. >> Source/WebCore/rendering/RenderObject.cpp:1129 >> + context->setURLForRect(url, pixelSnappedIntRect(rect)); > > Unsure if it's meaningful to add a same-document URL if anchor couldn't be found. Perhaps the early return should be outside "if (element)"? The patch as it currently stands puts out a regular link - i.e. falls back to the old behavior. I'll move the return, which will mean same-document URLs that are broken will become non-links.

Alexey Proskuryakov

Comment 14 2013-03-13 08:35:47 PDT

> GraphicsContext::addDestinationAtPoint needs to be called on the GraphicsContext for the page containing the destination. I'm confused. Do we have multiple GraphicsContexts when printing to a PDF? But even then, aren't all internal links in-document?

David Lattimore

Comment 15 2013-03-14 22:17:40 PDT

Created attachment 193232 [details] Patch

David Lattimore

Comment 16 2013-03-14 22:21:19 PDT

Comment on attachment 192865 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=192865&action=review >> Source/WebCore/ChangeLog:9 >> + have any existing tests. Manual testing was performed in Chromium > > That said, adding a way to test PDF generation would be a valuable project. Agreed. Not sure I have sufficient familiarity with WebKit to know where and how to write such a test. Ideally it'd run WebKit headlessly on an input HTML file, render to PDF then make assertions about the contents of the PDF. I've got no idea how straightforward it would be to run WebKit in this manner. As an interim measure, is there somewhere I can put my test file, in case the code is changed later and someone wants to manually test that it still works? Or should I just attach it to this bug? >> Source/WebCore/ChangeLog:17 >> + (WebCore): > > It's just a bug of prepare-ChangeLog that this cruft is added. The purpose of generating file and name links is to simplify adding per-function comments to ChangeLogs, and unhelpful garbage should be removed (yes, a lot of people don't do that, sadly). Makes sense. Removed. >> Source/WebCore/dom/Document.h:1200 >> + HashMap<String, Element*>& usedAnchors() { return m_usedAnchors; } > > Looks like this could be const (on both sides). But see other comments. Code removed. >> Source/WebCore/dom/Document.h:1465 >> + HashMap<String, Element*> m_usedAnchors; > > This is very surprising to have as a permanent member of Document. This causes a serious bug: if one generates PDF twice, and an element is removed in between the calls, a dangling pointer is left in the HashMap, and then dereferenced. > > The HashMap should at least be cleared when no longer needed, but I think that I have a better suggestion below. I've removed all code from Document. >> Source/WebCore/page/FrameView.cpp:3434 >> + p->addDestinationAtPoint(i->key, IntPoint(0, y)); > > Why no horizontal offset? The user can have the document zoomed in, and then scrolling to (0, y) may not reveal the element. > > Can the result of absoluteBoundingBoxRect() have negative coordinates? Perhaps we should clamp to 0. Ditto for being out of page bounds on the other side. I now check that the point is inside the page's rectangle and ignore it if it isn't. I think Skia does this as well, but it doesn't hurt to do it in WebKit as well. >>> Source/WebCore/rendering/RenderObject.cpp:1124 >>> + document()->addUsedAnchor(name, element); >> >> Would it work if we just called addDestinationAtPoint here? The same destination could be added several times, but I don't see any reason why that would be a problem. We are just using an intermediate map instead of one in PDF context. > > Unfortunately not. GraphicsContext::addDestinationAtPoint needs to be called on the GraphicsContext for the page containing the destination. If we called it here, the link would only work if the destination was on the same page as the link. It's currently being called for all pages and then dropped when the point falls outside the bounds of the page (i.e. all but the correct page). I'll have a look and see if I can find another way of structuring it without storing stuff on Document. If you have any ideas, let me know. You asked if a new GraphicsContext is created for each page... I'd assumed that one was - the headless code I'm using does that, but I see that PrintContext::spoolAllPagesWithBoundaries doesn't do that - it instead reuses the same GraphicsContext. In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered. Instead, I've added code to FrameView that finds links, gets the elements they point to and outputs them as destinations if they're on the current page.

Alexey Proskuryakov

Comment 17 2013-03-14 23:13:21 PDT

> Agreed. Not sure I have sufficient familiarity with WebKit to know where and how to write such a test. Ideally it'd run WebKit headlessly on an input HTML file, render to PDF then make assertions about the contents of the PDF. I've got no idea how straightforward it would be to run WebKit in this manner. The hardest part here is to make assertions about contents of produced PDFs, and to a lesser degree, adding plumbing to produce a PDFD. Headless operation is what regression tests do normally, and we have 30000 of them. Tests in LayoutTests/webarchive directory are somewhat similar, although their way to assert correctness is by comparing produced binary file to expected one, which would not work so great for PDFs. > As an interim measure, is there somewhere I can put my test file, in case the code is changed later and someone wants to manually test that it still works? Or should I just attach it to this bug? There is a ManualTests directory at the root of WebKit checkout, but I would discourage using it. No one ever runs these tests, and adding something that requires multiple manual steps just makes existing tests slightly less useful. > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered. Are you talking about pages in a produced PDF, or about subframes? There is something here that feels wrong, but I can't put my finger on it.

David Lattimore

Comment 18 2013-03-15 00:05:01 PDT

(In reply to comment #17) > > Agreed. Not sure I have sufficient familiarity with WebKit to know where and how to write such a test. Ideally it'd run WebKit headlessly on an input HTML file, render to PDF then make assertions about the contents of the PDF. I've got no idea how straightforward it would be to run WebKit in this manner. > > The hardest part here is to make assertions about contents of produced PDFs, and to a lesser degree, adding plumbing to produce a PDFD. Headless operation is what regression tests do normally, and we have 30000 of them. The assertions I think I can manage, since both the links and the destinations end up in the PDF in plain text. It's the plumbing parts that I'm unsure of. > > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered. > > Are you talking about pages in a produced PDF, or about subframes? There is something here that feels wrong, but I can't put my finger on it. AFAIK there should be one SkPDFDevice and on SkCanvas per page in the produced PDF. I've got some ideas that I'll investigate on Monday (Sydney time) that may mean it would work to add the destinations from RenderObject (like you originally suggested).

David Lattimore

Comment 19 2013-03-17 17:19:27 PDT

(In reply to comment #17) > > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered. > > Are you talking about pages in a produced PDF, or about subframes? There is something here that feels wrong, but I can't put my finger on it. I thought it might be possible to call addDestinationAtPoint from RenderObject::addPDFURLRect. I've now tried it and it doesn't work. Destinations only get written if they are on the same page as something that links to them. I thought that addPDFURLRect was getting called for each page and for each URL, but, probably fortunately for performance, it isn't. I did some investigation as to where the GraphicsContext is getting created. It seems there is a new GraphicsContext for each page, created (at least for Chromium) in WebFrameImpl::printPage. Given the above, I don't see any obvious options other than what I'm doing in my latest patch. Is it a problem that I'm traversing the render tree for each page? The paint code already does that. One alternative to that would be to traverse it only for the first page, then store a HashMap on FrameView from names to elements that have been linked to. This could then be reused for subsequent pages.

Alexey Proskuryakov

Comment 20 2013-03-18 22:09:12 PDT

> Is it a problem that I'm traversing the render tree for each page? Well, this effectively adds an O(n^2) algorithm. Can you measure how long it takes to generate a PDF for single page HTML5 spec with and without this patch? IIRC Safari prints the whole document at once, not page by page (except when generating preview). So, ideally I'm interested in the above measurement for Safari.

David Lattimore

Comment 21 2013-03-20 20:37:03 PDT

Created attachment 194177 [details] Patch

David Lattimore

Comment 22 2013-03-20 20:54:20 PDT

Does Safari use Skia for generating PDFs? When I build WebKit and run Safari (using Tools/Scripts/run-safari) then print to PDF, the resulting PDF appears as though my change was not present. It seems like the code in RenderObject is being run but not the code in GraphicsContextSkia. In any case, I've updated my change so that it only traverses the DOM once and caches the linked elements for use on subsequent pages.

Tim Horton

Comment 23 2013-03-20 20:56:17 PDT

(In reply to comment #22) > Does Safari use Skia for generating PDFs? When I build WebKit and run Safari (using Tools/Scripts/run-safari) then print to PDF, the resulting PDF appears as though my change was not present. It seems like the code in RenderObject is being run but not the code in GraphicsContextSkia. No, Safari does not use Skia, it uses CG. Skia is a Chromium-port thing.

Tim Horton

Comment 24 2013-03-21 11:30:28 PDT

Comment on attachment 194177 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=194177&action=review > Source/WebCore/page/FrameView.cpp:3490 > + if (!m_linkedDestinationsValid) { > + collectLinkedDestinations(node); > + m_linkedDestinationsValid = true; > + } What if you print, the page is mutated via JavaScript, and then you print again? m_linkedDestinations will be stale, right? And contain stale pointers to Elements? Which you will then dereference below? > Source/WebCore/page/FrameView.cpp:3495 > + IntRect bb = it->value->renderer()->absoluteBoundingBoxRect(); > + IntPoint point(bb.x(), bb.y()); Please expand "bb". > Source/WebCore/page/FrameView.cpp:3496 > + if (pageRect.contains(point)) What if the bounds are shoved off the page but the bounding box intersects the page? You probably want "intersection of pageRect and 'bb' is not empty", no? > Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113 > + return true; Shouldn't this only return true if the graphics context is backed by e.g. a PDF? Certainly you don't want to do all this work for a bitmap Skia context...

David Lattimore

Comment 25 2013-03-21 17:54:11 PDT

Created attachment 194404 [details] Patch

David Lattimore

Comment 26 2013-03-21 17:57:50 PDT

Comment on attachment 194177 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=194177&action=review >> Source/WebCore/page/FrameView.cpp:3490 >> + } > > What if you print, the page is mutated via JavaScript, and then you print again? m_linkedDestinations will be stale, right? And contain stale pointers to Elements? Which you will then dereference below? I'd wrongly thought that FrameView only lived for the duration of the print. I've moved this code to PrintContext which should fix this. >> Source/WebCore/page/FrameView.cpp:3495 >> + IntPoint point(bb.x(), bb.y()); > > Please expand "bb". Done. >> Source/WebCore/page/FrameView.cpp:3496 >> + if (pageRect.contains(point)) > > What if the bounds are shoved off the page but the bounding box intersects the page? You probably want "intersection of pageRect and 'bb' is not empty", no? Good point. Done. >> Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113 >> + return true; > > Shouldn't this only return true if the graphics context is backed by e.g. a PDF? Certainly you don't want to do all this work for a bitmap Skia context... I now check if it's printing.

Alexey Proskuryakov

Comment 27 2013-03-26 14:23:04 PDT

Comment on attachment 194404 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=194404&action=review > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. I don't understand how this can work. If every page is separate, how do cross-page links work at all? Surely there is an object in Skia that represents the whole multi-page document? For example in CG, it's a CGPDFContext, and one calls CGPDFContextBeginPage when moving to the next page. > Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113 > + return platformContext()->printing(); I'm not sure if this is a good place to make such checks. A GraphicsContext may not have any idea about whether it's printing or not, it's the responsibility of both higher level and lower level code.

Alexey Proskuryakov

Comment 28 2013-03-26 14:27:18 PDT

Created attachment 195162 [details] with non-functional CG support I tried making a patch that works with CG too, but for some reason, a resulting document has no clickable links at all. Perhaps rects are represented in a way that's different from how it's done elsewhere?

David Lattimore

Comment 29 2013-04-01 19:33:28 PDT

Sorry about the slow reply - we had a 4 day weekend here in Australia. (In reply to comment #27) > (From update of attachment 194404 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=194404&action=review > > > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. > > I don't understand how this can work. If every page is separate, how do cross-page links work at all? Surely there is an object in Skia that represents the whole multi-page document? For example in CG, it's a CGPDFContext, and one calls CGPDFContextBeginPage when moving to the next page. There's SkPDFDocument. It represents the whole PDF document. The destinations still need to be written to the correct SkPDFDevice for the page they're on though. > > Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113 > > + return platformContext()->printing(); > > I'm not sure if this is a good place to make such checks. A GraphicsContext may not have any idea about whether it's printing or not, it's the responsibility of both higher level and lower level code. So I should just revert this to return true here? Now that adding the destinations is being done in PrintContext, it will already only be getting run if we're printing. That only leaves linking to destinations which doesn't really do any significant amount of work, so is probably OK to do even if the underlying Skia device doesn't support it. (In reply to comment #28) > Created an attachment (id=195162) [details] > with non-funcitonal CG support > > I tried making a patch that works with CG too, but for some reason, a resulting document has no clickable links at all. Perhaps rects are represented in a way that's different from how it's done elsewhere? Presumably it would need to apply transforms in a similar way to what the CG implementation of GraphicsContext::setURLForRect does. That said, I tried doing that and it still didn't work. Looking at the generated PDF in a text editor, neither the destinations nor the links to the destinations are being put out. I've got no idea why not.

Darin Adler

Comment 30 2013-04-08 18:07:52 PDT

We used to do this in Safari and Core Graphics. Did the code to do this get removed at some point?

Darin Adler

Comment 31 2013-04-08 18:08:13 PDT

Oh, I see “internal links!” Got it.

David Lattimore

Comment 32 2013-04-08 18:14:08 PDT

I'm going to continue with trying to get this change into Blink. https://code.google.com/p/chromium/issues/detail?id=157266 If anyone is interested in seeing this feature in WebKit with CG, feel free to pick this up. Thanks Alexey for the helpful comments.

Alexey Proskuryakov

Comment 33 2013-04-08 19:04:09 PDT

Comment on attachment 194404 [details] Patch r- to get this out of review queue, as we don't support Skia in WebKit any more. We should certainly finish this.

Alexey Proskuryakov

Comment 34 2013-05-10 10:47:08 PDT

<rdar://problem/8239740>

Tim Horton

Comment 35 2015-08-16 17:24:29 PDT

Created attachment 259132 [details] patch, still doesn't work Rebaselined the patch, and added the requisite transforms to the CG code. It still doesn't work though, even though everything we're handing to CG seems totally sane. Not sure what's up.

Tim Horton

Comment 36 2016-10-11 01:06:28 PDT

I believe the problem is WKPrintingView _drawPDFDocument:page:atPoint:'s implementation of annotation copying.

Tim Horton

Comment 37 2016-10-11 01:14:09 PDT

Yup! This should be pretty easy now, knowing that.

Tim Horton

Comment 38 2016-11-02 00:20:30 PDT

Created attachment 293645 [details] Significant progress; actually works now

Tim Horton

Comment 39 2016-11-02 00:21:16 PDT

Comment on attachment 293645 [details] Significant progress; actually works now Not ready for review, though.

Tim Horton

Comment 40 2016-11-02 18:24:13 PDT

Created attachment 293729 [details] Patch

Tim Horton

Comment 41 2016-11-02 18:26:07 PDT

Created attachment 293730 [details] Patch

Tim Horton

Comment 42 2016-11-02 18:41:16 PDT

Created attachment 293732 [details] Patch

Simon Fraser (smfr)

Comment 43 2016-11-02 19:14:31 PDT

Comment on attachment 293732 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=293732&action=review > Source/WebCore/page/PrintContext.cpp:252 > +void PrintContext::collectLinkedDestinations(Node* node) Node& > Source/WebCore/page/PrintContext.cpp:255 > + for (Node* child = node->firstChild(); child; child = child->nextSibling()) > + collectLinkedDestinations(child); Hmm, can we use tree traversal iterator thingies? > Source/WebCore/platform/graphics/GraphicsContext.h:474 > + void setDestinationForRect(const String& name, const IntRect&); > + void addDestinationAtPoint(const String& name, const IntPoint&); The rest of this API is FloatPoint/FloatRect, other than the URL one above. > Source/WebCore/platform/graphics/GraphicsContext.h:582 > + bool supportsInternalLinks(); const ? > Source/WebCore/rendering/RenderObject.cpp:621 > + if (paintInfo.context().supportsInternalLinks() && url.hasFragmentIdentifier() && equalIgnoringFragmentIdentifier(url, document.baseURL())) { > + String name = url.fragmentIdentifier(); > + if (document.findAnchor(name)) > + paintInfo.context().setDestinationForRect(name, snappedIntRect(urlRect)); > + return; > + } This looks a lot like the code in PrintContext::collectLinkedDestinations(). Share it?

Tim Horton

Comment 44 2016-11-03 13:05:09 PDT

Created attachment 293791 [details] Patch

Simon Fraser (smfr)

Comment 45 2016-11-03 13:19:11 PDT

Comment on attachment 293791 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=293791&action=review > Source/WebCore/page/PrintContext.h:110 > + Optional<HashMap<String, Element*>> m_linkedDestinations; It makes me slightly nervous that this has raw Element*. Can script run while this thing is live? Also, not sure that Optional<> gains you much over unique_ptr<>

Tim Horton

Comment 46 2016-11-03 13:29:59 PDT

(In reply to comment #45) > Comment on attachment 293791 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=293791&action=review > > > Source/WebCore/page/PrintContext.h:110 > > + Optional<HashMap<String, Element*>> m_linkedDestinations; > > It makes me slightly nervous that this has raw Element*. Can script run > while this thing is live? It shouldn't be able to; it lives just during PrintContext between begin/end, which looks like only encompasses painting, but I could be wrong. It seems entirely reasonable to ref these, anyway. > Also, not sure that Optional<> gains you much over unique_ptr<> Sure, but is there any reason to prefer unique_ptr?

Tim Horton

Comment 47 2016-11-03 14:11:31 PDT

https://trac.webkit.org/changeset/208347

Darin Adler

Comment 48 2016-11-04 09:53:34 PDT

Comment on attachment 293645 [details] Significant progress; actually works now View in context: https://bugs.webkit.org/attachment.cgi?id=293645&action=review Sorry, I missed the final work on this and maybe things changed a lot in the patch that you landed, but here were my comments on an earlier draft. > Source/WebCore/page/PrintContext.cpp:39 > + , m_linkedDestinationsValid(false) In new code we should do these in the class definitions. > Source/WebCore/page/PrintContext.cpp:196 > + if (ctx.supportsInternalLinks()) > + outputLinkedDestinations(ctx, m_frame->document(), pageRect); Might be clearer to have the outputLinkedDestinations functions check the boolean instead. > Source/WebCore/page/PrintContext.cpp:208 > + if (ctx.supportsInternalLinks()) > + outputLinkedDestinations(ctx, m_frame->document(), rect); Ditto. > Source/WebCore/page/PrintContext.cpp:257 > +void PrintContext::collectLinkedDestinations(Node* node) Since this function dereferences the node pointer, it should take a reference, not a pointer. I suggest Document& and using a non-recursive algorithm. > Source/WebCore/page/PrintContext.cpp:259 > + // FIXME: We have better ways to iterate now? We do. In particular, we should do non-recursive iteration. Since we are only interested in elements we can use the elements iterator. > Source/WebCore/page/PrintContext.cpp:266 > + const AtomicString& href = static_cast<Element*>(node)->getAttribute(HTMLNames::hrefAttr); downcast is preferred over static_cast for casts from node to element. auto& would be nice here. attributeWithoutSynchronization is better here. > Source/WebCore/page/PrintContext.cpp:270 > + Document& document = node->document(); auto& would be nice here. > Source/WebCore/page/PrintContext.cpp:278 > + m_linkedDestinations.set(name, element); Probably want to use add instead of set. Depends really on what we want to do if the same name occurs more than once. > Source/WebCore/page/PrintContext.cpp:282 > +void PrintContext::outputLinkedDestinations(GraphicsContext& graphicsContext, Node* node, const IntRect& pageRect) I suggest taking a Document& instead of a Node*. > Source/WebCore/page/PrintContext.cpp:287 > + if (!m_linkedDestinationsValid) { > + collectLinkedDestinations(node); > + m_linkedDestinationsValid = true; > + } Seems peculiar that if m_linkedDestinationsValid is true that we ignore the node; somehow we know it’s the same as last time? Why a pointer for the node? Can it be null? > Source/WebCore/page/PrintContext.cpp:290 > + HashMap<String, Element*>::const_iterator end = m_linkedDestinations.end(); > + for (HashMap<String, Element*>::const_iterator it = m_linkedDestinations.begin(); it != end; ++it) { Modern for loop would work well here and eliminate those long type names for the iterator. > Source/WebCore/page/PrintContext.cpp:292 > + IntRect boundingBox = it->value->renderer()->absoluteBoundingBoxRect(); > + if (pageRect.intersects(boundingBox)) { Not sure we need a local variable here; might read more nicely without it. Also, what guarantees the renderers are all non-null?

Tim Horton

Comment 49 2016-11-04 12:27:49 PDT

(In reply to comment #48) > Comment on attachment 293645 [details] > Significant progress; actually works now > > View in context: > https://bugs.webkit.org/attachment.cgi?id=293645&action=review > > Sorry, I missed the final work on this and maybe things changed a lot in the > patch that you landed, but here were my comments on an earlier draft. OK! Let me see what I fixed; it looks like this one was before I really started ripping it up, so hopefully most are fixed. > > Source/WebCore/page/PrintContext.cpp:39 > > + , m_linkedDestinationsValid(false) > > In new code we should do these in the class definitions. This is gone. > > Source/WebCore/page/PrintContext.cpp:196 > > + if (ctx.supportsInternalLinks()) > > + outputLinkedDestinations(ctx, m_frame->document(), pageRect); > > Might be clearer to have the outputLinkedDestinations functions check the > boolean instead. I did this. > > Source/WebCore/page/PrintContext.cpp:208 > > + if (ctx.supportsInternalLinks()) > > + outputLinkedDestinations(ctx, m_frame->document(), rect); > > Ditto. I did this. > > Source/WebCore/page/PrintContext.cpp:257 > > +void PrintContext::collectLinkedDestinations(Node* node) > > Since this function dereferences the node pointer, it should take a > reference, not a pointer. I suggest Document& and using a non-recursive > algorithm. I did this (both the reference and non-recursiveness, though not taking Document, which is a good idea). > > Source/WebCore/page/PrintContext.cpp:259 > > + // FIXME: We have better ways to iterate now? > > We do. In particular, we should do non-recursive iteration. Since we are > only interested in elements we can use the elements iterator. Ooh, I switched to a different and non-recursive node iterator, but didn't know about the element iterator. Will look into that. > > Source/WebCore/page/PrintContext.cpp:266 > > + const AtomicString& href = static_cast<Element*>(node)->getAttribute(HTMLNames::hrefAttr); > > downcast is preferred over static_cast for casts from node to element. Did this. > auto& would be nice here. Did not do this. > attributeWithoutSynchronization is better here. Did not do this. I never know how to know when to do it! > > Source/WebCore/page/PrintContext.cpp:270 > > + Document& document = node->document(); > > auto& would be nice here. OK. > > Source/WebCore/page/PrintContext.cpp:278 > > + m_linkedDestinations.set(name, element); > > Probably want to use add instead of set. Depends really on what we want to > do if the same name occurs more than once. Interesting. > > Source/WebCore/page/PrintContext.cpp:282 > > +void PrintContext::outputLinkedDestinations(GraphicsContext& graphicsContext, Node* node, const IntRect& pageRect) > > I suggest taking a Document& instead of a Node*. Sure! > > Source/WebCore/page/PrintContext.cpp:287 > > + if (!m_linkedDestinationsValid) { > > + collectLinkedDestinations(node); > > + m_linkedDestinationsValid = true; > > + } > > Seems peculiar that if m_linkedDestinationsValid is true that we ignore the > node; somehow we know it’s the same as last time? Why a pointer for the > node? Can it be null? This thing only lasts one print operation, so there's theretically no opportunity for it to become invalid > > Source/WebCore/page/PrintContext.cpp:290 > > + HashMap<String, Element*>::const_iterator end = m_linkedDestinations.end(); > > + for (HashMap<String, Element*>::const_iterator it = m_linkedDestinations.begin(); it != end; ++it) { > > Modern for loop would work well here and eliminate those long type names for > the iterator. Yep, did this. > > Source/WebCore/page/PrintContext.cpp:292 > > + IntRect boundingBox = it->value->renderer()->absoluteBoundingBoxRect(); > > + if (pageRect.intersects(boundingBox)) { > > Not sure we need a local variable here; might read more nicely without it. I think this is resolved. > Also, what guarantees the renderers are all non-null? Hmm, that's a good point. Will fix.

Tim Horton

Comment 50 2016-11-04 14:08:07 PDT

Follow up in https://trac.webkit.org/changeset/208398

Darin Adler

Comment 51 2016-11-05 13:05:56 PDT

(In reply to comment #49) > > attributeWithoutSynchronization is better here. > > Did not do this. I never know how to know when to do it! If you forget the rule, you can re-derive it by reading the Element::synchronizeAttribute function. The rule is this: If the attribute we are getting is, or might be either 1) the style attribute (HTMLNames::styleAttr) or 2) an animated attribute of an SVG element then we have to use the getAttribute family of functions, which has a couple more branches and is slower. Otherwise, we can and should use the attributeWithoutSynchronization family. So typically if we are writing a general purpose function that can work with any attribute we will need to use the getAttribute family. But if we are writing a function that works with one particular attribute, it's likely we can use the attributeWithoutSynchronization. In this case, the attribute was href, which is not style, and SVG does not have an animated attribute named href. So that means we can use attributeWithoutSynchronization.

Attachments
Patch (9.37 KB, patch) 2013-03-11 17:26 PDT, David Lattimore	no flags	Details Formatted Diff Diff
Patch (9.04 KB, patch) 2013-03-12 21:08 PDT, David Lattimore	no flags	Details Formatted Diff Diff
Patch (7.88 KB, patch) 2013-03-14 22:17 PDT, David Lattimore	no flags	Details Formatted Diff Diff
Patch (9.19 KB, patch) 2013-03-20 20:37 PDT, David Lattimore	no flags	Details Formatted Diff Diff
Patch (10.36 KB, patch) 2013-03-21 17:54 PDT, David Lattimore	ap: review-	Details Formatted Diff Diff
with non-functional CG support (10.64 KB, patch) 2013-03-26 14:27 PDT, Alexey Proskuryakov	no flags	Details Formatted Diff Diff
patch, still doesn't work (9.03 KB, patch) 2015-08-16 17:24 PDT, Tim Horton	no flags	Details Formatted Diff Diff
Significant progress; actually works now (14.51 KB, patch) 2016-11-02 00:20 PDT, Tim Horton	no flags	Details Formatted Diff Diff
Patch (16.47 KB, patch) 2016-11-02 18:24 PDT, Tim Horton	no flags	Details Formatted Diff Diff
Patch (16.47 KB, patch) 2016-11-02 18:26 PDT, Tim Horton	no flags	Details Formatted Diff Diff
Patch (16.47 KB, patch) 2016-11-02 18:41 PDT, Tim Horton	no flags	Details Formatted Diff Diff
Patch (20.86 KB, patch) 2016-11-03 13:05 PDT, Tim Horton	simon.fraser: review+	Details Formatted Diff Diff
Show Obsolete (10) View All Add attachment proposed patch, testcase, etc.

ap
cdumez
cmarcelo
commit-queue
darin
dbates
ddkilzer
dglazkov
dml+webkit
eric
esprehn+autocc
glenn
junov
kangil.han
kondapallykalyan
ojan.autocc
peter+ews
scotty.hyndman
senorblanco
simon.fraser
thorton
webkit.review.bot