Bug 112081 - Printing to PDF should produce internal links when HTML has internal links
Summary: Printing to PDF should produce internal links when HTML has internal links
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Printing (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Enhancement
Assignee: Tim Horton
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2013-03-11 16:56 PDT by David Lattimore
Modified: 2016-11-05 13:05 PDT (History)
22 users (show)

See Also:


Attachments
Patch (9.37 KB, patch)
2013-03-11 17:26 PDT, David Lattimore
no flags Details | Formatted Diff | Diff
Patch (9.04 KB, patch)
2013-03-12 21:08 PDT, David Lattimore
no flags Details | Formatted Diff | Diff
Patch (7.88 KB, patch)
2013-03-14 22:17 PDT, David Lattimore
no flags Details | Formatted Diff | Diff
Patch (9.19 KB, patch)
2013-03-20 20:37 PDT, David Lattimore
no flags Details | Formatted Diff | Diff
Patch (10.36 KB, patch)
2013-03-21 17:54 PDT, David Lattimore
ap: review-
Details | Formatted Diff | Diff
with non-functional CG support (10.64 KB, patch)
2013-03-26 14:27 PDT, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff
patch, still doesn't work (9.03 KB, patch)
2015-08-16 17:24 PDT, Tim Horton
no flags Details | Formatted Diff | Diff
Significant progress; actually works now (14.51 KB, patch)
2016-11-02 00:20 PDT, Tim Horton
no flags Details | Formatted Diff | Diff
Patch (16.47 KB, patch)
2016-11-02 18:24 PDT, Tim Horton
no flags Details | Formatted Diff | Diff
Patch (16.47 KB, patch)
2016-11-02 18:26 PDT, Tim Horton
no flags Details | Formatted Diff | Diff
Patch (16.47 KB, patch)
2016-11-02 18:41 PDT, Tim Horton
no flags Details | Formatted Diff | Diff
Patch (20.86 KB, patch)
2016-11-03 13:05 PDT, Tim Horton
simon.fraser: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Lattimore 2013-03-11 16:56:20 PDT
Currently when outputting to PDF with Skia, WebKit treats links to elsewhere in the same HTML document the same as links to elsewhere on the Web. For example, a HTML page with a table of contents that links to headings later in the page will result in a PDF with links back to the original HTML page. A nicer user experience would be for these links to be put out as internal PDF links, so if the user clicks one, they are taken to the relevant page within the PDF.

Proposed solution:
1) When any element with an ID is encountered, output a named destination to the PDF. For example <h1 id="summary">Summary</h1> would result in a destination named "summary" in the PDF. Even if there are no links within the document to this ID, having this named destination allows links elsewhere on the Web to point to this part of the document (e.g. http://example.com/document.pdf#nameddest=summary).

2) When a link is encountered that points to the current document, the URL fragment (part after the "#") is taken and a link is produced that points to the corresponding named destination.

Note, I already have code to do this.

APIs to support this have already been added to Skia in change: https://code.google.com/p/skia/source/detail?r=8034
Comment 1 David Lattimore 2013-03-11 17:26:48 PDT
Created attachment 192604 [details]
Patch
Comment 2 WebKit Review Bot 2013-03-11 18:30:24 PDT
Comment on attachment 192604 [details]
Patch

Attachment 192604 [details] did not pass cr-linux-debug-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17169360
Comment 3 Peter Beverloo (cr-android ews) 2013-03-11 18:44:01 PDT
Comment on attachment 192604 [details]
Patch

Attachment 192604 [details] did not pass cr-android-ews (chromium-android):
Output: http://webkit-commit-queue.appspot.com/results/17121466
Comment 4 WebKit Review Bot 2013-03-11 19:06:20 PDT
Comment on attachment 192604 [details]
Patch

Attachment 192604 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17013944
Comment 5 David Lattimore 2013-03-11 19:30:18 PDT
Compilation failures looks like my Skia change (landed three days ago) is not present. Is updating to the latest version of Skia a manual process?
Comment 6 Alexey Proskuryakov 2013-03-11 22:10:03 PDT
Comment on attachment 192604 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=192604&action=review

Looks fine in general.

> Source/WebCore/ChangeLog:8
> +        This change doesn't have any tests because PDF generation doesn't have any existing tests.

How did you test manually?

> Source/WebCore/platform/graphics/GraphicsContext.h:271
> +        void setShouldOutputInternalLinks(bool);

This function is never called in this patch.

> Source/WebCore/platform/graphics/GraphicsContext.h:431
>          void setURLForRect(const KURL&, const IntRect&);
>  
> +        void setLinkToDestination(const char*, const IntRect&);

This first parameter needs a name, it's not easily deducible from type. The parameter should be a String, it's the job of GraphicsContext to convert it to a type suitable for a specific platform.

It would be better to have these functions follow a common style. I'd name the new one "setDestinationForRect".

> Source/WebCore/platform/graphics/GraphicsContext.h:432
> +        void addDestination(const String&, IntPoint);

addDestinationAtPoint? Also, we normally pass IntPoint as const reference, too.

> Source/WebCore/rendering/RenderObject.cpp:1118
> +    KURL url = n->document()->completeURL(href);
> +    CString urlString = url.string().utf8();

The URL may be invalid, which needs to be accounted for.

> Source/WebCore/rendering/RenderObject.cpp:1120
> +        context->setLinkToDestination(strchr(urlString.data(), '#') + 1, pixelSnappedIntRect(rect));

A better way to determine what the identifier is is would be via KURL::fragmentIdentifier().

> Source/WebCore/rendering/RenderObject.cpp:1127
> +    Node* n = this->node();

The reason to say "this->node()" is to avoid conflicts with a local variable of the same name, which you don't have here.

But using abbreviations is discouraged in WebKit code, so you should.

> Source/WebCore/rendering/RenderObject.cpp:1139
> +    this->addDestination(graphicsContext);

No reason for "this->" here.

> Source/WebCore/rendering/RenderObject.cpp:1142
> +    while (child) {
> +        child->addDestinationsRecursive(graphicsContext);

Wow, adding a destination for every element with an id is quite some work. You explained the reason in the bug, but I'm not convinced. If there is already an HTML document on the Web, other pages should link to it, not to a PDF. And if it's not on the Web, a real authoring tool should be used to produce the file to publish, not a browser's "Save As" command.

Also, anchors link to names, not to ids.

> Source/WebCore/rendering/RenderObject.h:983
> +    void addDestination(GraphicsContext*);

This function is never called from outside, and should be private.
Comment 7 Stephen White 2013-03-12 08:02:38 PDT
(In reply to comment #5)
> Compilation failures looks like my Skia change (landed three days ago) is not present. Is updating to the latest version of Skia a manual process?

Skia is rolled manually, usually once a day, into Chrome.  You can check which version Chrome is using by examining the src/DEPS file in a Chrome checkout.

Skia is also rolled manually into WebKit, via the Source/WebKit/chromium/DEPS file.  This happens more infrequently, and you may have to ask someone to do it for you, or add it to your change.  This is the version used by the EWS bots, so that will have to happen before you can satisfy the EWS bots on this patch.
Comment 8 Stephen White 2013-03-12 08:04:18 PDT
(In reply to comment #7)
> Skia is also rolled manually into WebKit, via the Source/WebKit/chromium/DEPS file.  This happens more infrequently, and you may have to ask someone to do it for you, or add it to your change.  This is the version used by the EWS bots, so that will have to happen before you can satisfy the EWS bots on this patch.

Let me amend that:  *chrome* is rolled into WebKit via Source/WebKit/chromium/DEPS, which pulls with it the current version of Skia that Chrome is using (in its src/DEPS file).
Comment 9 David Lattimore 2013-03-12 21:08:30 PDT
Created attachment 192865 [details]
Patch
Comment 10 David Lattimore 2013-03-12 21:13:38 PDT
Comment on attachment 192604 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=192604&action=review

>> Source/WebCore/ChangeLog:8
>> +        This change doesn't have any tests because PDF generation doesn't have any existing tests.
> 
> How did you test manually?

I wrote a test HTML file with various different elements (<h1>, <p>, <a>, <div>, <td>). These had IDs. One of the <a> elements had a name instead. I then had links to each of these. I also had an element with an ID but no link to it and a link to something that didn't exist. I generated PDFs from this HTML using both a headless version of WebKit and file->print->save as PDF in Chromium. I then opened these PDFs in a text editor to make sure the links and the destinations looked right (and to make sure there was no destination defined for the element with no link to it). I then opened the PDFs in Evince, Acroread and the PDF viewer on Mac OS and confirmed that clicking the links scrolled to the appropriate page (or in the case of Evince to the actual element itself).

>> Source/WebCore/platform/graphics/GraphicsContext.h:271
>> +        void setShouldOutputInternalLinks(bool);
> 
> This function is never called in this patch.

Was thinking of providing a way to disable it in case some users of WebKit didn't want it. Probably less needed now that I'm only outputting named destinations that are used. Removed this.

>> Source/WebCore/platform/graphics/GraphicsContext.h:431
>> +        void setLinkToDestination(const char*, const IntRect&);
> 
> This first parameter needs a name, it's not easily deducible from type. The parameter should be a String, it's the job of GraphicsContext to convert it to a type suitable for a specific platform.
> 
> It would be better to have these functions follow a common style. I'd name the new one "setDestinationForRect".

Done

>> Source/WebCore/platform/graphics/GraphicsContext.h:432
>> +        void addDestination(const String&, IntPoint);
> 
> addDestinationAtPoint? Also, we normally pass IntPoint as const reference, too.

Done

>> Source/WebCore/rendering/RenderObject.cpp:1118
>> +    CString urlString = url.string().utf8();
> 
> The URL may be invalid, which needs to be accounted for.

Done

>> Source/WebCore/rendering/RenderObject.cpp:1120
>> +        context->setLinkToDestination(strchr(urlString.data(), '#') + 1, pixelSnappedIntRect(rect));
> 
> A better way to determine what the identifier is is would be via KURL::fragmentIdentifier().

Done

>> Source/WebCore/rendering/RenderObject.cpp:1127
>> +    Node* n = this->node();
> 
> The reason to say "this->node()" is to avoid conflicts with a local variable of the same name, which you don't have here.
> 
> But using abbreviations is discouraged in WebKit code, so you should.

This code has now been removed.

>> Source/WebCore/rendering/RenderObject.cpp:1139
>> +    this->addDestination(graphicsContext);
> 
> No reason for "this->" here.

Code removed.

>> Source/WebCore/rendering/RenderObject.cpp:1142
>> +        child->addDestinationsRecursive(graphicsContext);
> 
> Wow, adding a destination for every element with an id is quite some work. You explained the reason in the bug, but I'm not convinced. If there is already an HTML document on the Web, other pages should link to it, not to a PDF. And if it's not on the Web, a real authoring tool should be used to produce the file to publish, not a browser's "Save As" command.
> 
> Also, anchors link to names, not to ids.

IDs are checked first. Names are checked second and only for <a> elements. http://www.w3.org/html/wg/drafts/html/master/browsers.html#scroll-to-fragid

Fortunately I was able to take advantage of Document::findAnchor, so this patch no longer has to worry about those details.

I've changed the patch to:
1) only output links to named destinations where such a named destination exists.
2) only output named destinations if something links to it.

>> Source/WebCore/rendering/RenderObject.h:983
>> +    void addDestination(GraphicsContext*);
> 
> This function is never called from outside, and should be private.

Removed method.
Comment 11 David Lattimore 2013-03-12 21:17:06 PDT
Thanks for the quick review. I think I've addressed all your comments, although it was a substantial change so I expect there will likely be additional comments on the new code. I had a look at the version of Skia being pulled by Chromium and the version of Chromium being pulled by WebKit and it looks like it'll probably be OK to me - someone else updated it a few hours ago. So fingers crossed that it passes this time.
Comment 12 Alexey Proskuryakov 2013-03-12 22:31:51 PDT
Comment on attachment 192865 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=192865&action=review

Thank you, this looks much better. r- for leaving a dangling pointer.

> Source/WebCore/ChangeLog:9
> +        This change doesn't have any tests because PDF generation doesn't
> +        have any existing tests. Manual testing was performed in Chromium

That said, adding a way to test PDF generation would be a valuable project.

> Source/WebCore/ChangeLog:17
> +        (WebCore):

It's just a bug of prepare-ChangeLog that this cruft is added. The purpose of generating file and name links is to simplify adding per-function comments to ChangeLogs, and unhelpful garbage should be removed (yes, a lot of people don't do that, sadly).

> Source/WebCore/dom/Document.h:1200
> +    HashMap<String, Element*>& usedAnchors() { return m_usedAnchors; }

Looks like this could be const (on both sides). But see other comments.

> Source/WebCore/dom/Document.h:1465
> +    HashMap<String, Element*> m_usedAnchors;

This is very surprising to have as a permanent member of Document. This causes a serious bug: if one generates PDF twice, and an element is removed in between the calls, a dangling pointer is left in the HashMap, and then dereferenced.

The HashMap should at least be cleared when no longer needed, but I think that I have a better suggestion below.

> Source/WebCore/page/FrameView.cpp:3434
> +            p->addDestinationAtPoint(i->key, IntPoint(0, y));

Why no horizontal offset? The user can have the document zoomed in, and then scrolling to (0, y) may not reveal the element.

Can the result of absoluteBoundingBoxRect() have negative coordinates? Perhaps we should clamp to 0. Ditto for being out of page bounds on the other side.

> Source/WebCore/rendering/RenderObject.cpp:1124
> +            document()->addUsedAnchor(name, element);

Would it work if we just called addDestinationAtPoint here? The same destination could be added several times, but I don't see any reason why that would be a problem. We are just using an intermediate map instead of one in PDF context.

> Source/WebCore/rendering/RenderObject.cpp:1129
> +            return;
> +        }
> +    }
> +    context->setURLForRect(url, pixelSnappedIntRect(rect));

Unsure if it's meaningful to add a same-document URL if anchor couldn't be found. Perhaps the early return should be outside "if (element)"?
Comment 13 David Lattimore 2013-03-12 23:30:09 PDT
Comment on attachment 192865 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=192865&action=review

>> Source/WebCore/rendering/RenderObject.cpp:1124
>> +            document()->addUsedAnchor(name, element);
> 
> Would it work if we just called addDestinationAtPoint here? The same destination could be added several times, but I don't see any reason why that would be a problem. We are just using an intermediate map instead of one in PDF context.

Unfortunately not. GraphicsContext::addDestinationAtPoint needs to be called on the GraphicsContext for the page containing the destination. If we called it here, the link would only work if the destination was on the same page as the link. It's currently being called for all pages and then dropped when the point falls outside the bounds of the page (i.e. all but the correct page). I'll have a look and see if I can find another way of structuring it without storing stuff on Document. If you have any ideas, let me know.

>> Source/WebCore/rendering/RenderObject.cpp:1129
>> +    context->setURLForRect(url, pixelSnappedIntRect(rect));
> 
> Unsure if it's meaningful to add a same-document URL if anchor couldn't be found. Perhaps the early return should be outside "if (element)"?

The patch as it currently stands puts out a regular link - i.e. falls back to the old behavior. I'll move the return, which will mean same-document URLs that are broken will become non-links.
Comment 14 Alexey Proskuryakov 2013-03-13 08:35:47 PDT
> GraphicsContext::addDestinationAtPoint needs to be called on the GraphicsContext for the page containing the destination.

I'm confused. Do we have multiple GraphicsContexts when printing to a PDF? But even then, aren't all internal links in-document?
Comment 15 David Lattimore 2013-03-14 22:17:40 PDT
Created attachment 193232 [details]
Patch
Comment 16 David Lattimore 2013-03-14 22:21:19 PDT
Comment on attachment 192865 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=192865&action=review

>> Source/WebCore/ChangeLog:9
>> +        have any existing tests. Manual testing was performed in Chromium
> 
> That said, adding a way to test PDF generation would be a valuable project.

Agreed. Not sure I have sufficient familiarity with WebKit to know where and how to write such a test. Ideally it'd run WebKit headlessly on an input HTML file, render to PDF then make assertions about the contents of the PDF. I've got no idea how straightforward it would be to run WebKit in this manner.

As an interim measure, is there somewhere I can put my test file, in case the code is changed later and someone wants to manually test that it still works? Or should I just attach it to this bug?

>> Source/WebCore/ChangeLog:17
>> +        (WebCore):
> 
> It's just a bug of prepare-ChangeLog that this cruft is added. The purpose of generating file and name links is to simplify adding per-function comments to ChangeLogs, and unhelpful garbage should be removed (yes, a lot of people don't do that, sadly).

Makes sense. Removed.

>> Source/WebCore/dom/Document.h:1200
>> +    HashMap<String, Element*>& usedAnchors() { return m_usedAnchors; }
> 
> Looks like this could be const (on both sides). But see other comments.

Code removed.

>> Source/WebCore/dom/Document.h:1465
>> +    HashMap<String, Element*> m_usedAnchors;
> 
> This is very surprising to have as a permanent member of Document. This causes a serious bug: if one generates PDF twice, and an element is removed in between the calls, a dangling pointer is left in the HashMap, and then dereferenced.
> 
> The HashMap should at least be cleared when no longer needed, but I think that I have a better suggestion below.

I've removed all code from Document.

>> Source/WebCore/page/FrameView.cpp:3434
>> +            p->addDestinationAtPoint(i->key, IntPoint(0, y));
> 
> Why no horizontal offset? The user can have the document zoomed in, and then scrolling to (0, y) may not reveal the element.
> 
> Can the result of absoluteBoundingBoxRect() have negative coordinates? Perhaps we should clamp to 0. Ditto for being out of page bounds on the other side.

I now check that the point is inside the page's rectangle and ignore it if it isn't. I think Skia does this as well, but it doesn't hurt to do it in WebKit as well.

>>> Source/WebCore/rendering/RenderObject.cpp:1124
>>> +            document()->addUsedAnchor(name, element);
>> 
>> Would it work if we just called addDestinationAtPoint here? The same destination could be added several times, but I don't see any reason why that would be a problem. We are just using an intermediate map instead of one in PDF context.
> 
> Unfortunately not. GraphicsContext::addDestinationAtPoint needs to be called on the GraphicsContext for the page containing the destination. If we called it here, the link would only work if the destination was on the same page as the link. It's currently being called for all pages and then dropped when the point falls outside the bounds of the page (i.e. all but the correct page). I'll have a look and see if I can find another way of structuring it without storing stuff on Document. If you have any ideas, let me know.

You asked if a new GraphicsContext is created for each page... I'd assumed that one was - the headless code I'm using does that, but I see that PrintContext::spoolAllPagesWithBoundaries doesn't do that - it instead reuses the same GraphicsContext. In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered.

Instead, I've added code to FrameView that finds links, gets the elements they point to and outputs them as destinations if they're on the current page.
Comment 17 Alexey Proskuryakov 2013-03-14 23:13:21 PDT
> Agreed. Not sure I have sufficient familiarity with WebKit to know where and how to write such a test. Ideally it'd run WebKit headlessly on an input HTML file, render to PDF then make assertions about the contents of the PDF. I've got no idea how straightforward it would be to run WebKit in this manner.

The hardest part here is to make assertions about contents of produced PDFs, and to a lesser degree, adding plumbing to produce a PDFD. Headless operation is what regression tests do normally, and we have 30000 of them.

Tests in LayoutTests/webarchive directory are somewhat similar, although their way to assert correctness is by comparing produced binary file to expected one, which would not work so great for PDFs.

> As an interim measure, is there somewhere I can put my test file, in case the code is changed later and someone wants to manually test that it still works? Or should I just attach it to this bug?

There is a ManualTests directory at the root of WebKit checkout, but I would discourage using it. No one ever runs these tests, and adding something that requires multiple manual steps just makes existing tests slightly less useful.

> In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered.

Are you talking about pages in a produced PDF, or about subframes? There is something here that feels wrong, but I can't put my finger on it.
Comment 18 David Lattimore 2013-03-15 00:05:01 PDT
(In reply to comment #17)
> > Agreed. Not sure I have sufficient familiarity with WebKit to know where and how to write such a test. Ideally it'd run WebKit headlessly on an input HTML file, render to PDF then make assertions about the contents of the PDF. I've got no idea how straightforward it would be to run WebKit in this manner.
> 
> The hardest part here is to make assertions about contents of produced PDFs, and to a lesser degree, adding plumbing to produce a PDFD. Headless operation is what regression tests do normally, and we have 30000 of them.

The assertions I think I can manage, since both the links and the destinations end up in the PDF in plain text. It's the plumbing parts that I'm unsure of.

> > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered.
> 
> Are you talking about pages in a produced PDF, or about subframes? There is something here that feels wrong, but I can't put my finger on it.

AFAIK there should be one SkPDFDevice and on SkCanvas per page in the produced PDF.

I've got some ideas that I'll investigate on Monday (Sydney time) that may mean it would work to add the destinations from RenderObject (like you originally suggested).
Comment 19 David Lattimore 2013-03-17 17:19:27 PDT
(In reply to comment #17)
> > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on. So unless there is some way to get access to a random page's SkCanvas, the destinations need to be put out while the page containing those destinations is being rendered.
> 
> Are you talking about pages in a produced PDF, or about subframes? There is something here that feels wrong, but I can't put my finger on it.

I thought it might be possible to call addDestinationAtPoint from RenderObject::addPDFURLRect. I've now tried it and it doesn't work. Destinations only get written if they are on the same page as something that links to them. I thought that addPDFURLRect was getting called for each page and for each URL, but, probably fortunately for performance, it isn't.

I did some investigation as to where the GraphicsContext is getting created. It seems there is a new GraphicsContext for each page, created (at least for Chromium) in WebFrameImpl::printPage.

Given the above, I don't see any obvious options other than what I'm doing in my latest patch.

Is it a problem that I'm traversing the render tree for each page? The paint code already does that. One alternative to that would be to traverse it only for the first page, then store a HashMap on FrameView from names to elements that have been linked to. This could then be reused for subsequent pages.
Comment 20 Alexey Proskuryakov 2013-03-18 22:09:12 PDT
> Is it a problem that I'm traversing the render tree for each page?

Well, this effectively adds an O(n^2) algorithm. Can you measure how long it takes to generate a PDF for single page HTML5 spec with and without this patch?

IIRC Safari prints the whole document at once, not page by page (except when generating preview). So, ideally I'm interested in the above measurement for Safari.
Comment 21 David Lattimore 2013-03-20 20:37:03 PDT
Created attachment 194177 [details]
Patch
Comment 22 David Lattimore 2013-03-20 20:54:20 PDT
Does Safari use Skia for generating PDFs? When I build WebKit and run Safari (using Tools/Scripts/run-safari) then print to PDF, the resulting PDF appears as though my change was not present. It seems like the code in RenderObject is being run but not the code in GraphicsContextSkia.

In any case, I've updated my change so that it only traverses the DOM once and caches the linked elements for use on subsequent pages.
Comment 23 Tim Horton 2013-03-20 20:56:17 PDT
(In reply to comment #22)
> Does Safari use Skia for generating PDFs? When I build WebKit and run Safari (using Tools/Scripts/run-safari) then print to PDF, the resulting PDF appears as though my change was not present. It seems like the code in RenderObject is being run but not the code in GraphicsContextSkia.

No, Safari does not use Skia, it uses CG. Skia is a Chromium-port thing.
Comment 24 Tim Horton 2013-03-21 11:30:28 PDT
Comment on attachment 194177 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=194177&action=review

> Source/WebCore/page/FrameView.cpp:3490
> +    if (!m_linkedDestinationsValid) {
> +        collectLinkedDestinations(node);
> +        m_linkedDestinationsValid = true;
> +    }

What if you print, the page is mutated via JavaScript, and then you print again? m_linkedDestinations will be stale, right? And contain stale pointers to Elements? Which you will then dereference below?

> Source/WebCore/page/FrameView.cpp:3495
> +        IntRect bb = it->value->renderer()->absoluteBoundingBoxRect();
> +        IntPoint point(bb.x(), bb.y());

Please expand "bb".

> Source/WebCore/page/FrameView.cpp:3496
> +        if (pageRect.contains(point))

What if the bounds are shoved off the page but the bounding box intersects the page? You probably want "intersection of pageRect and 'bb' is not empty", no?

> Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113
> +    return true;

Shouldn't this only return true if the graphics context is backed by e.g. a PDF? Certainly you don't want to do all this work for a bitmap Skia context...
Comment 25 David Lattimore 2013-03-21 17:54:11 PDT
Created attachment 194404 [details]
Patch
Comment 26 David Lattimore 2013-03-21 17:57:50 PDT
Comment on attachment 194177 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=194177&action=review

>> Source/WebCore/page/FrameView.cpp:3490
>> +    }
> 
> What if you print, the page is mutated via JavaScript, and then you print again? m_linkedDestinations will be stale, right? And contain stale pointers to Elements? Which you will then dereference below?

I'd wrongly thought that FrameView only lived for the duration of the print. I've moved this code to PrintContext which should fix this.

>> Source/WebCore/page/FrameView.cpp:3495
>> +        IntPoint point(bb.x(), bb.y());
> 
> Please expand "bb".

Done.

>> Source/WebCore/page/FrameView.cpp:3496
>> +        if (pageRect.contains(point))
> 
> What if the bounds are shoved off the page but the bounding box intersects the page? You probably want "intersection of pageRect and 'bb' is not empty", no?

Good point. Done.

>> Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113
>> +    return true;
> 
> Shouldn't this only return true if the graphics context is backed by e.g. a PDF? Certainly you don't want to do all this work for a bitmap Skia context...

I now check if it's printing.
Comment 27 Alexey Proskuryakov 2013-03-26 14:23:04 PDT
Comment on attachment 194404 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=194404&action=review

> In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on.

I don't understand how this can work. If every page is separate, how do cross-page links work at all? Surely there is an object in Skia that represents the whole multi-page document? For example in CG, it's a CGPDFContext, and one calls CGPDFContextBeginPage when moving to the next page.

> Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113
> +    return platformContext()->printing();

I'm not sure if this is a good place to make such checks. A GraphicsContext may not have any idea about whether it's printing or not, it's the responsibility of both higher level and lower level code.
Comment 28 Alexey Proskuryakov 2013-03-26 14:27:18 PDT
Created attachment 195162 [details]
with non-functional CG support

I tried making a patch that works with CG too, but for some reason, a resulting document has no clickable links at all. Perhaps rects are represented in a way that's different from how it's done elsewhere?
Comment 29 David Lattimore 2013-04-01 19:33:28 PDT
Sorry about the slow reply - we had a 4 day weekend here in Australia.

(In reply to comment #27)
> (From update of attachment 194404 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=194404&action=review
> 
> > In any case, at the Skia level, we definitely have a different SkPDFDevice/SkCanvas for each page and the destinations need to get written to the correct SkPDFDevice for the page that the destination appears on.
> 
> I don't understand how this can work. If every page is separate, how do cross-page links work at all? Surely there is an object in Skia that represents the whole multi-page document? For example in CG, it's a CGPDFContext, and one calls CGPDFContextBeginPage when moving to the next page.

There's SkPDFDocument. It represents the whole PDF document. The destinations still need to be written to the correct SkPDFDevice for the page they're on though.
 
> > Source/WebCore/platform/graphics/skia/GraphicsContextSkia.cpp:1113
> > +    return platformContext()->printing();
> 
> I'm not sure if this is a good place to make such checks. A GraphicsContext may not have any idea about whether it's printing or not, it's the responsibility of both higher level and lower level code.

So I should just revert this to return true here? Now that adding the destinations is being done in PrintContext, it will already only be getting run if we're printing. That only leaves linking to destinations which doesn't really do any significant amount of work, so is probably OK to do even if the underlying Skia device doesn't support it.

(In reply to comment #28)
> Created an attachment (id=195162) [details]
> with non-funcitonal CG support
> 
> I tried making a patch that works with CG too, but for some reason, a resulting document has no clickable links at all. Perhaps rects are represented in a way that's different from how it's done elsewhere?

Presumably it would need to apply transforms in a similar way to what the CG implementation of GraphicsContext::setURLForRect does. That said, I tried doing that and it still didn't work. Looking at the generated PDF in a text editor, neither the destinations nor the links to the destinations are being put out. I've got no idea why not.
Comment 30 Darin Adler 2013-04-08 18:07:52 PDT
We used to do this in Safari and Core Graphics. Did the code to do this get removed at some point?
Comment 31 Darin Adler 2013-04-08 18:08:13 PDT
Oh, I see “internal links!”

Got it.
Comment 32 David Lattimore 2013-04-08 18:14:08 PDT
I'm going to continue with trying to get this change into Blink. https://code.google.com/p/chromium/issues/detail?id=157266

If anyone is interested in seeing this feature in WebKit with CG, feel free to pick this up.

Thanks Alexey for the helpful comments.
Comment 33 Alexey Proskuryakov 2013-04-08 19:04:09 PDT
Comment on attachment 194404 [details]
Patch

r- to get this out of review queue, as we don't support Skia in WebKit any more. We should certainly finish this.
Comment 34 Alexey Proskuryakov 2013-05-10 10:47:08 PDT
<rdar://problem/8239740>
Comment 35 Tim Horton 2015-08-16 17:24:29 PDT
Created attachment 259132 [details]
patch, still doesn't work

Rebaselined the patch, and added the requisite transforms to the CG code. It still doesn't work though, even though everything we're handing to CG seems totally sane. Not sure what's up.
Comment 36 Tim Horton 2016-10-11 01:06:28 PDT
I believe the problem is WKPrintingView _drawPDFDocument:page:atPoint:'s implementation of annotation copying.
Comment 37 Tim Horton 2016-10-11 01:14:09 PDT
Yup! This should be pretty easy now, knowing that.
Comment 38 Tim Horton 2016-11-02 00:20:30 PDT
Created attachment 293645 [details]
Significant progress; actually works now
Comment 39 Tim Horton 2016-11-02 00:21:16 PDT
Comment on attachment 293645 [details]
Significant progress; actually works now

Not ready for review, though.
Comment 40 Tim Horton 2016-11-02 18:24:13 PDT
Created attachment 293729 [details]
Patch
Comment 41 Tim Horton 2016-11-02 18:26:07 PDT
Created attachment 293730 [details]
Patch
Comment 42 Tim Horton 2016-11-02 18:41:16 PDT
Created attachment 293732 [details]
Patch
Comment 43 Simon Fraser (smfr) 2016-11-02 19:14:31 PDT
Comment on attachment 293732 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=293732&action=review

> Source/WebCore/page/PrintContext.cpp:252
> +void PrintContext::collectLinkedDestinations(Node* node)

Node&

> Source/WebCore/page/PrintContext.cpp:255
> +    for (Node* child = node->firstChild(); child; child = child->nextSibling())
> +        collectLinkedDestinations(child);

Hmm, can we use tree traversal iterator thingies?

> Source/WebCore/platform/graphics/GraphicsContext.h:474
> +    void setDestinationForRect(const String& name, const IntRect&);
> +    void addDestinationAtPoint(const String& name, const IntPoint&);

The rest of this API is FloatPoint/FloatRect, other than the URL one above.

> Source/WebCore/platform/graphics/GraphicsContext.h:582
> +    bool supportsInternalLinks();

const ?

> Source/WebCore/rendering/RenderObject.cpp:621
> +    if (paintInfo.context().supportsInternalLinks() && url.hasFragmentIdentifier() && equalIgnoringFragmentIdentifier(url, document.baseURL())) {
> +        String name = url.fragmentIdentifier();
> +        if (document.findAnchor(name))
> +            paintInfo.context().setDestinationForRect(name, snappedIntRect(urlRect));
> +        return;
> +    }

This looks a lot like the code in PrintContext::collectLinkedDestinations(). Share it?
Comment 44 Tim Horton 2016-11-03 13:05:09 PDT
Created attachment 293791 [details]
Patch
Comment 45 Simon Fraser (smfr) 2016-11-03 13:19:11 PDT
Comment on attachment 293791 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=293791&action=review

> Source/WebCore/page/PrintContext.h:110
> +    Optional<HashMap<String, Element*>> m_linkedDestinations;

It makes me slightly nervous that this has raw Element*. Can script run while this thing is live?

Also, not sure that Optional<> gains you much over unique_ptr<>
Comment 46 Tim Horton 2016-11-03 13:29:59 PDT
(In reply to comment #45)
> Comment on attachment 293791 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=293791&action=review
> 
> > Source/WebCore/page/PrintContext.h:110
> > +    Optional<HashMap<String, Element*>> m_linkedDestinations;
> 
> It makes me slightly nervous that this has raw Element*. Can script run
> while this thing is live?

It shouldn't be able to; it lives just during PrintContext between begin/end, which looks like only encompasses painting, but I could be wrong. It seems entirely reasonable to ref these, anyway.

> Also, not sure that Optional<> gains you much over unique_ptr<>

Sure, but is there any reason to prefer unique_ptr?
Comment 47 Tim Horton 2016-11-03 14:11:31 PDT
https://trac.webkit.org/changeset/208347
Comment 48 Darin Adler 2016-11-04 09:53:34 PDT
Comment on attachment 293645 [details]
Significant progress; actually works now

View in context: https://bugs.webkit.org/attachment.cgi?id=293645&action=review

Sorry, I missed the final work on this and maybe things changed a lot in the patch that you landed, but here were my comments on an earlier draft.

> Source/WebCore/page/PrintContext.cpp:39
> +    , m_linkedDestinationsValid(false)

In new code we should do these in the class definitions.

> Source/WebCore/page/PrintContext.cpp:196
> +    if (ctx.supportsInternalLinks())
> +        outputLinkedDestinations(ctx, m_frame->document(), pageRect);

Might be clearer to have the outputLinkedDestinations functions check the boolean instead.

> Source/WebCore/page/PrintContext.cpp:208
> +    if (ctx.supportsInternalLinks())
> +        outputLinkedDestinations(ctx, m_frame->document(), rect);

Ditto.

> Source/WebCore/page/PrintContext.cpp:257
> +void PrintContext::collectLinkedDestinations(Node* node)

Since this function dereferences the node pointer, it should take a reference, not a pointer. I suggest Document& and using a non-recursive algorithm.

> Source/WebCore/page/PrintContext.cpp:259
> +    // FIXME: We have better ways to iterate now?

We do. In particular, we should do non-recursive iteration. Since we are only interested in elements we can use the elements iterator.

> Source/WebCore/page/PrintContext.cpp:266
> +    const AtomicString& href = static_cast<Element*>(node)->getAttribute(HTMLNames::hrefAttr);

downcast is preferred over static_cast for casts from node to element.

auto& would be nice here.

attributeWithoutSynchronization is better here.

> Source/WebCore/page/PrintContext.cpp:270
> +    Document& document = node->document();

auto& would be nice here.

> Source/WebCore/page/PrintContext.cpp:278
> +            m_linkedDestinations.set(name, element);

Probably want to use add instead of set. Depends really on what we want to do if the same name occurs more than once.

> Source/WebCore/page/PrintContext.cpp:282
> +void PrintContext::outputLinkedDestinations(GraphicsContext& graphicsContext, Node* node, const IntRect& pageRect)

I suggest taking a Document& instead of a Node*.

> Source/WebCore/page/PrintContext.cpp:287
> +    if (!m_linkedDestinationsValid) {
> +        collectLinkedDestinations(node);
> +        m_linkedDestinationsValid = true;
> +    }

Seems peculiar that if m_linkedDestinationsValid is true that we ignore the node; somehow we know it’s the same as last time? Why a pointer for the node? Can it be null?

> Source/WebCore/page/PrintContext.cpp:290
> +    HashMap<String, Element*>::const_iterator end = m_linkedDestinations.end();
> +    for (HashMap<String, Element*>::const_iterator it = m_linkedDestinations.begin(); it != end; ++it) {

Modern for loop would work well here and eliminate those long type names for the iterator.

> Source/WebCore/page/PrintContext.cpp:292
> +        IntRect boundingBox = it->value->renderer()->absoluteBoundingBoxRect();
> +        if (pageRect.intersects(boundingBox)) {

Not sure we need a local variable here; might read more nicely without it.

Also, what guarantees the renderers are all non-null?
Comment 49 Tim Horton 2016-11-04 12:27:49 PDT
(In reply to comment #48)
> Comment on attachment 293645 [details]
> Significant progress; actually works now
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=293645&action=review
> 
> Sorry, I missed the final work on this and maybe things changed a lot in the
> patch that you landed, but here were my comments on an earlier draft.

OK! Let me see what I fixed; it looks like this one was before I really started ripping it up, so hopefully most are fixed.

> > Source/WebCore/page/PrintContext.cpp:39
> > +    , m_linkedDestinationsValid(false)
> 
> In new code we should do these in the class definitions.

This is gone.

> > Source/WebCore/page/PrintContext.cpp:196
> > +    if (ctx.supportsInternalLinks())
> > +        outputLinkedDestinations(ctx, m_frame->document(), pageRect);
> 
> Might be clearer to have the outputLinkedDestinations functions check the
> boolean instead.

I did this.

> > Source/WebCore/page/PrintContext.cpp:208
> > +    if (ctx.supportsInternalLinks())
> > +        outputLinkedDestinations(ctx, m_frame->document(), rect);
> 
> Ditto.

I did this.

> > Source/WebCore/page/PrintContext.cpp:257
> > +void PrintContext::collectLinkedDestinations(Node* node)
> 
> Since this function dereferences the node pointer, it should take a
> reference, not a pointer. I suggest Document& and using a non-recursive
> algorithm.

I did this (both the reference and non-recursiveness, though not taking Document, which is a good idea).

> > Source/WebCore/page/PrintContext.cpp:259
> > +    // FIXME: We have better ways to iterate now?
> 
> We do. In particular, we should do non-recursive iteration. Since we are
> only interested in elements we can use the elements iterator.

Ooh, I switched to a different and non-recursive node iterator, but didn't know about the element iterator. Will look into that.

> > Source/WebCore/page/PrintContext.cpp:266
> > +    const AtomicString& href = static_cast<Element*>(node)->getAttribute(HTMLNames::hrefAttr);
> 
> downcast is preferred over static_cast for casts from node to element.

Did this.

> auto& would be nice here.

Did not do this.

> attributeWithoutSynchronization is better here.

Did not do this. I never know how to know when to do it!

> > Source/WebCore/page/PrintContext.cpp:270
> > +    Document& document = node->document();
> 
> auto& would be nice here.

OK.

> > Source/WebCore/page/PrintContext.cpp:278
> > +            m_linkedDestinations.set(name, element);
> 
> Probably want to use add instead of set. Depends really on what we want to
> do if the same name occurs more than once.

Interesting.

> > Source/WebCore/page/PrintContext.cpp:282
> > +void PrintContext::outputLinkedDestinations(GraphicsContext& graphicsContext, Node* node, const IntRect& pageRect)
> 
> I suggest taking a Document& instead of a Node*.

Sure!

> > Source/WebCore/page/PrintContext.cpp:287
> > +    if (!m_linkedDestinationsValid) {
> > +        collectLinkedDestinations(node);
> > +        m_linkedDestinationsValid = true;
> > +    }
> 
> Seems peculiar that if m_linkedDestinationsValid is true that we ignore the
> node; somehow we know it’s the same as last time? Why a pointer for the
> node? Can it be null?

This thing only lasts one print operation, so there's theretically no opportunity for it to become invalid

> > Source/WebCore/page/PrintContext.cpp:290
> > +    HashMap<String, Element*>::const_iterator end = m_linkedDestinations.end();
> > +    for (HashMap<String, Element*>::const_iterator it = m_linkedDestinations.begin(); it != end; ++it) {
> 
> Modern for loop would work well here and eliminate those long type names for
> the iterator.

Yep, did this.

> > Source/WebCore/page/PrintContext.cpp:292
> > +        IntRect boundingBox = it->value->renderer()->absoluteBoundingBoxRect();
> > +        if (pageRect.intersects(boundingBox)) {
> 
> Not sure we need a local variable here; might read more nicely without it.

I think this is resolved.

> Also, what guarantees the renderers are all non-null?

Hmm, that's a good point. Will fix.
Comment 50 Tim Horton 2016-11-04 14:08:07 PDT
Follow up in https://trac.webkit.org/changeset/208398
Comment 51 Darin Adler 2016-11-05 13:05:56 PDT
(In reply to comment #49)
> > attributeWithoutSynchronization is better here.
> 
> Did not do this. I never know how to know when to do it!

If you forget the rule, you can re-derive it by reading the Element::synchronizeAttribute function.

The rule is this: If the attribute we are getting is, or might be either

1) the style attribute (HTMLNames::styleAttr) or
2) an animated attribute of an SVG element

then we have to use the getAttribute family of functions, which has a couple more branches and is slower. Otherwise, we can and should use the attributeWithoutSynchronization family.

So typically if we are writing a general purpose function that can work with any attribute we will need to use the getAttribute family. But if we are writing a function that works with one particular attribute, it's likely we can use the attributeWithoutSynchronization.

In this case, the attribute was href, which is not style, and SVG does not have an animated attribute named href. So that means we can use attributeWithoutSynchronization.