Bug 194934 - Spurious find results on many apple.com pages
Summary: Spurious find results on many apple.com pages
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Tim Horton
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-02-21 23:02 PST by Tim Horton
Modified: 2019-06-15 18:18 PDT (History)
9 users (show)

See Also:


Attachments
Patch (3.03 KB, patch)
2019-06-14 13:00 PDT, Tim Horton
simon.fraser: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tim Horton 2019-02-21 23:02:46 PST
Apple.com seems to have adopted a technique where they hide accessibility text all over the page in spans with this style:

    clip: rect(1px, 1px, 1px, 1px);
    -webkit-clip-path: inset(0px 0px 99.9% 99.9%);
    clip-path: inset(0px 0px 99.9% 99.9%);
    overflow: hidden;
    height: 1px;
    width: 1px;

If you use find-in-page to search for text inside of one of these elements, we happily return it as a potential result, though the text is not visible, and the find hole and highlight end up being 1x1 px.

I'm not sure the best way or place to detect this, but making TextIterator's `fullyClipsContents` return true if the contentSize's area is <= 1px certainly fixes it (and makes find on these pages feel much more sensible).
Comment 1 Tim Horton 2019-02-21 23:03:08 PST
Steps to Reproduce:

1. Search for 'apple' or 'iphone' on apple.com
Comment 2 Darin Adler 2019-02-22 09:52:52 PST
The heuristic about very small sizes makes sense to me and seems worth doing rather than a lot of soul searching. I think we could consider a threshold even larger than 1px, based on some assumptions about the smallest size for readable text.
Comment 3 Darin Adler 2019-02-22 09:53:43 PST
What is not so clear to me is which part of this is the key. The small size, the clipping. It would be nice to have the heuristic consider the minimum clue necessary to help ensure it works on the widest number of pages.
Comment 4 Ryosuke Niwa 2019-02-22 13:43:06 PST
Hm.. maybe 2-3px box would be too small? I think anything bigger than that, we run the risk of things being visible when zoomed, or the author intentionally showing really tiny text.
Comment 5 Tim Horton 2019-06-14 11:27:18 PDT
<rdar://problem/51739857>
Comment 6 Tim Horton 2019-06-14 13:00:31 PDT
Created attachment 372136 [details]
Patch
Comment 7 Tim Horton 2019-06-14 13:28:25 PDT
Hilariously, the overlay test failures are legit, because that test puts its log content in a "position: absolute; height: 1px; width: 1px; overflow: hidden;" div (for some reason), and now TextIterator decides not to dump that text when dumpAsText-ing the layout test.

Not really sure we can change global TextIterator behavior that much (Darin? Ryosuke? smfr is not on board anymore)
Comment 8 Tim Horton 2019-06-14 13:29:51 PDT
The other option is to add a TextIteratorOption that find uses, but it seems weiiiird for plainText() and find to include different text.
Comment 9 Darin Adler 2019-06-14 15:27:46 PDT
(In reply to Tim Horton from comment #7)
> Hilariously, the overlay test failures are legit, because that test puts its
> log content in a "position: absolute; height: 1px; width: 1px; overflow:
> hidden;" div (for some reason), and now TextIterator decides not to dump
> that text when dumpAsText-ing the layout test.
> 
> Not really sure we can change global TextIterator behavior that much (Darin?
> Ryosuke? smfr is not on board anymore)

I think this kind of change is OK. I can’t think of any concrete use of TextIterator where iterating this invisible text is a plus. We should change those tests to not depend on this.
Comment 10 Darin Adler 2019-06-14 15:31:47 PDT
(In reply to Tim Horton from comment #8)
> The other option is to add a TextIteratorOption that find uses, but it seems
> weiiiird for plainText() and find to include different text.

We need to go through the different TextIterator use types. There are probably still fewer than 10. I doubt we can find any where it’s a benefit to include this kind of invisible text.

The use of TextIterator in dumpAsText is probably the least "legit" use of it. It's basically designed for use in Copy and Find, and then we found many uses for that in text editing as well.

If there’s someone arguing that such invisible text *should* continue to be made visible when you copy and paste, I suggest that’s what we should discuss, not the abstract "what should a text iterator do".
Comment 11 Ryosuke Niwa 2019-06-14 16:15:01 PDT
TextIterator is also used to implement innerText for now, and that API's behavior shouldn't change.
Comment 12 Tim Horton 2019-06-14 16:20:53 PDT
(In reply to Ryosuke Niwa from comment #11)
> TextIterator is also used to implement innerText for now, and that API's
> behavior shouldn't change.

Hmm, I assumed that set TextIteratorIgnoresStyleVisibility but it does not.
Comment 13 Antti Koivisto 2019-06-14 23:08:21 PDT
Time for another flag!
Comment 14 Darin Adler 2019-06-15 13:02:32 PDT
Seems risky that innerText is possibly important for website compatibility but it’s not precisely specified.
Comment 15 Darin Adler 2019-06-15 18:18:12 PDT
OK, so if innerText does not set TextIteratorIgnoresStyleVisibility:

1) I think it should.
2) I think then that changing to treat 1x1 text as invisible is a good idea anyway.
3) But why would any of these need to be web-visible.

I do not want to hold off on fixing this bug forever just because of innerText!