RESOLVED FIXED 21280
AXTables on page are probably not tables
https://bugs.webkit.org/show_bug.cgi?id=21280
Summary AXTables on page are probably not tables
chris fleizach
Reported 2008-10-01 12:34:03 PDT
Request is to improve table detection. There are two non-table tables on http://www.msnbc.msn.com/id/26760673/#storyContinued
Attachments
Patch to make AXTable detection more robust (18.82 KB, patch)
2008-10-01 12:55 PDT, chris fleizach
darin: review+
chris fleizach
Comment 1 2008-10-01 12:55:41 PDT
Created attachment 23986 [details] Patch to make AXTable detection more robust
Darin Adler
Comment 2 2008-10-01 13:38:39 PDT
Comment on attachment 23986 [details] Patch to make AXTable detection more robust I think I would use the word "heuristic" somewhere here to describe what we're doing. To me it seems overkill to look at all the cells in a table. Once you've found borders or colors on 10 different cells, I'm not sure what the value is of examining 100 others to determine that it's "more than half of the cells" that have borders or colors. I think "two rows" was probably too limited, but "every single cell" is probably too open ended. In particular we want good performance on absurdly huge tables, and querying every cell of such tables is obviously not helpful. + // considered a bordered cell. Try the cell borders and the style's borders I don't see code corresponding to the style's borders. The check for background differences isn't quite right -- it doesn't account for transparent colors. + unsigned neededCellCount = (validCellCount >> 1); It's bad style to do ">> 1" when you really mean "/ 2" and doesn't make the code any faster. We also don't usually put parentheses around this operation. I'm going to say r=me, but I think it's worth pondering further whether it's sensible to iterate the entire table even for pathologically huge tables.
chris fleizach
Comment 3 2008-10-01 14:32:30 PDT
that's reasonable. i can put in a limit to how many cells are checked at max. Let's say 10 cells
chris fleizach
Comment 4 2008-10-02 17:01:05 PDT
Note You need to log in before you can comment on or make changes to this bug.