Bug 6218 - CSS1: WebTextRenderer caches and re-uses fallback renderers that are based on family lists
Summary: CSS1: WebTextRenderer caches and re-uses fallback renderers that are based on...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Nobody
URL:
Keywords: HasReduction, InRadar
: 6529 (view as bug list)
Depends on:
Blocks: 6530
  Show dependency treegraph
 
Reported: 2005-12-23 06:45 PST by mitz
Modified: 2007-01-18 17:25 PST (History)
5 users (show)

See Also:


Attachments
Testcase (402 bytes, text/html)
2005-12-23 06:46 PST, mitz
no flags Details
Obliterate fallback glyph caching (1.77 KB, patch)
2006-02-23 13:35 PST, Beth Dakin
bdakin: review-
Details | Formatted Diff | Diff
Beginning of a HashMap patch (15.52 KB, patch)
2006-03-02 13:41 PST, Beth Dakin
no flags Details | Formatted Diff | Diff
test case (256 bytes, text/html)
2006-05-18 04:16 PDT, Alexey Proskuryakov
no flags Details
[WIP] Implement trees of fallback glyph pages (25.85 KB, patch)
2007-01-13 12:08 PST, mitz
no flags Details | Formatted Diff | Diff
[WIP] Implement trees of fallback glyph pages (30.28 KB, patch)
2007-01-15 11:40 PST, mitz
no flags Details | Formatted Diff | Diff
Implement trees of fallback glyph pages (85.89 KB, patch)
2007-01-15 13:19 PST, mitz
no flags Details | Formatted Diff | Diff
Implement trees of fallback glyph pages (86.10 KB, patch)
2007-01-16 02:49 PST, mitz
no flags Details | Formatted Diff | Diff
Implement trees of fallback glyph pages (86.53 KB, patch)
2007-01-18 06:39 PST, mitz
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description mitz 2005-12-23 06:45:20 PST
Summary: A WebTextRenderer will remember the fallback font it used for a character missing in its font 
and will use that font again for the missing character even if the list of families to search has changed. 
This results in the character coming from a different family from the one specified and even worse, 
different fonts for different characters (depending on which were looked up before and which were not).

To reproduce: (Quit and re-)open Safari and load the test case.

Expected: Both "capital L with stroke" on each line to be the in the same font.

Actual: on the Zapfino line, the first letter is in Lucida Grande, the fallback font that was used for that 
character in the Lucida Grande line.
Comment 1 mitz 2005-12-23 06:46:04 PST
Created attachment 5249 [details]
Testcase
Comment 2 mitz 2006-01-14 06:46:34 PST
*** Bug 6529 has been marked as a duplicate of this bug. ***
Comment 3 Dave Hyatt 2006-01-14 11:59:14 PST
Ugh, this is terrible.  This is basic CSS1 compliance issue. :(
Comment 4 Beth Dakin 2006-01-14 15:22:31 PST
Apple bug: <rdar://problem/4409087>
Comment 5 Beth Dakin 2006-02-23 13:33:48 PST
I have been talking with Darin and Hyatt about the best way to solve this problem, and there are several solutions. Most of them are architectural and require some reorganization of font families, etc. There is one simple solution, though, which is just to entirely get rid of the updateGlyphMapEntry() function in WebTextRenderer.m. This function caches the new glyph and substitute renderer in the WebTextRenderer's characterToGlyphMap; the function makes it so that when a glyph needs to be found in the fallback fonts, that particular glyph will only need to be looked up once for the page, and after that, it will be cached. So there is an obvious performance trade-off if we decide not to cache these glyphs anymore. I have done some performance testing, and my patch does not appear to affect the PLT. However, when I run the PLT on the following html, the performance regression becomes noticeable:

<p style="font-family:'Arabic';">
A A A A A A A A A A A A A A A A A A A A A A A A A A A 
A A A A A A A A A A A A A A A A A A A A A A A A A A A 
A A A A A A A A A A A A A A A A A A A A A A A A A A A
A A A A A A A A A A A A A A A A A A A A A A A A A A A
</p> 

I am not sure what kind of trade-offs we are willing to make. I have attached the patch for now anyway.
Comment 6 Beth Dakin 2006-02-23 13:35:06 PST
Created attachment 6683 [details]
Obliterate fallback glyph caching
Comment 7 Beth Dakin 2006-02-23 16:27:50 PST
I was talking to Mitz about this patch on #webkit, and I asked him for a couple of Hebrew sites with a lot of text to performance test this is a more real-world environment in terms of real sites that could actually see a performance regression from this change. Here is what I got:

http://benyehuda.org/bialik/hazozra.html
Avg before: 4136.9        Avg after: 4641.25

http://www.haayal.co.il/story_2579
Avg before: 4740.0        Avg after: 6390.1

These are averages of 6 runs of the PLT where in each run, the PLT loaded the site 10 times...which, if I understand this tool correctly, means that each time is the average load time for 60 loads. The regression for the second site seems fairly significant...
Comment 8 Beth Dakin 2006-02-23 16:34:48 PST
Comment on attachment 6683 [details]
Obliterate fallback glyph caching

I am review-minusing my own patch. I think these regressions are bad.
Comment 9 Beth Dakin 2006-03-02 13:41:59 PST
Created attachment 6814 [details]
Beginning of a HashMap patch

I am attaching this patch here for now for safe keeping, but after a very long discussion with Hyatt and Maciej, I have decided the shelve working on this bug for now until more of the text code is refactored (which Dave has been planning on doing within the next month or so). Basically, this patch has hit a road block, and it seems like there is no easy way around it right now, but there should be after refactoring. Specifically, the road blocker is that the families array in the WebCoreFont needs to be dynamically allocated to work as a hash key. We simply could not figure out a way to do this that would not be a significant performance hit or involve adding a lot of code to WebCoreFont that already exists for other classes like FontDescription. This comes down to the fundamental issue that maybe the WebCoreFont shouldn't be the key to the hash table at all -- something like FontDescription makes a lot more sense. However, since the hash table needs to be accessed from the WebKit side as well, WebCoreFont is the only real option right now. Dave thinks that after a lot of the refactoring that is going on, the path this bug should take will become more clear. So I am shelving it for now and attaching the patch for future reference and safe keeping.
Comment 10 Dave Hyatt 2006-05-18 01:23:55 PDT
The test case attached to this bug is not valid.  The glyph exists in Verdana.  Need a better test case.


Comment 11 Alexey Proskuryakov 2006-05-18 04:16:30 PDT
Created attachment 8390 [details]
test case

Test case copied form bug 6529.
Comment 12 mitz 2007-01-13 12:08:52 PST
Created attachment 12418 [details]
[WIP] Implement trees of fallback glyph pages

This patch is work in progress. It takes character-to-glyph mapping out of FontData and allows each Font to have its own mapping, using a shared set of glyph page fallback trees.

For each page number there is (at most) one tree. A path from the root to a node in the tree corresponds to a list of FontDatas. The node points to a glyph page (which may be shared with its ancestors) that maps each character to a glyph in the first FontData in the list that has it, or to 0 if none of the FontDatas has a mapping for that character. A special kind of node, that can occur only as a leaf, corresponds to using system fallback fonts after the list has been exhausted. This prevents system fallback from polluting non-leaf nodes.

Nodes and pages are initialized lazily and employ "copy on write" as far as memory usage is concerned. ("Lazily" still means page-at-a-time, not character-at-a-time, except for system fallback, which remains character-at-a-time).

Some very obvious issues:
- Small caps are broken. I don't know if there should be "small caps" pages in "small caps" trees, where the lowercase characters map to the small caps font, or some way to use the existing trees.

- I did not integrate this into the complex code path.

- Nodes and pages are never destroyed. I think it is generally okay to keep them around, but when fonts are enabled or disabled in the system it may be a good idea (or absolutely necessary) to destroy them.
Comment 13 mitz 2007-01-15 11:40:37 PST
Created attachment 12463 [details]
[WIP] Implement trees of fallback glyph pages

No layout test regressions and no apparent pixel test regressions.
Comment 14 mitz 2007-01-15 13:19:11 PST
Created attachment 12466 [details]
Implement trees of fallback glyph pages

Includes change log. Includes changes to other platforms. The diff is somewhat big due to how moved files are represented by svn-create-patch.
Comment 15 Darin Adler 2007-01-15 20:29:36 PST
Comment on attachment 12466 [details]
Implement trees of fallback glyph pages

The general approach here looks great. I'm going to review it in more detail later.

My main concern, and what I discussed with Hyatt a while back when we contemplated this sort of design, is to make sure that in common cases we share as much as possible to keep memory use down. We don't want to trade a lot of additional memory use for a small amount of additional correctness.

It should be quite common to have distinct family lists that in practice have never encountered characters that would be different. I envision a scheme where we would share pages at first, and then copy and split them as we ask for information about characters that must fall back differently than other characters.

I'm also concerned that the code is just as fast as the current code in the common case of no substitution at all. Font::glyphDataForCharacter is a very hot function. I improved performance in the past by making minor tweaks to the logic to reduce the number of branches. It's important to make sure that Font::glyphDataForCharacter is written with a very fast common/cached case.

The copyrights on new files say 2006, but should say 2007.

It seems to me that Font.h does not need to include GlyphPageTreeNode.h. A forward declaration should suffice, since it only refers to pointers to the nodes.
Comment 16 mitz 2007-01-16 02:49:09 PST
Created attachment 12480 [details]
Implement trees of fallback glyph pages

Now with self-loops in the tree. In the previous patch, pages were shared by nodes if the child node did not have any characters mapped that its parent did not. Now the child is merged with the parent and a self-loop is added. This prevents duplication of the (A overlay C) page in the case of fallback lists A->B->C and A->C where B has no mappings that A does not have.
Comment 17 Darin Adler 2007-01-17 09:31:56 PST
How are we going to evaluate the performance impact of this? I'm very excited to get this fix into the tree.

Hyatt, comments?
Comment 18 Dave Hyatt 2007-01-17 16:12:01 PST
I think it's ok to land it and then see if performance regresses.  We should pick a landing time when mitz will be available to respond quickly to feedback though if it does regress perf.
Comment 19 mitz 2007-01-17 22:11:38 PST
Comment on attachment 12480 [details]
Implement trees of fallback glyph pages

Going to make a version that addresses Darin's comments (change the #include to a forward declaration and update copyright notices).
Comment 20 mitz 2007-01-18 04:57:22 PST
Comment on attachment 12480 [details]
Implement trees of fallback glyph pages

This was very buggy.
Comment 21 mitz 2007-01-18 06:39:20 PST
Created attachment 12531 [details]
Implement trees of fallback glyph pages

Abandoned the node-collapsing idea, instead using another technique to share the page in the scenario described in comment #16. Collapsing nodes turned out to be complicated because once you do it a node's level in the tree no longer corresponds with an index into a FontFallbackList, and my solution for this was kind of messy. It also required special-casing "system fallback".

Replaced the #include with forward declarations. There are no new files in this patch, only moved (renamed) files, therefore I did not change the years in copyright lines.
Comment 22 Darin Adler 2007-01-18 07:56:59 PST
Comment on attachment 12531 [details]
Implement trees of fallback glyph pages

This is great!

The patch definitely removes an optimization: needCharTransform combined the smallCaps check and the RTL/mirror check into a single check outside the loop. The change means we now have an additional branch in the loop. But I'm not sure what the impact will be. Since this code is so hot someone will need to profile it again and possibly rearrange things a bit.

+                return data;
+            } else {
+                // Even system fallback can fail.

No need for an else after a return.

+        return floatWidthForSimpleText(run, style, 0, 0);
     else

Same here.

+        node = new GlyphPageTreeNode();

I usually leave out the () in cases like this one.

I agree with Hyatt, we should land this!
Comment 23 Mark Rowe (bdash) 2007-01-18 17:25:54 PST
Landed in r18966.