RESOLVED FIXED 68598
[chromium] HDC leak in Uniscribe Helper
https://bugs.webkit.org/show_bug.cgi?id=68598
Summary [chromium] HDC leak in Uniscribe Helper
Daniel Cheng
Reported 2011-09-22 00:48:02 PDT
Interesting details are at crbug.com/96021, but in summary, if something interesting happens and we start spinning in the loop, we'll leak DCs. A lot of them.
Attachments
Patch (3.25 KB, patch)
2011-09-22 01:13 PDT, Daniel Cheng
dcheng: review-
Patch (8.23 KB, patch)
2011-10-05 10:53 PDT, Marc-André Decoste
no flags
Patch (9.33 KB, patch)
2011-10-13 08:42 PDT, Marc-André Decoste
no flags
Patch (8.49 KB, patch)
2011-11-01 14:59 PDT, Marc-André Decoste
no flags
Patch (9.76 KB, patch)
2011-11-22 14:06 PST, Marc-André Decoste
no flags
Daniel Cheng
Comment 1 2011-09-22 01:13:38 PDT
Daniel Cheng
Comment 2 2011-09-22 01:15:12 PDT
I'm not actually able to run layout tests on my machine at the moment for some reason (new-run-webkit-tests gets stuck, and it also complains about a lot of unexpected expected result not found). I will consult with the oracles on #webkit tomorrow to see if I can fix it so I can see if my patch horribly breaks things.
Dimitri Glazkov (Google)
Comment 3 2011-09-22 07:36:36 PDT
Brett needs to look at this.
Daniel Cheng
Comment 4 2011-09-22 10:22:24 PDT
Comment on attachment 108291 [details] Patch mad has a version with a test as well, so I'm deferring this bug to him.
Marc-André Decoste
Comment 5 2011-10-05 10:53:00 PDT
Marc-André Decoste
Comment 6 2011-10-05 10:57:20 PDT
Actually, I couldn't get the test to fail using WebKit only. I have HTML that reproduces the problem in Chrome, but not in test_shell... So if we accept this patch without a new test, I'll add a browser test to Chrome once this is rolled into the Chrome DEPS... if you agree with the fix of course... Thanks! BYE MAD
Brett Wilson (Google)
Comment 7 2011-10-12 10:03:49 PDT
The thing with passing a NULL DC the first time is an optimization. ScriptShape normally can have a nice cache of some script parameters and it doesn't need a DC every time. So you can avoid creating one unless it asks. I guess I'm OK leaking a DC for this. If a user encounters complex scripts, the're likely to encounter more, and the overhead of keeping this one around is probably less than the overhead of constantly recreating it. But in that case, we shouldn't be passing a NULL DC to ScriptShape at all. If we have a DC, we should always be passing it, since that will save us another try. And I think we'll always need to create a DC the first time since the cache is empty. So this says to me we should ensure the DC is created up front, and remove the "retry with a DC" logic completely (we still need some retry logic for the "need a bigger buffer" case).
Marc-André Decoste
Comment 8 2011-10-12 10:51:25 PDT
We still create the DC only when it's needed. So for the cases where we don't need it (most cases where special international fonts are not needed), it won't be created at all and thus won't leak. As for the optimization, we still save the SelectObject, but I agree that it's not as expensive as the HDC creation... So do you still want me to change the code to always use the HDC with a selected font in it, instead of trying with a NULL DC beforehand? Thanks! BYE MAD
Brett Wilson (Google)
Comment 9 2011-10-12 15:50:02 PDT
Okay, but I guess if we already have created the DC, we should be passing it into the ScriptShape the first time we call it so we don't have to re-call it again. Probably just initializing tempDC to be the (possibly NULL) cached one should be sufficient.
Marc-André Decoste
Comment 10 2011-10-13 06:04:36 PDT
We would also need to conditionally select the font object when non-NULL, but I guess it's OK... A new patch is coming... Thanks again! BYE MAD
Marc-André Decoste
Comment 11 2011-10-13 08:42:34 PDT
Marc-André Decoste
Comment 12 2011-10-13 08:46:52 PDT
How about this new version of the patch? I don't use a tempDC anymore, and only loop back where we already potentially loop back for other reasons... Removing the annoying for loop of 2 turns... BYE MAD
Daniel Cheng
Comment 13 2011-10-19 12:42:45 PDT
Comment on attachment 110852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=110852&action=review Some drive-by comments. I'd personally love to see this bug fixed; I've encountered this issue a fair number of times myself, and it bugs me a lot. > Source/WebCore/platform/graphics/chromium/UniscribeHelper.cpp:802 > + return; Indent four spaces. > Source/WebCore/platform/graphics/chromium/UniscribeHelper.h:43 > +class UniscribeTest_TooBig_Test; // A gunit test for UniscribeHelper. Is the test supposed to be in this patch?
Marc-André Decoste
Comment 14 2011-11-01 11:46:58 PDT
Comment on attachment 110852 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=110852&action=review >> Source/WebCore/platform/graphics/chromium/UniscribeHelper.cpp:802 >> + return; > > Indent four spaces. Done... Thanks! >> Source/WebCore/platform/graphics/chromium/UniscribeHelper.h:43 >> +class UniscribeTest_TooBig_Test; // A gunit test for UniscribeHelper. > > Is the test supposed to be in this patch? It was already there, I only changed the comment spacing, based on lint warnings...
Marc-André Decoste
Comment 15 2011-11-01 14:59:44 PDT
Marc-André Decoste
Comment 16 2011-11-01 15:00:41 PDT
Comment on attachment 113229 [details] Patch How about this now?
Philippe Verdy
Comment 17 2011-11-02 12:22:16 PDT
You still perform a test to use ScriptItemizeOpenType() and ScriptShapeOpenType() instead of ScriptItemize() and ScriptShape(); but not to replace ScriptPlace() by ScriptPlaceOpenType(). Mising the legacy Script*() API with the extended Script*OpenType() API is clearly NOT supported by Microsoft and can cause havoc that may be explaining why the last loop over ScriptPlace() hangs by returning a failure infinitely (causing the initial detected issue on massive leakage of DC ressources allocated by GetDC(0) and immediately modified by SelectObject(hFont) making it unique and not reusable for the next GetDC(0) call...) Note that the DC returned by GetDC(0) is shared between all processes on the same desktop, but if any process modifies its state, the share is converted to a new allocation with the modified state. Such internal reallocation of the DC does not occur in Windows when the DC is not shared, owned by a private process creating its own window. The default shared DC is initialized from the window class defined in the process creting its first window event handler. The window class specifies the default font, default background color, default pen color which is used to preallocate a DC owned and shared by the process creating the window class. But GetDC(0) returns the DC of the desktop window, which is created by the session manager when it creates a console (initially it was created by the shell in older versions of Windows, but it's safer now in the session manager as it allows the shell to exit and be restarted, saving other infos like notification icons in the system status bar, and preserving OLE handles and some other resources not created by the shell itself but to communicate between other processes)
Brett Wilson (Google)
Comment 18 2011-11-22 09:34:12 PST
Comment on attachment 113229 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=113229&action=review Looks Good To Me (I'm not a WebKit reviewer) > Source/WebCore/platform/graphics/chromium/UniscribeHelper.h:400 > + // We use this DC if we can't get ScriptShape/ScriptPlace to succeed without This comment isn't quite accurate. The DC is always created for ScriptShape in your patch, and the font isn't selected for ScriptShape. Probably a less specific comment would be fine here.
Tony Chang
Comment 19 2011-11-22 09:44:25 PST
Comment on attachment 113229 [details] Patch This patch is missing a ChangeLog. The ChangeLog should say which test covers this change. Why didn't the test fail without this change (maybe put that in the ChangeLog too)?
Marc-André Decoste
Comment 20 2011-11-22 14:06:18 PST
Philippe Verdy
Comment 21 2011-11-22 20:52:10 PST
The last patfh contains the comment: " For some reason the Script functions on Windows sometimes return E_PENDING even with a non-NULL DC, so we must handle that case." I can explain when this occurs: this does not seem to occur during the itemization or shaping steps, but only in the final placement step, because you have mixed the "OpenType" API in the first two steps with the non-OpenType API in the last step. This is where the API can fail as it has not been prepared consistantly with non-OpenType API for the itemization and shaping steps. Please fix it, because Microsoft clearly says that it DOES NOT support such mixes of API, and it says it *bold* in MSDN documentation. Given that you need the OpenType API for the first step, you MUST use the OpenType API too for the last placement step. I reformulate my request to use ScriptPlaceOpenType() instead of ScriptPlace() if you have already used ScriptItemizeOpenType() and ScriptShapeOpenType() instead of ScriptItemize() and ScriptShape(). Because this is clearly a bug. You should not mix these APIs even if it *may* have worked someday in some configuration or OS version or service pack, or with some fonts. Microsoft clearly has no intention of supporting the mix you still use today. And this is confirmed on my experience with Chrome on Windows XP: the infinite loop leaking lost of HDC's *never* occurs because Windows XP does not have Uniscribe 1.1 but only Uniscribe 1.0 (without the extended "OpenType" API); but it always occurs on Windows Vista, Seven, Server 2003, and even in Windows Eight Beta (that all have Uniscribe 1.1 with OpenType support).
Philippe Verdy
Comment 22 2011-11-22 21:06:23 PST
Also please tag this bug as *critical* (yes it is a critical bug, as it severely impacts the whole system, putting it to the knees, only when rendering normal multilingual pages, notably pages on Wikimedia that very frequently display texts in various scripts, just to link the other languages available. The bug for example affects now the *HOME* page of Meta-Wiki (meta.wikimedia.org), which you cannot visit with Chrome or Safari on Windows Vista/Seven/Server 2003: you need to use XP (for example in a VM) or another browser (Firefox, IE, Opera). The patch using your own DC instead of modifying the state of the desktop DC makes this bug less critical (because it no longer affects other applications and allows to kill or debug Chrome or Safari easily), but the rendering bug is still there and text does not render as it should (now with the patch, the complex scripts no longer render at all or displays square boxes, if there's any E_PENDING occuring using your own DC: this visibly occurs when glyphs can't be placed at the final step, see my comment above about the incorrect mix of OpenType/non-Opentype APIs). But for now, none of your patches are applied in the core distribution, and the bug is still extremely critical for Chrome (this bug has been signaled since 2 years and a half, starting in Chrome 4, and forgotten for too long, it has caused lots of users abandonning Chrome on Windows and returning to IE or Firefox, after too many system crashes that they could not explain). You should put this bug in the list of critical bugs to solve with much more people involved to fix it completely and correctly in a faster way.
WebKit Review Bot
Comment 23 2011-11-23 02:10:44 PST
Comment on attachment 116272 [details] Patch Clearing flags on attachment: 116272 Committed r101059: <http://trac.webkit.org/changeset/101059>
WebKit Review Bot
Comment 24 2011-11-23 02:10:50 PST
All reviewed patches have been landed. Closing bug.
Philippe Verdy
Comment 25 2011-11-23 02:32:53 PST
Please reopen for the incorrect use of ScriptPlace() instead of ScriptPlaceOpenType() which is incoherent and unsupported ! Yes the issue is still there and still causes a failure (even if you detect this failure) causing incorrect or missing rendering.
Daniel Cheng
Comment 26 2011-11-23 02:52:05 PST
(In reply to comment #25) > Please reopen for the incorrect use of ScriptPlace() instead of ScriptPlaceOpenType() which is incoherent and unsupported ! > Yes the issue is still there and still causes a failure (even if you detect this failure) causing incorrect or missing rendering. Webkit doesn't reuse bugs. Since it sounds like you hit this issue fairly often, please file a bug with a repro case attached that triggers faulty glyph rendering--ideally, with some screenshots to compare expected vs actual. I would also link to the MSDN page that discourages mixing and matching OpenType and legacy Uniscribe calls--I'm assuming you're referring to http://msdn.microsoft.com/en-us/library/windows/desktop/dd317792(v=vs.85).aspx
Philippe Verdy
Comment 27 2011-11-23 03:12:47 PST
I don'r reuse bugs. In fact the critical bug occured exactly in the last step, but not on Windows XP (Uniscribe 1.0 only). I.e. with the call of ScriptPlace that caused the infinite loop as it always returned E_PENDING, even if you had selected a font. The E_PENDING bug still occurs today, and has NOT been solved for the long term. And yes, the MSDN article was what I was referencing (I had already cited it several times). The code cannot continue to live this way as it is clearly unsupported by Microsoft. If you only use ScriptPlace instead of ScriptPlaceOpenType, it won't be able to correctly place *all* the glyphs that have been substituted by ScriptShapeOpenType, because ScriptShapeOpenType will have modified the glyphs cache using OpenType features that are completely ignored by ScriptPlace, but only considered coherently by ScriptPlaceOpenType. Anyway you'll need ScriptPlaceOpenType for any coming necessary support of OpenType GPOS tables. For now the code can only support GSUB tables, but some OpenType GPOS are only meaningful in association with the GSUB table for the same feature; using only one will create incoherent glyphs. The case will very likely occur in South Asian scripts (e.g. Repha forms), but as well with advanced Latin typography, and for correct display of cursive scripts (which cannot work without correct positioning, e.g. in Urdu).
Philippe Verdy
Comment 28 2011-11-23 03:18:47 PST
Note that Microsoft still says "should" and "should not"; in the Microsoft terminology, this has always meant that there will be soon incompatibilities and absence of support. This is already the case with the equivalent .Net APIs that ONLY use the OpenType API, and completely deprecate the pre-1.6 version implemented separately in Uniscribe for the Win32 API. And with Windows 8, this is already causing problems: the legacy API has severe limitations that cannot render all the scripts supported now. The legacy API for example cannot render Urdu correctly wit hthe expected Nasthalik style, cannot render Tibetan correctly, cannot render correctly Asian texts in the vertical style, cannot render the Mongolian script...
Daniel Cheng
Comment 29 2011-11-23 21:10:00 PST
Like I said, please file a new bug in WebKit, as the general policy is that each WebKit commit is tied to a unique WebKit bug for that commit. In addition, please attach a test case that demonstrates flawed glyph rendering with screenshots comparing what it should look like to the broken rendering in Windows Chrome. While I agree that we should avoid mixing OpenType and non-OpenType Uniscribe calls, I also can't see any rendering artifacts that result from this--the Urdu wikipedia appears to render without problems on Windows 7.
Philippe Verdy
Comment 30 2011-11-23 22:24:54 PST
At least one documented OpenType required positioning feature is not working properly if not using the OpenType API for glyph placement: Tag: 'blwm' Friendly name: Below-base Mark Positioning Registered by: Microsoft Function: Positions marks below base glyphs. Example: In complex scripts like Gujarati (Indic), the vowel sign U needs to be positioned below base consonant/conjuncts that form the base glyph. This position can vary depending on the base glyph, as well as the presence/absence of other marks below the base glyph. Recommended implementation: The blwm table provides positioning information (x,y) to enable mark positioning (GPOS lookup type 4, 5). Application interface: The application must define the GIDs of the base glyphs below which marks need to be positioned, and the marks themselves. If these are located in the coverage table, the application passes the sequence to the blwm table and gets the positioning values (x,y) or positioning adjustments for the mark in return. UI suggestion: This feature should be on by default. Script/language sensitivity: Required in Indic scripts. Feature interaction: Can be used to position default marks; or those that have been selected from a number of alternates based on contextual requirement using a feature like blws. There are more complex case than this Gujarati vowel sign U, notably in Tibetan, and in the Eastern style of Arabic (used for Persian, Pashto, or Urdu): when cursive glyphs have been substituted to adopt the angular joins, these joins will only align correctly if the substituted glyphs (with the enabled OpenType feature) are later repositioned vertically (within the same enabled OpenType feature) using a GPOS lookup, which is still not enabled by default and will not be enabled by the non-OpenType API. There are other more complex cases involving special ligatures that adjust metrics (notably for the advance width which may need to be adjusted if ever the glyph substitution has occured, and also requiring other surrounding glyphs to be contextually repositioned (to avoid superpositions, or incorrect/missing joining). Arabic Kashidas will also not work correctly when using text justification, breaking mandatory joins. In East-Asian scripts (Han/Kanji/Hanja, Yi, Hangul, Bopomofo, Hiragana, Katakana, ...), the traditional vertical style will not render correctly (see the OpenType features whose name start be 'v'), notably for punctuations signs that are still rotated 90° (despite most characters are not rotated, but possibly repositioned such as the ideographic full stop or comma), however I wonder if Webkit still has support for the alternate vertical layout (or if it even supports the traditional Mongolian script, which is vertical only). Vertical layout style is only enabled if you have full support of the OpenType GPOS feature. The Win32 API may consider that, because you have used the OpenType API for the substitution, it will substitute glyphs and then assume that repositioning will occur in ScriptPlaceOpenType(), but ScriptPlace() alone assumes that some optional substituions have *not* occured, and position glyphs accoring to their default intended placement: ScriptPlace() only assumes that the OpenType features that are *on by default* have been used by ScriptShape(), but it does not use any language-specific rules (for example the Czech-specific variants of diacritics like the acute accent, which have distinct metrics and require repositioning other surrounding diacritics: ScriptPlace() will assume that these diacritics have the default metrics, where ScriptPlaceOpenType() will detect that those metrics need to be adjusted). ScriptPlaceOpenType() also uses different metrics for computing advance widths. It also adjust the positions more precisely for hinting and for other enhanced readability on screen. When you'lll finally draw the glyphs, with GDI+, you'll see the differences, but ScriptPlace() alone does not precomputes the adjustments needed, so characters tend to have unbalanced spacing, and the result layout is poorer When printing, I can already see the effects of incorrect metrics in Chrome, where text layout supposed to be correct on screen is completely garbled when printing, with right margins exceeded or irregular when using text justification, or with lines colliding each other; but may be this is a bug in Chrome, not in WebKit itself, unless it incorrectly uses the printing device context and mixes it with the display context and partly honors the "@media" CSS rules in an inconsistant way. But I'll file a separate bug later for printing (or for videoprojectors which use slightly different measurement units than for screen display, and that must avoid subpixel rendering like ClearType to avoid very visible color artefacts, or for narrow or low-resolution displays) if I can isolate the cases; for now, it is impossible to isolate these numerous bugs correctly due to the other fundamental bugs.
Note You need to log in before you can comment on or make changes to this bug.