Bug 157973 - PlatformDisplay crash when calling egl_terminate
Summary: PlatformDisplay crash when calling egl_terminate
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: Other
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 81456
  Show dependency treegraph
 
Reported: 2016-05-21 09:59 PDT by Michael Catanzaro
Modified: 2018-03-05 05:30 PST (History)
5 users (show)

See Also:


Attachments
Patch (2.58 KB, patch)
2016-06-01 04:02 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch (3.02 KB, patch)
2016-06-01 08:56 PDT, Carlos Garcia Campos
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Catanzaro 2016-05-21 09:59:16 PDT
We crash very often in the destructor of PlatformDisplayWayland when it runs in an exit handler. The effect is that the web process corresponding to a closed Epiphany tab crashes quite regularly immediately after closing tab. Normally when issues like this happen I just switch to use of NeverDestroyed, but I'm not sure if that's correct in this case.

Incredibly enough, this is also somehow causing test-ephy-bookmarks to crash when run under Wayland:

$ jhbuild run valgrind ./test-ephy-bookmarks
==16259== Memcheck, a memory error detector
==16259== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==16259== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==16259== Command: ./test-ephy-bookmarks
==16259== 
--16259-- warning: DiCfSI 0xa13c1e0 .. 0xa617b6f is huge; length = 5093776 (libwebkit2gtk-4.0.so.37)
==16259== Warning: set address range perms: large range [0x395d9000, 0x795db000) (noaccess)
/src/bookmarks/ephy-bookmarks/create: OK
/src/bookmarks/ephy-bookmarks/add: OK
/src/bookmarks/ephy-bookmarks/set_address: OK
==16259== Invalid write of size 4
==16259==    at 0x18E2586F: _eglError (eglcurrent.c:240)
==16259==    by 0x18E1F110: eglTerminate (eglapi.c:531)
==16259==    by 0xC2D8344: WebCore::PlatformDisplay::terminateEGLDisplay() (PlatformDisplay.cpp:168)
==16259==    by 0xC2D81C7: WebCore::PlatformDisplay::~PlatformDisplay() (PlatformDisplay.cpp:118)
==16259==    by 0xC1E2B4B: WebCore::PlatformDisplayWayland::~PlatformDisplayWayland() (PlatformDisplayWayland.cpp:108)
==16259==    by 0xC1E2B67: WebCore::PlatformDisplayWayland::~PlatformDisplayWayland() (PlatformDisplayWayland.cpp:118)
==16259==    by 0xC2D8E9B: std::default_delete<WebCore::PlatformDisplay>::operator()(WebCore::PlatformDisplay*) const (unique_ptr.h:76)
==16259==    by 0xC2D8952: std::unique_ptr<WebCore::PlatformDisplay, std::default_delete<WebCore::PlatformDisplay> >::~unique_ptr() (unique_ptr.h:236)
==16259==    by 0x17748947: __run_exit_handlers (exit.c:82)
==16259==    by 0x17748994: exit (exit.c:104)
==16259==    by 0x1772F737: (below main) (libc-start.c:323)
==16259==  Address 0x2909bf00 is 0 bytes inside a block of size 40 free'd
==16259==    at 0x4C2CD5A: free (vg_replace_malloc.c:530)
==16259==    by 0x18E253C1: _eglFiniTSD (eglcurrent.c:80)
==16259==    by 0x18E267CA: _eglAtExit (eglglobals.c:68)
==16259==    by 0x17748947: __run_exit_handlers (exit.c:82)
==16259==    by 0x17748994: exit (exit.c:104)
==16259==    by 0x1772F737: (below main) (libc-start.c:323)
==16259==  Block was alloc'd at
==16259==    at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==16259==    by 0x18E253F6: _eglCreateThreadInfo (eglcurrent.c:124)
==16259==    by 0x18E253F6: _eglGetCurrentThread.part.1 (eglcurrent.c:171)
==16259==    by 0x18E25915: _eglGetCurrentThread (eglcurrent.c:151)
==16259==    by 0x18E25915: _eglError (eglcurrent.c:235)
==16259==    by 0x18E2163F: eglInitialize (eglapi.c:521)
==16259==    by 0xC2D828E: WebCore::PlatformDisplay::initializeEGLDisplay() (PlatformDisplay.cpp:146)
==16259==    by 0xC1E2A0C: WebCore::PlatformDisplayWayland::PlatformDisplayWayland(wl_display*) (PlatformDisplayWayland.cpp:95)
==16259==    by 0xC1E28AC: WebCore::PlatformDisplayWayland::create() (PlatformDisplayWayland.cpp:67)
==16259==    by 0xC2D8067: WebCore::PlatformDisplay::createPlatformDisplay() (PlatformDisplay.cpp:79)
==16259==    by 0xC2D80D1: WebCore::PlatformDisplay::sharedDisplay()::{lambda()#1}::operator()() const (PlatformDisplay.cpp:101)
==16259==    by 0xC2D85D9: void std::_Bind_simple<WebCore::PlatformDisplay::sharedDisplay()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) (functional:1400)
==16259==    by 0xC2D8576: std::_Bind_simple<WebCore::PlatformDisplay::sharedDisplay()::{lambda()#1} ()>::operator()() (functional:1389)
==16259==    by 0xC2D84C0: void std::__once_call_impl<std::_Bind_simple<WebCore::PlatformDisplay::sharedDisplay()::{lambda()#1} ()> >() (mutex:587)
==16259== 
LEAK: 1 WebProcessPool
==16259== 
==16259== HEAP SUMMARY:
==16259==     in use at exit: 1,850,660 bytes in 12,382 blocks
==16259==   total heap usage: 43,709 allocs, 31,327 frees, 7,469,918 bytes allocated
==16259== 
==16259== LEAK SUMMARY:
==16259==    definitely lost: 600 bytes in 3 blocks
==16259==    indirectly lost: 1,089,886 bytes in 3,789 blocks
==16259==      possibly lost: 6,096 bytes in 29 blocks
==16259==    still reachable: 690,526 bytes in 8,026 blocks
==16259==                       of which reachable via heuristic:
==16259==                         length64           : 4,064 bytes in 71 blocks
==16259==                         newarray           : 2,128 bytes in 53 blocks
==16259==         suppressed: 0 bytes in 0 blocks
==16259== Rerun with --leak-check=full to see details of leaked memory
==16259== 
==16259== For counts of detected and suppressed errors, rerun with: -v
==16259== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Comment 1 Michael Catanzaro 2016-05-21 10:04:24 PDT
Most likely we need to add some PlatformDisplay::Destroy function to be called explicitly from... somewhere... that's not an exit handler, and switch to using NeverDestroyed.
Comment 2 Carlos Garcia Campos 2016-05-27 00:50:57 PDT
(In reply to comment #1)
> Most likely we need to add some PlatformDisplay::Destroy function to be
> called explicitly from... somewhere... that's not an exit handler, and
> switch to using NeverDestroyed.

To properly fix this we need to figure out why it's crashing. Workarounds will just hide the actual issue.
Comment 3 Carlos Garcia Campos 2016-05-27 01:02:37 PDT
This is my guess looking at mesa egl code. _eglGlobal registers two at exist callbacks one to finish the display and another one to unload drivers, the one to finish the display happens first. When our destructor is called the _eglFiniDisplay callback has already been called, so we have a valid pointer for an already finished display. Then eglTerminate tries to find the display in the global display list, but fails and for some reason it crashes when trying to return an error. The error doesn't really depend on the display, it uses thread specific data I think. But _eglGetCurrentThread should never return null, so I don't understand why it's crashing.
Comment 4 Zan Dobersek 2016-05-31 10:31:56 PDT
If we're committed to destructing the PlatformDisplay object, then using atexit() manually might work better than a static std::unique_ptr<> object.

The PlatformDisplayWayland constructor would do the EGL initialization, setting up exit-time callbacks in Mesa. We should register the destruction of this object with atexit() afterwards, since the callbacks passed to atexit() are called in reverse order.

This showcases the problem:
https://www.ibm.com/developerworks/community/blogs/5894415f-be62-4bc0-81c5-3956e82276f3/entry/be_careful_when_using_the_atexit_routine_with_static_objects?lang=en
Comment 5 Carlos Garcia Campos 2016-06-01 04:02:45 PDT
Created attachment 280228 [details]
Patch

Could someone try this to confirm it works?
Comment 6 Michael Catanzaro 2016-06-01 05:27:35 PDT
Comment on attachment 280228 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280228&action=review

I confirm it fixes at least the crash running test-ephy-bookmarks. Awesome, thanks.

> Source/WebCore/platform/graphics/PlatformDisplay.cpp:171
> +    std::atexit([] { PlatformDisplay::sharedDisplay().terminateEGLDisplay(); });

This is harmless, but not really great, since it means terminateEGLDisplay will be redundantly called on the sharedDisplay once for each time initializeEGLDisplay is called on ANY PlatformDisplay object. Perhaps it would it make sense to check if this == PlatformDisplay::sharedDisplay before registering the atexit handler here. Alternatively, and probably better, move this to inside the lambda in PlatformDisplay::sharedDisplay, where it will be called at most once. Or, perhaps the best option is to call terminateEGLDisplay on the current PlatformDisplay (i.e. |this| rather than the sharedDisplay) and remove the call to terminateEGLDisplay from the destructor.
Comment 7 Zan Dobersek 2016-06-01 06:30:07 PDT
Comment on attachment 280228 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280228&action=review

>> Source/WebCore/platform/graphics/PlatformDisplay.cpp:171
>> +    std::atexit([] { PlatformDisplay::sharedDisplay().terminateEGLDisplay(); });
> 
> This is harmless, but not really great, since it means terminateEGLDisplay will be redundantly called on the sharedDisplay once for each time initializeEGLDisplay is called on ANY PlatformDisplay object. Perhaps it would it make sense to check if this == PlatformDisplay::sharedDisplay before registering the atexit handler here. Alternatively, and probably better, move this to inside the lambda in PlatformDisplay::sharedDisplay, where it will be called at most once. Or, perhaps the best option is to call terminateEGLDisplay on the current PlatformDisplay (i.e. |this| rather than the sharedDisplay) and remove the call to terminateEGLDisplay from the destructor.

Only PlatformDisplay::sharedDisplay() should ever live during the program lifetime, so this method would only be called only once as well. Asserting !m_eglDisplayInitialized at the beginning here would be helpful in that regard.
Comment 8 Michael Catanzaro 2016-06-01 07:40:54 PDT
(In reply to comment #7)
> Only PlatformDisplay::sharedDisplay() should ever live during the program
> lifetime, so this method would only be called only once as well. Asserting
> !m_eglDisplayInitialized at the beginning here would be helpful in that
> regard.

Are you saying that this class is intended to be a singleton? If so, then it should use our normal singleton naming convention (PlatformDisplay::singleton rather than PlatformDisplay::sharedDisplay), and should have nothing in its destructor (as it would never make sense to call egl_terminate there if it's always scheduled by atexit), and terminateEGLDisplay should be called on the this pointer rather than sharedDisplay for readability. If it's not supposed to be a singleton, then the assert you suggested would be inappropriate.
Comment 9 Carlos Garcia Campos 2016-06-01 08:45:30 PDT
The constructor is private (well, protected) so yes, it's indeed a singleton, but we really want to close displays, when needed, and that's why it's not NeverDestroyed. I agree that if we terminate in the atexit, it doesn't make sense to terminate again the destructor, since the atexit should always happen first. We can't capture this on a lambda for a pointer function.
Comment 10 Carlos Garcia Campos 2016-06-01 08:56:11 PDT
Created attachment 280238 [details]
Updated patch
Comment 11 Michael Catanzaro 2016-06-01 09:23:43 PDT
Comment on attachment 280238 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280238&action=review

OK.

> Source/WebCore/platform/graphics/PlatformDisplay.cpp:169
> +    std::atexit([] { PlatformDisplay::sharedDisplay().terminateEGLDisplay(); });

Since this is not a static function, I think it make more sense to not use PlatformDisplay::sharedDisplay() here. You say you cannot capture |this| here, but I don't understand why not; did you try writing it this way:

std::atexit([this] { this->terminateEGLDisplay(); });
Comment 12 Zan Dobersek 2016-06-02 00:04:42 PDT
Comment on attachment 280238 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=280238&action=review

>> Source/WebCore/platform/graphics/PlatformDisplay.cpp:169
>> +    std::atexit([] { PlatformDisplay::sharedDisplay().terminateEGLDisplay(); });
> 
> Since this is not a static function, I think it make more sense to not use PlatformDisplay::sharedDisplay() here. You say you cannot capture |this| here, but I don't understand why not; did you try writing it this way:
> 
> std::atexit([this] { this->terminateEGLDisplay(); });

std::atexit() has to be passed a function pointer to a static function, since it only stores the pointer, not the whole function object.

Only capture-less lambdas can be passed as such arguments, since they have the conversion operator that converts to a function pointer.
http://en.cppreference.com/w/cpp/language/lambda -- see ClosureType::operator ret(*)(params)()

In such cases compiler simply generates a bare struct with the call operator and this conversion operator, since it has no state to manage because nothing is captured. You'd basically be able to call the struct's call operator as if it was any other static non-method function. If anything is captured, then lambda implies an actual object of that helper struct type, generating all the relevant constructors and destructors, plus you have to address the lifetime management (limited to scope or storable to std::function<> for delayed use).
Comment 13 Carlos Garcia Campos 2016-06-02 00:35:16 PDT
Committed r201595: <http://trac.webkit.org/changeset/201595>
Comment 14 Michael Catanzaro 2017-07-24 09:47:16 PDT
Looks like this has somehow returned in bug #174789.