Bug 104506

Summary: [EFL][WK2] WebGL test cases are sometimes crashing
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: WebKit EFLAssignee: Viatcheslav Ostapenko <ostap73>
Status: RESOLVED FIXED    
Severity: Normal CC: kalyan.kondapally, kenneth, lucas.de.marchi, mikhail.pozdnyakov, noam, ostap73, webkit.review.bot, zeno
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 106424, 106582, 107178, 107485    
Bug Blocks: 104459    
Attachments:
Description Flags
Patch for landing
none
Patch for landing
none
backtrace
none
WIP: Delay canvas deletion until renderNextFrame and make sure that destroyCanvas message is sent to UI process.
ostap73: review-
crashlog none

Description Chris Dumez 2012-12-09 22:04:17 PST
The WebGL tests often crash on the WK2 EFL Debug build bot. At least the following test cases have crashed:
  fast/canvas/webgl/functions-returning-strings.html
  fast/canvas/webgl/tex-image-and-sub-image-2d-with-video-rgba4444.html
  fast/canvas/webgl/array-constructor.html
  fast/canvas/webgl/worker-data-view-test.html
  fast/canvas/webgl/webgl-composite-modes-repaint.html
  fast/canvas/webgl/tex-image-and-sub-image-2d-with-image-rgb565.html
  fast/canvas/webgl/tex-image-and-sub-image-2d-with-canvas-rgba5551.html
  fast/canvas/webgl/texture-color-profile.html

It is not very hard for me to reproduce locally either.

The backtrace is not useful:
crash log for WebKitTestRunner (pid 469):
STDOUT: <empty>
STDERR: 1   0x7fb3e0f14fdb
STDERR: 2   0x7fb3d8238cb0
STDERR: LEAK: 1 WebPageProxy
STDERR: LEAK: 1 WebContext
STDERR: LEAK: 1 WebPage
STDERR: LEAK: 1 WebFrame
STDERR: LEAK: 3 RenderObject
STDERR: LEAK: 1 Page
STDERR: LEAK: 1 Frame
STDERR: LEAK: 244 CachedResource
STDERR: LEAK: 4 WebCoreNode
Comment 1 Viatcheslav Ostapenko 2013-01-08 12:27:41 PST
Created attachment 181728 [details]
Patch for landing
Comment 2 WebKit Review Bot 2013-01-08 12:35:32 PST
Comment on attachment 181728 [details]
Patch for landing

Rejecting attachment 181728 [details] from commit-queue.

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', u'--status-host=queues.webkit.org', ..." exit_code: 1 cwd: /mnt/git/webkit-commit-queue

/mnt/git/webkit-commit-queue/LayoutTests/ChangeLog neither lists a valid reviewer nor contains the string "Unreviewed" or "Rubber stamp" (case insensitive).

Full output: http://queues.webkit.org/results/15762188
Comment 3 Viatcheslav Ostapenko 2013-01-08 12:41:44 PST
Created attachment 181733 [details]
Patch for landing
Comment 4 WebKit Review Bot 2013-01-08 16:48:34 PST
Comment on attachment 181733 [details]
Patch for landing

Clearing flags on attachment: 181733

Committed r139134: <http://trac.webkit.org/changeset/139134>
Comment 5 WebKit Review Bot 2013-01-08 16:48:38 PST
All reviewed patches have been landed.  Closing bug.
Comment 6 Mikhail Pozdnyakov 2013-01-09 00:48:35 PST
The WebGL tests still crash:
http://build.webkit.org/results/EFL%20Linux%2064-bit%20Debug%20WK2/r139169%20%287814%29/results.html

Shouldn't we skip them again?
Comment 7 WebKit Review Bot 2013-01-09 01:14:02 PST
Re-opened since this is blocked by bug 106424
Comment 8 Chris Dumez 2013-01-09 01:19:23 PST
Comment on attachment 181728 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=181728&action=review

> LayoutTests/ChangeLog:8
> +        Enable webgl tests that are passing after recent changes.

BTW, for future reference, it would be nice to indicate actual revision(s) if possible. "recent changes" is really vague.
Comment 9 Kalyan 2013-01-09 21:01:26 PST
I would propose to keep this bug open till all the crashes are fixed. Let us keep this as a meta bug for fixing any found crashes and open a separate bug for any individual fixes (or enabling the tests).
Comment 10 Kalyan 2013-01-09 22:10:11 PST
(In reply to comment #3)
> Created an attachment (id=181733) [details]
> Patch for landing

There is a separate bug for composite modes. Could we handle any relevant changes in that bug report ??

https://bugs.webkit.org/show_bug.cgi?id=105029
Comment 11 Kalyan 2013-01-10 04:44:37 PST
Created attachment 182113 [details]
backtrace

Memory seems to get corrupted sometimes. Attached the back trace. We seem to end at this call: 
uint32_t GraphicsSurface::platformGetTextureID()
Comment 12 Kalyan 2013-01-10 10:06:11 PST
(In reply to comment #11)
> Created an attachment (id=182113) [details]
> backtrace
 
> uint32_t GraphicsSurface::platformGetTextureID()

Issue:
1)WebProcess creates the Window handle to be shared by UI process.
2)Handle is passed and shared GraphicsSurface is created in UI Process side.
3)In UI process side, we use XCompositeNameWindowPixmap to create a pixmap.XCompositeNameWindowPixmap creates and returns a pixmap id that serves as a reference to the off-screen storage for the Window Handle. At the same time Window in WebProcess is being destroyed. 
4)This results in a crash at the following line in GraphicsSurfaceGLX:
 pGlXBindTexImageEXT(m_private->display(), pixmap, GLX_FRONT_EXT, 0);

The pixmap created by XCompositeNameWindowPixmap should remain allocated even if the window is destroyed. It seems more like the pixmap is created after the window is destroyed. 

Probably setting XErrorHandle and checking for BadMatch errors could solve this. I will give it a try and see.
Comment 13 Viatcheslav Ostapenko 2013-01-10 11:37:09 PST
(In reply to comment #12)
> (In reply to comment #11)
> > Created an attachment (id=182113) [details] [details]
> > backtrace
> 
> > uint32_t GraphicsSurface::platformGetTextureID()
> 
> Issue:
> 1)WebProcess creates the Window handle to be shared by UI process.
> 2)Handle is passed and shared GraphicsSurface is created in UI Process side.
> 3)In UI process side, we use XCompositeNameWindowPixmap to create a pixmap.XCompositeNameWindowPixmap creates and returns a pixmap id that serves as a reference to the off-screen storage for the Window Handle. At the same time Window in WebProcess is being destroyed. 
> 4)This results in a crash at the following line in GraphicsSurfaceGLX:
>  pGlXBindTexImageEXT(m_private->display(), pixmap, GLX_FRONT_EXT, 0);
> 
> The pixmap created by XCompositeNameWindowPixmap should remain allocated even if the window is destroyed. It seems more like the pixmap is created after the window is destroyed. 
> 
> Probably setting XErrorHandle and checking for BadMatch errors could solve this. I will give it a try and see.

I think error handling is useful and should be implemented for the case when webprocess dies and etc, but we also have to make sure that window is not destroyed before UI process is notified about it. Currently window is destroyed too early on the webprocess side and destroyCanvas message is never sent. I'm working on a patch that solves this problem. Let's do both to make it stable.
Comment 14 Kalyan 2013-01-10 15:53:07 PST
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > Created an attachment (id=182113) [details] [details] [details]
> > > backtrace
 
> I think error handling is useful and should be implemented for the case when 
Instead of trying to avoid Window being destroyed before UI Process is notified,
can we synchronize the destroyCanvas notification to be handled immediately??
For me atleast it doesn't sound right to postpone the notification as both are sharing GL resources.

Delaying Window destruction would need a two way acknowledgement (WebProcess-> UiProcess(marked for deletion) , UIProcess-> WebProcess(Acknowledges handling the request and informs Webprocess to delete the window.) ??
Comment 15 Viatcheslav Ostapenko 2013-01-11 00:02:37 PST
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > (In reply to comment #11)
> > > > Created an attachment (id=182113) [details] [details] [details] [details]
> > > > backtrace
> 
> > I think error handling is useful and should be implemented for the case when 
> Instead of trying to avoid Window being destroyed before UI Process is notified,
> can we synchronize the destroyCanvas notification to be handled immediately??
> For me atleast it doesn't sound right to postpone the notification as both are sharing GL resources.

Do you mean send sync message? I don't think it is right.

> Delaying Window destruction would need a two way acknowledgement (WebProcess-> UiProcess(marked for deletion) , UIProcess-> WebProcess(Acknowledges handling the request and informs Webprocess to delete the window.) ??

There is renderNextFrame message which is delivered when all messages till layer flush are processed. I'm delaying deallocation until renderNextFrame.
Comment 16 Viatcheslav Ostapenko 2013-01-11 00:09:22 PST
Created attachment 182276 [details]
WIP: Delay canvas deletion until renderNextFrame and make sure that destroyCanvas message is sent to UI process.
Comment 17 Viatcheslav Ostapenko 2013-01-11 00:11:49 PST
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > (In reply to comment #11)
> > > > Created an attachment (id=182113) [details] [details] [details] [details]
> > > > backtrace
> 
> > I think error handling is useful and should be implemented for the case when 
> Instead of trying to avoid Window being destroyed before UI Process is notified,
> can we synchronize the destroyCanvas notification to be handled immediately??
> For me atleast it doesn't sound right to postpone the notification as both are sharing GL resources.
> 
> Delaying Window destruction would need a two way acknowledgement (WebProcess-> UiProcess(marked for deletion) , UIProcess-> WebProcess(Acknowledges handling the request and informs Webprocess to delete the window.) ??

I've attached work-in-progress patch. Feel free to comment.
I still have 3 tests crashing (1 because of completely different reason) and didn't try it on Qt yet.
Comment 18 Kalyan 2013-01-11 02:59:51 PST
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > (In reply to comment #12)
> > > > (In reply to comment #11)
> > > > > Created an attachment (id=182113) [details] [details] [details] [details] [details]
> > > > > backtrace
> > 
> > > I think error handling is useful and should be implemented for the case when 
> > Instead of trying to avoid Window being destroyed before UI Process is notified,
> > can we synchronize the destroyCanvas notification to be handled immediately??
> > For me atleast it doesn't sound right to postpone the notification as both are sharing GL resources.
> 
> Do you mean send sync message? I don't think it is right.

I didn't mean the sync message but any mechanism to ensure that deletion is acknowledged as soon as possible instead of handling it in event queue. 
>
Comment 19 Kalyan 2013-01-11 03:13:49 PST
(In reply to comment #17)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > (In reply to comment #12)
> > > > (In reply to comment #11)
> > > > > Created an attachment (id=182113) [details] [details] [details] [details] [details]
> > > > > backtrace
> I still have 3 tests crashing (1 because of completely different reason) and didn't try it on Qt yet.

K, I was kind of expecting the whole sync to happen in CoordinatGraphicslayer and TextureMapperLayer. As we  use XAddSaveSet which can also result in the same kind of errors (i.e BadWindow etc). Could we get rid of the changes in GraphicsSurfaceGLX and also in GraphicsContext3DPrivate?? instead use the combination of changes in co-ordinated layer and TextureMapperLayer(in this patch) and the one in https://bugs.webkit.org/show_bug.cgi?id=106582
I have been able to get rid of all the crashes except one which seems to happen randomly but is not related to the pixmap management.
Note that 106547 is a separate issue and needs to be handled later.

Error handling is useful not only when WebProcess is crashed but to validate the drawables in general. Adding XAddSaveSet on top of it seems unnecessary. All these crashes are a result of sync between the two process and in theory atleast we could be having the same issues with XAddSaveSet. XCompositeNamedWindowPixmap does provide us with the needed backing(if Window is valid at the time of pixmap creation). If Window is not valid at the time of pixmap creation there is no point of creating anything.
Comment 20 Kalyan 2013-01-11 03:29:01 PST
Comment on attachment 182276 [details]
WIP: Delay canvas deletion until renderNextFrame and make sure that destroyCanvas message is sent to UI process.

View in context: https://bugs.webkit.org/attachment.cgi?id=182276&action=review

> Source/WebCore/platform/graphics/GraphicsContext3D.h:1098
> +    RefPtr<GraphicsContext3DPrivate> m_private;

Should this be ifdef for platforms using TexturePlatformLayer??
i.e Qt and Efl

> Source/WebCore/platform/graphics/surfaces/glx/GraphicsSurfaceGLX.cpp:193
> +        XAddToSaveSet(m_display, m_surface);

This seems unnecessary after adding error handling.

> Source/WebCore/platform/graphics/surfaces/glx/GraphicsSurfaceGLX.cpp:210
> +            XRemoveFromSaveSet(m_display, m_surface);

Seems unnecessary after Error Handling.
Comment 21 Kalyan 2013-01-11 03:33:04 PST
(In reply to comment #19)
GraphicsSurfaceGLX and also in GraphicsContext3DPrivate??

ignore GraphicsContext3DPrivate part.
Comment 22 Kalyan 2013-01-11 03:48:17 PST
Comment on attachment 182276 [details]
WIP: Delay canvas deletion until renderNextFrame and make sure that destroyCanvas message is sent to UI process.

View in context: https://bugs.webkit.org/attachment.cgi?id=182276&action=review

> Source/WebKit2/WebProcess/WebPage/CoordinatedGraphics/CoordinatedGraphicsLayer.cpp:342
> +    m_canvasPlatformLayerToDestroy = m_canvasPlatformLayer;

It would be nice to have a comment in GraphicsContext3DPrivate about it's ownership. With proposed changes, context3d would not own it's platform layer but rather have a reference to it. The ownership resides with the CoordinatedGraphicsLayer (Well a reference but the destruction is done here).  What about wk1??
Comment 23 Kalyan 2013-01-11 03:57:05 PST
(In reply to comment #22)
> (From update of attachment 182276 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=182276&action=review

> The ownership resides with the CoordinatedGraphicsLayer (Well a reference but > the destruction is done here).  What about wk1??

Rather, ownership is transferred to CoordinatedGraphicsLayer.
Comment 24 Kalyan 2013-01-11 07:01:48 PST
(In reply to comment #19)
> K, I was kind of expecting the whole sync to happen in 

K, I tried this patch(removed the changes in GraphicsSurfaceGLX) along with the patch in 106582.

Modified type-conversion-test.html a bit to avoid crash for testing these changes.
Ran the complete tests in fast/canvas/Webgl around 15 times:

No occurrence of a crash which brings down the whole application.

still we have issues with ReadPixel tests (i.e WebProcess not responsive), let's handle that separately.
Comment 25 Kalyan 2013-01-11 10:36:40 PST
Created attachment 182366 [details]
crashlog

As per our offline discussion, I tried to run the tests parallel and do get a crash. Attaching the bug report.

I am also trying to figure out why my random crash (with single run) was fixed by this patch.
Comment 26 Kalyan 2013-01-13 06:00:04 PST
(In reply to comment #25)
> Created an attachment (id=182366) [details]
> crashlog
> I am also trying to figure out why my random crash (with single run) was fixed by this patch.

K, I found the place where crash happens(without this patch and only Xerror check).

Crash seems to happen in glDrawArrays called from 
void TextureMapperGL::drawUnitRect(TextureMapperShaderProgram* program, GC3Denum drawingMode)

This seems to be consistent behaviour when a WebGL layout test crashes WebKitTestRunner.

Note: This doesn't seem to be limited to WebGL layout tests but I could reproduce this by loading any normal webpage with MiniBrowser(don't have to load any pages using WebGL).

Need to figure out reason for the crash though.
Comment 27 Viatcheslav Ostapenko 2013-01-13 17:09:47 PST
(In reply to comment #26)
> (In reply to comment #25)
> > Created an attachment (id=182366) [details] [details]
> > crashlog
> > I am also trying to figure out why my random crash (with single run) was fixed by this patch.
> 
> K, I found the place where crash happens(without this patch and only Xerror check).
> 
> Crash seems to happen in glDrawArrays called from 
> void TextureMapperGL::drawUnitRect(TextureMapperShaderProgram* program, GC3Denum drawingMode)
> 
> This seems to be consistent behaviour when a WebGL layout test crashes WebKitTestRunner.
> 
> Note: This doesn't seem to be limited to WebGL layout tests but I could reproduce this by loading any normal webpage with MiniBrowser(don't have to load any pages using WebGL).
> 
> Need to figure out reason for the crash though.

I think it is something relatively new. Now I get it also.
Comment 28 Kalyan 2013-01-14 01:08:17 PST
(In reply to comment #27)
> (In reply to comment #26)
> > (In reply to comment #25)
> > > Created an attachment (id=182366) [details] [details] [details]
>
> I think it is something relatively new. Now I get it also.

k, I think finding the reason for this should probably help us figure out the right solution here. This seems to be the main reason(After the XError checks). In my testing so far I haven't found any other place where the crash happens. It happens in the subsequent update after surface in UI Process side is destroyed.
Comment 29 Kalyan 2013-01-15 12:25:03 PST
(In reply to comment #28)
> (In reply to comment #27)

Sometimes a test crashes with following crash log,  any idea why??

crash log for WebKitTestRunner (pid 12450):
STDOUT: <empty>
STDERR: ASSERTION FAILED: m_ptr
STDERR: /home/kalyan/Downloads/webkit_upstream/WebKit/Source/WTF/wtf/OwnPtr.h(72) : WTF::OwnPtr<T>::ValueType* WTF::OwnPtr<T>::operator->() const [with T = WTR::TestInvocation, WTF::OwnPtr<T>::PtrType = WTR::TestInvocation*, WTF::OwnPtr<T>::ValueType = WTR::TestInvocation]
STDERR: 1   0x80676e4 WTF::OwnPtr<WTR::TestInvocation>::operator->() const
STDERR: 2   0x8063b46 WTR::TestController::run()
STDERR: 3   0x80617eb WTR::TestController::TestController(int, char const**)
STDERR: 4   0x80776bf main
STDERR: 5   0xafa754d3 __libc_start_main
STDERR: 6   0x80603c1
STDERR: LEAK: 1 WebPage
STDERR: LEAK: 1 WebFrame
Comment 30 Viatcheslav Ostapenko 2013-01-15 13:09:38 PST
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> 
> Sometimes a test crashes with following crash log,  any idea why??
> 
> crash log for WebKitTestRunner (pid 12450):
> STDOUT: <empty>
> STDERR: ASSERTION FAILED: m_ptr
> STDERR: /home/kalyan/Downloads/webkit_upstream/WebKit/Source/WTF/wtf/OwnPtr.h(72) : WTF::OwnPtr<T>::ValueType* WTF::OwnPtr<T>::operator->() const [with T = WTR::TestInvocation, WTF::OwnPtr<T>::PtrType = WTR::TestInvocation*, WTF::OwnPtr<T>::ValueType = WTR::TestInvocation]
> STDERR: 1   0x80676e4 WTF::OwnPtr<WTR::TestInvocation>::operator->() const
> STDERR: 2   0x8063b46 WTR::TestController::run()
> STDERR: 3   0x80617eb WTR::TestController::TestController(int, char const**)
> STDERR: 4   0x80776bf main
> STDERR: 5   0xafa754d3 __libc_start_main
> STDERR: 6   0x80603c1
> STDERR: LEAK: 1 WebPage
> STDERR: LEAK: 1 WebFrame

I've caught it once under debugger. It asserts because m_currentInvocation becomes 0 after webprocess became unresponsive. May be just 0 check would solve the problem.
Comment 31 Kalyan 2013-01-22 01:27:26 PST
updated the dependent bug ids.