WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
128928
Improve GDB backtrace generation for GTK/EFL
https://bugs.webkit.org/show_bug.cgi?id=128928
Summary
Improve GDB backtrace generation for GTK/EFL
Carlos Alberto Lopez Perez
Reported
2014-02-17 12:17:09 PST
To current GDB backtrace generation (see
https://bugs.webkit.org/show_bug.cgi?id=119338
) uses the Linux kernel sysctl core_pattern variable to generate the core dumps with a specific name. echo "/path/to/cores/core-pid_%p-_-process_%e" > /proc/sys/kernel/core_pattern The problem is that the executable filename (variable "%e") used in core_pattern has a limit of 16 characters. See comm variable defined on task_struct at include/linux/sched.h This causes most of the times that cores generated have a filename on disk different than the one expected: Ex: core-pid_25409-_-process_WebKitTestRunne core-pid_25411-_-process_WebKitWebProces core-pid_25461-_-process_testwebinspecto ^ notice the cut This causes that the script that launches gdb when a test crashes is unable to find the core. See for example our current GTK Linux 64-bit Release bot:
http://build.webkit.org/results/GTK%20Linux%2064-bit%20Release/r164108%20%2844874%29/fast/dom/Geolocation/not-enough-arguments-crash-log.txt
In order to fix this the variable "%E" could be used (introduced on
https://git.kernel.org/linus/57cc083ad
), but a more interesting approach is to set core_pattern to pipe to a program (introduced on
https://git.kernel.org/linus/d025c9d
). This approach of piping the core to a program have some extra benefits: This program can have much more information about the crashed process since the /proc/$pid structures are still available when this helper program is executed. So it can extract easily more information about the crashed program. Also the tests can set a WEBKIT_CURRENT_TEST="name_test" environment variable that this program can check (/proc/$pid/environ) in order to detect without doubt which test was the one that crashed. This avoid the problem that we currently have when the crashing process is a child and we don't know its pid. What the current code does is to just pick the most recent core matching the executable filename. This is not deterministic in a context where the tests are ran in parallel.
Attachments
patch
(21.07 KB, patch)
2016-07-14 08:36 PDT
,
Philippe Normand
no flags
Details
Formatted Diff
Diff
patch
(21.03 KB, patch)
2016-07-14 08:37 PDT
,
Philippe Normand
no flags
Details
Formatted Diff
Diff
patch
(17.49 KB, patch)
2016-07-22 00:36 PDT
,
Philippe Normand
no flags
Details
Formatted Diff
Diff
patch
(17.39 KB, patch)
2016-07-22 00:50 PDT
,
Philippe Normand
cgarcia
: review+
cgarcia
: commit-queue-
Details
Formatted Diff
Diff
patch for landing
(18.11 KB, patch)
2016-07-22 03:14 PDT
,
Philippe Normand
no flags
Details
Formatted Diff
Diff
patch for landing
(18.08 KB, patch)
2016-07-22 03:38 PDT
,
Philippe Normand
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Philippe Normand
Comment 1
2014-02-17 21:48:05 PST
Nice investigation Carlos! So I suppose this broke since we moved the debug bot to WK2.
Carlos Alberto Lopez Perez
Comment 2
2014-02-18 03:51:37 PST
> So I suppose this broke since we moved the debug bot to WK2.
Yes, this affects mainly WK2, because with WK1 the process that crashes is most of the times "DumpRenderTree" which fits the 15 char limit. I'm going to start working on a patch to implement this.
Philippe Normand
Comment 3
2014-11-10 05:14:01 PST
I plan to upload a non-intrusive patch in
Bug 138564
but this bug should be also fixed in the mid-term :)
Michael Catanzaro
Comment 4
2015-12-22 11:15:21 PST
Nice investigation indeed. The right thing to do is to get cores from coredumpctl so we don't have to mess around with changing the core_pattern. I'd be interested in working on this eventually (for some large value of eventually).
Carlos Alberto Lopez Perez
Comment 5
2015-12-23 05:07:02 PST
(In reply to
comment #4
)
> Nice investigation indeed. > > The right thing to do is to get cores from coredumpctl so we don't have to > mess around with changing the core_pattern. I'd be interested in working on > this eventually (for some large value of eventually).
I guess that would require the users to run systemd. This can be problematic on the bots (they run on unprivileged containers, and systemd don't plays nice there) Also I'm not sure if it will solve the main problem, that is: * Due to WebKit multiprocess model, our tooling for the tests can only know the pid of WebKitTestRunner, but not the pid of any of the childs (WebKitWebProcess, WebKitNetworkProcess, etc). So, when you get a coredump for WebKitWebProcess (for example) with pid NNNN you can't tell which test caused the crash. The only reasonable way of solving this that I see is setting an environment variable (that the childs will inherit) with the test name. When it crashes, the program invoked via the pipe in core_pattern will have access to /proc/$pid/environ and can read this. With coredumpctl, such program will be systemd-coredump. So... the main question.... is coredumpctl able to get and print the environ variables that a program had defined when it crashed for a given coredump? If is not, I don't think it will be of any help here.
Michael Catanzaro
Comment 6
2015-12-23 06:08:20 PST
coredumpctl is not useful for bots -- anything that makes coredumps display in the web interface would be a great solution there -- but for humans running tests manually, we should try to figure out a solution that doesn't require editing core_pattern to disable it. (In reply to
comment #5
)
> I guess that would require the users to run systemd. This can be problematic > on the bots (they run on unprivileged containers, and systemd don't plays > nice there)
Huh, I didn't realize that. I thought all Linux except Gentoo was using systemd nowadays (certainly it's installed on the bots I used yesterday), and that it was designed to handle unprivileged containers nicely -- that's the whole point of machinectl, right? but I've hardly used containers myself.
> So... the main question.... is coredumpctl able to get and print the > environ variables that a program had defined when it crashed for a given > coredump? If is not, I don't think it will be of any help here.
Hm, I don't think so. You can match on an awful lot, anything listed in systemd.journal-fields(7), but not environment variables. It'd be easy to add though, since it has access to /proc/$pid/environ as you say, we'd just need to make it save the environment as a journal field. This is how the abrt coredumpctl integration was written -- abrt needed more info on the coredumps, so they modified coredumpctl to save it in journal fields. But that is a lot more work....
Carlos Alberto Lopez Perez
Comment 7
2015-12-23 06:42:51 PST
(In reply to
comment #6
)
> coredumpctl is not useful for bots -- anything that makes coredumps display > in the web interface would be a great solution there -- but for humans > running tests manually, we should try to figure out a solution that doesn't > require editing core_pattern to disable it. >
Humans running tests can already match a coredump with a test name. Is a matter of running only one test at a time and picking the last coredump. The main use case here is for the bots. The bots runs thousands of tests in parallel, and they need a way to match the coredump with the test name. Otherwise is not possible to show the backtrace/crash log on the results page that the bot published. For example:
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20%28Tests%29/r189464%20%2810899%29/results.html
Notice the empty crash logs on that html page published by the bot? That's what we are trying to fix here.
> (In reply to
comment #5
) > > I guess that would require the users to run systemd. This can be problematic > > on the bots (they run on unprivileged containers, and systemd don't plays > > nice there) > > Huh, I didn't realize that. I thought all Linux except Gentoo was using > systemd nowadays (certainly it's installed on the bots I used yesterday), > and that it was designed to handle unprivileged containers nicely -- that's > the whole point of machinectl, right? but I've hardly used containers myself. >
Well, Ubuntu is still using upstart (the switch to systemd still didn't happened.. I guess that by 16.04 maybe). And with unprivileged containers I mean LXC namespace based containers, and systemd is not running correctly inside there. I think i read that in the latest releases of systemd this was somehow fixed, but I didn't tested it. So I remain cautious about the possibility of using something like coredumpctl inside a LXC namespace based container until I test it and see it working.
> > So... the main question.... is coredumpctl able to get and print the > > environ variables that a program had defined when it crashed for a given > > coredump? If is not, I don't think it will be of any help here. > > Hm, I don't think so. You can match on an awful lot, anything listed in > systemd.journal-fields(7), but not environment variables. It'd be easy to > add though, since it has access to /proc/$pid/environ as you say, we'd just > need to make it save the environment as a journal field. This is how the > abrt coredumpctl integration was written -- abrt needed more info on the > coredumps, so they modified coredumpctl to save it in journal fields. But > that is a lot more work....
If you are interested on adding this to systemd please go ahead.. But take into account that because of the release cycles of the distributions, the time it will take to reach the systemd used by all our developers (debian/ubuntu/fedora/etc) will be measured in years.
Carlos Alberto Lopez Perez
Comment 8
2015-12-23 06:45:50 PST
(In reply to
comment #7
)
> Well, Ubuntu is still using upstart (the switch to systemd still didn't > happened.. I guess that by 16.04 maybe).
Correction: they switched to systemd on 15.04 according to this wiki
https://wiki.ubuntu.com/SystemdForUpstartUsers#System_Init_Daemon
Philippe Normand
Comment 9
2016-07-07 10:51:11 PDT
(In reply to
comment #5
)
> (In reply to
comment #4
) > > Nice investigation indeed. > > > > The right thing to do is to get cores from coredumpctl so we don't have to > > mess around with changing the core_pattern. I'd be interested in working on > > this eventually (for some large value of eventually). > > I guess that would require the users to run systemd. This can be problematic > on the bots (they run on unprivileged containers, and systemd don't plays > nice there) > > > Also I'm not sure if it will solve the main problem, that is: > > * Due to WebKit multiprocess model, our tooling for the tests can only know > the pid of WebKitTestRunner, but not the pid of any of the childs > (WebKitWebProcess, WebKitNetworkProcess, etc).
This can be solved by implementing private C APIs for WKContext and WKPage, like on Mac. I started a patch... Also changing the crash log analisys as advised in the initial description of the bug.
Philippe Normand
Comment 10
2016-07-14 08:36:11 PDT
Created
attachment 283646
[details]
patch
Philippe Normand
Comment 11
2016-07-14 08:37:43 PDT
Created
attachment 283647
[details]
patch
Carlos Garcia Campos
Comment 12
2016-07-16 04:11:23 PDT
Comment on
attachment 283647
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=283647&action=review
> Source/WebKit2/UIProcess/API/C/gtk/WKContextPrivateGtk.cpp:41 > +pid_t WKContextGetNetworkProcessIdentifier(WKContextRef contextRef) > +{ > + return toImpl(contextRef)->networkProcessIdentifier(); > +} > + > +pid_t WKContextGetDatabaseProcessIdentifier(WKContextRef contextRef) > +{ > + return toImpl(contextRef)->databaseProcessIdentifier();
Why is this specific to GTK+? Why don't we add this to WKContext.cpp and WKContextPrivate.h?
> Source/WebKit2/UIProcess/API/C/gtk/WKPagePrivateGtk.cpp:37 > +pid_t WKPageGetProcessIdentifier(WKPageRef pageRef) > +{ > + return toImpl(pageRef)->processIdentifier(); > +}
Ditto, this could be added to WKPage.cpp and WKPagePrivate.h, since this is exposing cross-platform API
> Tools/Scripts/process-linux-coredump:26 > + full_path = os.path.join(destination_directory, 'core-pid_%d.dump' % int(pid))
Why do you need to convert the pid to int? Why not just use %s instead? Or the script could receive the path already built no?
> Tools/Scripts/webkitpy/port/linux_get_crash_log_unittest.py:49 > core_pattern = os.path.join(core_directory, "core-pid_%p-_-process_%E") > + core_pattern = "|%s %%p /path/to/coredumps" % process_coredump_script_path
Shouldn't you remove the previous core_pattern assignation?
> Tools/WebKitTestRunner/TestController.cpp:1487 > pid_t pid = WKContextGetNetworkProcessIdentifier(m_context.get());
Ah, mac already has an impl of this. In that case it should be moved to the cross-platform files, I think, instead of duplicating it in GTK specific files.
Philippe Normand
Comment 13
2016-07-22 00:04:27 PDT
Comment on
attachment 283647
[details]
patch View in context:
https://bugs.webkit.org/attachment.cgi?id=283647&action=review
>> Tools/Scripts/process-linux-coredump:26 >> + full_path = os.path.join(destination_directory, 'core-pid_%d.dump' % int(pid)) > > Why do you need to convert the pid to int? Why not just use %s instead? Or the script could receive the path already built no?
No strong reason for the conversion. The script cannot receive the full path already built because the script invocation is hardcoded in the core_pattern.
>> Tools/Scripts/webkitpy/port/linux_get_crash_log_unittest.py:49 >> + core_pattern = "|%s %%p /path/to/coredumps" % process_coredump_script_path > > Shouldn't you remove the previous core_pattern assignation?
oops
>> Tools/WebKitTestRunner/TestController.cpp:1487 >> pid_t pid = WKContextGetNetworkProcessIdentifier(m_context.get()); > > Ah, mac already has an impl of this. In that case it should be moved to the cross-platform files, I think, instead of duplicating it in GTK specific files.
Ok I can try this.
Philippe Normand
Comment 14
2016-07-22 00:36:06 PDT
Created
attachment 284313
[details]
patch
Philippe Normand
Comment 15
2016-07-22 00:41:39 PDT
(In reply to
comment #13
)
> Comment on
attachment 283647
[details]
> patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=283647&action=review
> > >> Tools/Scripts/process-linux-coredump:26 > >> + full_path = os.path.join(destination_directory, 'core-pid_%d.dump' % int(pid)) > > > > Why do you need to convert the pid to int? Why not just use %s instead? Or the script could receive the path already built no? > > No strong reason for the conversion. The script cannot receive the full path > already built because the script invocation is hardcoded in the core_pattern. >
Ah sorry this can be improved indeed. I'll update the patch.
Philippe Normand
Comment 16
2016-07-22 00:50:27 PDT
Created
attachment 284314
[details]
patch
Carlos Garcia Campos
Comment 17
2016-07-22 01:14:54 PDT
Comment on
attachment 284314
[details]
patch Patch looks good to me, you just need to fix mac builds (update the header includes), so please submit a new patch and wait for EWS before landing
Philippe Normand
Comment 18
2016-07-22 03:14:16 PDT
Created
attachment 284321
[details]
patch for landing
Philippe Normand
Comment 19
2016-07-22 03:38:27 PDT
Created
attachment 284322
[details]
patch for landing
Philippe Normand
Comment 20
2016-07-24 23:09:02 PDT
Committed
r203674
: <
http://trac.webkit.org/changeset/203674
>
Michael Catanzaro
Comment 21
2016-07-25 07:25:30 PDT
It broke a thing: [692/1454] webkitpy.port.linux_get_crash_log_unittest.GDBCrashLogGeneratorTest.test_generate_crash_log failed: Traceback (most recent call last): File "/home/slave/webkitgtk/gtk-linux-64-release-tests/build/Tools/Scripts/webkitpy/port/linux_get_crash_log_unittest.py", line 67, in test_generate_crash_log self.assertMultiLineEqual(log, mock_empty_crash_log) AssertionError: 'crash log for DumpRenderTree (pid 28529):\n\nCoredump core-pid_28529.dump not f [truncated]... != 'crash log for DumpRenderTree (pid 28529):\n\nCoredump core-pid_28529.dump not f [truncated]... Diff is 822 characters long. Set self.maxDiff to None to see it. Could you investigate please?
Philippe Normand
Comment 22
2016-07-25 09:32:47 PDT
Thanks, follow-up in
r203685
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug