104858 – "Running 1 DumpRenderTree over X shards" is not a helpful output

RESOLVED FIXED 104858

https://bugs.webkit.org/show_bug.cgi?id=104858

Ryosuke Niwa

Reported 2012-12-12 16:09:22 PST

Apparently, "shards" in this context means a list of tests, usually directories. I don't think we should be using made-up words like "shards" that most people don't know the exact meaning of.

Attachments
Only mention shards in the debug mode (1.92 KB, patch) 2012-12-14 02:07 PST, Ryosuke Niwa, no flags
Patch (3.17 KB, patch) 2012-12-14 11:31 PST, Ryosuke Niwa, no flags

Dirk Pranke

Comment 1 2012-12-12 16:20:51 PST

As discussed, "shard" is not a made up term. As multiple people have told you, it is a common term used when scheduling a set of tasks to be run across multiple physical resources.

Maciej Stachowiak

Comment 2 2012-12-12 16:46:58 PST

Alternate name suggestions (for the UI level at least):

- chunk
- part
- section
- group
- subgroup
- subset

Or any of these (or the word "shard") could be prefixed with "test". I'm sure there are many other possibilities.

I think "shard" makes non-server-experts think of separate (possibly physically separate) computing resources, rather than just pieces of data. Thus, for people without good knowledge of server infrastructure, "over 240 shards" would sound like it's running on 240 different physical/logical servers or 240 different databases, not just running 240 semi-arbitrary chunks of the full test suite.

Ryosuke Niwa

Comment 3 2012-12-12 16:56:05 PST

Also, a typical WebKit contributor doesn't have experience working on distributed computing systems. For example, I graduated from UC Berkeley very recently, but I had never heard the word "shard" in any of my classes except in the context of distributed computing until I joined Google, where everyone refers to all sorts of things by "sharding".

One thing that makes contributing to WebKit really painful is the mound of technical jargon used in our community, like "layout tests", "DRT", "NRWT", etc… We should strive to get rid of it as much as possible so that a reasonably intelligent person can understand various things without having to go through many wiki pages.

Dirk Pranke

Comment 4 2012-12-12 17:20:41 PST

Jargon is often, but not always, bad. Sometimes inventing new terms to accurately and concisely describe concepts is the right thing to do (I am not necessarily making that claim here, mind you).

All of Maciej's suggestions in comment #2 have the downside of being vague and open to interpretation as one of many possible concepts. In particular, NRWT already uses "part" and "chunk" to refer to different subsets of the tests as well. "shard" has the virtue of having fairly specific connotations, even if they are being applied slightly differently here than one might be used to seeing them.

As I mentioned over #irc, in the server clustering / cloud community, a shard is *not* necessarily a separate physical resource; shards are virtual concepts that get mapped onto physical resources, typically in an M:N manner (which may of course be 1:1, but the whole point of a shard as opposed to a parallel database is that you can move shards between physical resources to redistribute the load, which is exactly what we're doing here).

"batch" or "job" might be better terms that are at least closer in spirit to "shard" than something generic like "section". Unfortunately, we also use --batch-size in NRWT to mean something else. "suite" is also close, but not quite right (since people might get confused when a suite doesn't refer exactly to a directory).
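To make the M:N mapping concrete, here is a minimal sketch in which M shards are dealt out to N workers. It is an illustration only, not NRWT's scheduler: the function name `assign_shards`, the shard names, and the fixed round-robin order are all assumptions made for the example, and a real scheduler would more likely hand the next shard to whichever worker becomes free first.

```python
from collections import deque


def assign_shards(shards, num_workers):
    """Map M shards onto N workers (hypothetical helper, not NRWT code).

    Shards are dealt out in a fixed round-robin order, so each worker
    ends up with roughly M / N shards.
    """
    pending = deque(shards)
    assignments = {worker: [] for worker in range(num_workers)}
    worker = 0
    while pending:
        assignments[worker].append(pending.popleft())
        worker = (worker + 1) % num_workers
    return assignments


# Five illustrative shards spread over two workers.
shards = ["fast/css", "fast/dom", "editing", "svg", "http"]
assignments = assign_shards(shards, num_workers=2)
```

The point of the sketch is only that the number of shards and the number of workers are independent: the same five shards could be spread over one worker or five without changing what a shard is.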

Ryosuke Niwa

Comment 5 2012-12-12 17:22:35 PST

I’d vote for "test groups".

Maciej Stachowiak

Comment 6 2012-12-13 01:13:57 PST

(In reply to comment #4)
> Jargon is often, but not always, bad. Sometimes inventing new terms to accurately and concisely describe concepts is the right thing to do (I am not necessarily making that claim here, mind you).
>
> All of Maciej's suggestions in comment #2 have the downside of being vague and interpreted as one of many possible concepts. In particular, NRWT already uses "part" and "chunk" to refer to different subsets of the test as well. "shard" has the virtue of having fairly specific connotations, even if they are being applied slightly differently here than one might be used to seeing them.
>
> As I mentioned over #irc, in the server clustering / cloud community, a shard is *not* necessarily a separate physical resource; shards are virtual concepts that get mapped onto physical resources, typically in an M:N manner (which may of course be 1:1, but the whole point of a shard as opposed to a parallel database is that you can move shards between physical resources to redistribute the load, which is exactly what we're doing here).

I agree that sometimes jargon can be good for its precision, but you have to consider the possibility that a particular term may confuse more than it enlightens.

At least four WebKit contributors said on IRC that they actually found this confusing and misinterpreted the message. In light of this, it seems like a poor defense to say that it's technically correct jargon in the server clustering / cloud community. Probably most WebKit contributors are not part of the server clustering / cloud community, nor is there any reason to even assume server clustering is the context here.

(Also, it's kind of crazy that NRWT has so many distinct ways to group the tests that it's hard to find a free term! I am curious what "part" and "chunk" refer to and why they are different from "shard".)

Simon Fraser (smfr)

Comment 7 2012-12-13 08:42:43 PST

If a shard to DRT is almost always a directory, why not say "running DRT in parallel over X directories" or something similar?

Dirk Pranke

Comment 8 2012-12-13 09:59:12 PST

(In reply to comment #6)
> I agree that sometimes jargon can be good for its precision, but you have to consider the possibility that a particular term may confuse more than it enlightens.

Of course. That's why I said I wasn't necessarily making that argument.

> At least four WebKit contributors said on IRC that they actually found this confusing and misinterpreted the message.

At least as many people on IRC said that they thought this was a correct and accurate term. And it may be that now that the others have had this explained, they are enlightened. I'm pretty sure that's how learning new vocabulary works.

> (Also, it's kind of crazy that NRWT has so many distinct ways to group the tests that it's hard to find a free term! I am curious what "part" and "chunk" refer to and why they are different from "shard".)

From new-run-webkit-tests --help:

    --run-chunk=RUN_CHUNK    Run a specified chunk (n:l), the nth of len l, of the layout tests
    --run-part=RUN_PART      Run a specified part (n:m), the nth of m parts, of the layout tests
    --batch-size=BATCH_SIZE  Run a the tests in batches (n), after every n tests, DumpRenderTree is relaunched.

"chunk" and "part" control which tests are run; sharding controls how they are run and is pretty much an internal detail that the user doesn't normally need to be aware of.

(In reply to comment #7)
> If a shard to DRT is almost always a directory, why not say: "running DRT in parallel over X directories" or something similar.

Because it's not always a directory. Even in the default case, we carve all of the http tests into a few shards (1/4 the number of DRTs by default), which means that multiple directories are usually in a single shard here.

Finally, I will note that the code refers to "shards" in multiple places. Any change should update the code as well as the text string in the message.

I suggest that we either leave the wording alone, or make the sharding a debug message so that devs don't normally see it. I'm fine with either option.

Learning a new use of a word is not a bad thing. I am also about to write more documentation about how NRWT works, and will need to refer to this concept as part of that (so it would be good to end this debate sooner rather than later). Perhaps having the documentation will partially address your concerns over confusing users not familiar with this usage.

If you are dead set on wanting to change this, of all of the alternatives that have been suggested, "batch" is probably the second-best word to describe things after "shard". I don't think the --batch-size flag is used very often, and we could and should probably rename it to something more descriptive like --restart-drt-after-every-n-tests or something.

I do not like any of the other suggestions that we use generic collective words like "part", "chunk", "group", or "section". There are already too many generic collections in NRWT and I have been working hard to get rid of them in order to make things clearer. Using one of these words would be a step backwards.
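For readers who want to see why a shard is not simply a directory, the following is a rough sketch of the grouping described above. It is not the NRWT implementation: `shard_by_directory`, the `http_shard_N` names, and the round-robin split of http tests are assumptions chosen for illustration. The only detail taken from the comment is that non-http tests are usually grouped by directory while http tests are carved into about a quarter as many shards as there are DRTs.

```python
import posixpath
from collections import defaultdict


def shard_by_directory(tests, num_http_shards):
    """Group test paths into named shards (hypothetical helper).

    Assumptions for illustration: every non-http directory becomes its
    own shard, and http tests are dealt round-robin into a fixed number
    of shards, so one http shard can span several directories.
    """
    shards = defaultdict(list)
    http_tests = []
    for test in sorted(tests):
        if test.startswith("http/"):
            http_tests.append(test)
        else:
            shards[posixpath.dirname(test)].append(test)
    for index, test in enumerate(http_tests):
        shards["http_shard_%d" % (index % num_http_shards)].append(test)
    return dict(shards)


tests = [
    "fast/css/a.html",
    "fast/css/b.html",
    "editing/c.html",
    "http/tests/misc/d.html",
    "http/tests/xhr/e.html",
    "http/tests/xhr/f.html",
]
num_workers = 8  # illustrative number of DRT instances
shards = shard_by_directory(tests, num_http_shards=max(1, num_workers // 4))
```

In this example `http_shard_0` ends up holding tests from both `http/tests/misc` and `http/tests/xhr`, which is the case where "X directories" would be the wrong word for the message.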

Ryosuke Niwa

Comment 9 2012-12-13 10:44:13 PST

(In reply to comment #8)
> (In reply to comment #6)
> > I agree that sometimes jargon can be good for its precision, but you have to consider the possibility that a particular term may confuse more than it enlightens.
>
> Of course. That's why I said I wasn't necessarily making that argument.
>
> > At least four WebKit contributors said on IRC that they actually found this confusing and misinterpreted the message.
>
> At least as many people on IRC said that they thought this was a correct and accurate term. And it may be that now that the others have had this explained, they are enlightened. I'm pretty sure that's how learning new vocabulary works.

As far as I can recall, everyone who said they knew what sharding means in this context was a Google employee.

> (In reply to comment #7)
> Finally, I will note that the code refers to "shards" in multiple places. Any change should update the code as well as the text string in the message.
>
> I suggest that we either leave the wording alone, or make the sharding a debug message so that devs don't normally see it. I'm fine with either option.

Making it a debug message will be fine with me, so that ordinary users don't have to get confused by this message.

> Learning a new use of a word is not a bad thing.

As a user of this development tool, I don't want to learn or need to know anything new unless it's absolutely necessary to use the tool. The amount of technical jargon we have, even in just NRWT, is absolutely insane.

This whole elitism and egoism towards knowledge in our industry always drives me nuts. At times like this, I really consider quitting software engineering and going to do something else. And this is why I don't think software engineers should ever design UIs or write documentation.

Ryosuke Niwa

Comment 10 2012-12-13 11:13:41 PST

"test queues"?

Maciej Stachowiak

Comment 11 2012-12-13 20:35:19 PST

(In reply to comment #8)
> (In reply to comment #6)
> > At least four WebKit contributors said on IRC that they actually found this confusing and misinterpreted the message.
>
> At least as many people on IRC said that they thought this was a correct and accurate term. And it may be that now that the others have had this explained, they are enlightened. I'm pretty sure that's how learning new vocabulary works.

Come on, let's be a little less smug. This is a UI design problem, not an opportunity to show the class our superior knowledge of jargon terms.

Ryosuke Niwa

Comment 12 2012-12-14 02:07:54 PST

Created attachment 179453 [details] Only mention shards in the debug mode

Dirk Pranke

Comment 13 2012-12-14 09:06:20 PST

Comment on attachment 179453 [details]
Only mention shards in the debug mode

View in context: https://bugs.webkit.org/attachment.cgi?id=179453&action=review

> Tools/Scripts/webkitpy/layout_tests/views/printing.py:119
> +        self._print_debug("Running %d %ss in parallel over %d shards (%d locked)." %

I would like to keep the "Running 5 DumpRenderTrees" part of the message showing up at the info level. That is a very useful message.
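The split requested here can be sketched with the standard `logging` module. This is an assumption-laden illustration, not the patch that landed: `log_run_summary`, `ListHandler`, the logger name, and the exact message wording are all hypothetical, and the real code in printing.py uses its own `_print_debug`-style helpers rather than a plain logger.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def log_run_summary(logger, num_workers, driver_name, num_shards, num_locked):
    # Info level: the worker count, which users generally want to see.
    plural = "" if num_workers == 1 else "s"
    logger.info("Running %d %s%s in parallel.", num_workers, driver_name, plural)
    # Debug level: sharding details, hidden unless debugging is enabled.
    logger.debug("(%d shards; %d locked)", num_shards, num_locked)


handler = ListHandler()
logger = logging.getLogger("nrwt_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)  # a normal, non-debug run
logger.propagate = False
logger.addHandler(handler)

log_run_summary(logger, 5, "DumpRenderTree", 240, 3)
```

With the logger at `INFO`, only the worker-count line is recorded; switching the level to `DEBUG` would also surface the shard counts.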

Ryosuke Niwa

Comment 14 2012-12-14 11:31:51 PST

Created attachment 179504 [details] Patch

Dirk Pranke

Comment 15 2012-12-14 11:34:48 PST

Comment on attachment 179504 [details] Patch Thanks.

WebKit Review Bot

Comment 16 2012-12-14 14:14:56 PST

Comment on attachment 179504 [details]
Patch

Clearing flags on attachment: 179504

Committed r137770: <http://trac.webkit.org/changeset/137770>

WebKit Review Bot

Comment 17 2012-12-14 14:15:01 PST

All reviewed patches have been landed. Closing bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee Ryosuke Niwa

Reported 2012-12-12 16:09 PST

Modified 2012-12-14 14:15 PST
