<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>77419</bug_id>
          
          <creation_ts>2012-01-31 03:21:53 -0800</creation_ts>
          <short_desc>run-webkit-tests: This machine could support 16 child processes, but only has enough memory for 15</short_desc>
          <delta_ts>2012-06-19 14:44:48 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Tools / Tests</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>NRWT</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          <dependson>73847</dependson>
          <dependson>74021</dependson>
          <dependson>74650</dependson>
          <everconfirmed>1</everconfirmed>
          <reporter name="Antti Koivisto">koivisto</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ap</cc>
          <cc>aroben</cc>
          <cc>dglazkov</cc>
          <cc>dpranke</cc>
          <cc>eric</cc>
          <cc>lforschler</cc>
          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>545904</commentid>
    <comment_count>0</comment_count>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2012-01-31 03:21:53 -0800</bug_when>
    <thetext>wat. This Mac Pro has 14GB of RAM.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>545908</commentid>
    <comment_count>1</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2012-01-31 03:28:37 -0800</bug_when>
    <thetext>It&apos;s a heuristic. :)

http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/base.py#L166

It uses the results of vm_stat:
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/common/system/platforminfo.py#L75

I suspect you have a lot of other things running?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>545910</commentid>
    <comment_count>2</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2012-01-31 03:30:45 -0800</bug_when>
    <thetext>I&apos;m very happy to tune the heuristic further.  It can be wrong in both directions.  ORWT didn&apos;t have this problem because it only ran one copy of DRT.  NRWT can run as many as we&apos;d like it to (you can control it manually with --child-processes=N).  Right now it tries to run one per core if we have the RAM to support them.  The RAM requirement was added because we were breaking the Mac builders, which only had 3GB of RAM :) (That was about all you needed back in the day to link WebKit and run one DRT at a time.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>545912</commentid>
    <comment_count>3</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2012-01-31 03:32:38 -0800</bug_when>
    <thetext>See bug 74021, bug 74650 and bug 73847 for more of the history.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>545963</commentid>
    <comment_count>4</comment_count>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2012-01-31 05:17:58 -0800</bug_when>
    <thetext>No, the machine is not under memory pressure.

You are likely to get better results by addressing the problem directly (only allow n threads per GB of physical memory in the machine). Any attempt to base heuristics on free memory figures is almost certain to go wrong.

Explaining why this is the case would require lengthy discussions. It suffices to say that the OS memory subsystem is complex.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>546121</commentid>
    <comment_count>5</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2012-01-31 08:58:35 -0800</bug_when>
    <thetext>The heuristic is not based on free memory any more - see bug 74650.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>546200</commentid>
    <comment_count>6</comment_count>
    <who name="Antti Koivisto">koivisto</who>
    <bug_when>2012-01-31 10:21:24 -0800</bug_when>
    <thetext>(In reply to comment #5)
&gt; The heuristic is not based on free memory any more - see bug 74650.

Yes it is, with a slightly altered definition of &quot;free memory&quot;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>546226</commentid>
    <comment_count>7</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2012-01-31 10:53:15 -0800</bug_when>
    <thetext>I&apos;m happy to accept alternate proposals.  Ideally in python form. :)  I&apos;m not at all wedded to the current heuristic.

The very first attempt at this (bug 73847) used physical memory (sysctl -n hw.memsize).  That was deemed not good enough, so it was changed to free memory (vm_stat &quot;Pages free&quot;) in bug 74021.  That was again deemed insufficient, and it was changed to use free + inactive (vm_stat &quot;Pages free&quot; + &quot;Pages inactive&quot;) in bug 74650.
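
For reference, a minimal sketch of the free + inactive measurement from bug 74650, assuming the usual vm_stat output: a header line that states the page size, followed by lines such as Pages free: 12345. (with a trailing period). The helper name free_plus_inactive_bytes is hypothetical, and this is not the platforminfo.py code linked in comment #1; it only illustrates the calculation.

```python
import re


def free_plus_inactive_bytes(vm_stat_output):
    """Return free + inactive memory, in bytes, parsed from vm_stat text.

    Hypothetical helper (not the webkitpy implementation). It assumes the
    header states the page size and that counts look like 'Pages free: 123.'
    """
    page_size = 4096  # assumed fallback when no page-size header is present
    header = re.search(r"page size of (\d+) bytes", vm_stat_output)
    if header:
        page_size = int(header.group(1))

    counts = {}
    for line in vm_stat_output.splitlines():
        # Matches lines such as "Pages inactive:    3000." (trailing period).
        match = re.match(r"(Pages [a-z ]+):\s+(\d+)\.$", line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))

    pages = counts.get("Pages free", 0) + counts.get("Pages inactive", 0)
    return pages * page_size


sample = """Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                               1000.
Pages active:                             5000.
Pages inactive:                           3000.
"""
print(free_plus_inactive_bytes(sample))  # 16384000, i.e. (1000 + 3000) * 4096
```

The result would then be divided by an expected per-DRT memory figure to cap the number of child processes, so the cap depends on the machine state when the run starts rather than only on installed RAM.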

Again, totally open to changing the algorithm.  But we&apos;ll need a concrete suggestion.  See bug 73847 for why we moved away from sysctl -n hw.memsize to vm_stat &quot;Pages free&quot;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>546236</commentid>
    <comment_count>8</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2012-01-31 10:59:31 -0800</bug_when>
    <thetext>(In reply to comment #7)
&gt; Again, totally open to changing the algorithm.  But we&apos;ll need a concrete suggestion.  See bug 73847 for why we moved away from sysctl -n hw.memsize to vm_stat &quot;Pages free&quot;.

Sorry, I meant to say bug 74021, but I realize now the discussion was all in private mail about the mac bots.  I&apos;m happy to forward you the (not very exciting) discussion.

I&apos;m open to changing this back to using hw.memsize with a smaller expected-ram-per-DRT value.

Another way would be to not pick a number of DRTs to spawn at the beginning and dynamically control them based on free memory.  That&apos;s a larger change, but perhaps a better system design.  Dirk might be able to comment on how difficult that might be.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>546306</commentid>
    <comment_count>9</comment_count>
    <who name="Lucas Forschler">lforschler</who>
    <bug_when>2012-01-31 11:42:09 -0800</bug_when>
    <thetext>(In reply to comment #8)
&gt; (In reply to comment #7)
&gt; &gt; Again, totally open to changing the algorithm.  But we&apos;ll need a concrete suggestion.  See bug 73847 for why we moved away from sysctl -n hw.memsize to vm_stat &quot;Pages free&quot;.
&gt; 
&gt; Sorry, I meant to say bug 74021, but I realize now the discussion was all in private mail about the mac bots.  I&apos;m happy to forward you the (not very exciting) discussion.
&gt; 
&gt; I&apos;m open to changing this back to using hw.memsize with a smaller expected-ram-per-DRT value.
&gt; 
&gt; Another way would be to not pick a number of DRTs to spawn at the beginning and dynamically control them based on free memory.  That&apos;s a larger change, but perhaps a better system design.  Dirk might be able to comment on how difficult that might be.

We should ensure that all the bots have enough memory to run as many DRT processes as cores; otherwise we are just wasting CPU capacity.  If this means upgrading the memory in the bots, that is what we should do.  Our EWS bots are a mix of 4-, 8-, and 16-core machines.  What is the memory requirement for DRT?  Obviously a 16-core machine will need more memory than a 4-core machine, but I am unsure how much more.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>546434</commentid>
    <comment_count>10</comment_count>
    <who name="Dirk Pranke">dpranke</who>
    <bug_when>2012-01-31 13:28:09 -0800</bug_when>
    <thetext>(In reply to comment #9)
&gt; We should ensure that all the bots have enough memory to run as many DRT processes as cores; otherwise we are just wasting CPU capacity.  If this means upgrading the memory in the bots, that is what we should do.  Our EWS bots are a mix of 4-, 8-, and 16-core machines.  What is the memory requirement for DRT?  Obviously a 16-core machine will need more memory than a 4-core machine, but I am unsure how much more.

I agree with Lucas. Memory is (relatively) cheap, and this is the approach we&apos;ve been using since time immemorial on the Chromium bots. I think roughly speaking we tend to have about 768MB of physical memory per DRT instance (i.e., per virtual core). Dunno if 512MB/DRT would be enough, but it sure seems like it should be.
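
A minimal sketch of the physical-memory rule suggested in comment #4 and comment #8, using the roughly 768MB-per-DRT figure above as the budget. The function name default_child_processes and its default budget are assumptions for illustration, not the NRWT code.

```python
MB = 1024 * 1024
GB = 1024 * MB


def default_child_processes(cpu_count, physical_memory_bytes, bytes_per_drt=768 * MB):
    """Hypothetical sizing rule: one DRT per core, capped by installed RAM.

    physical_memory_bytes would be the installed RAM (e.g. from
    sysctl -n hw.memsize on a Mac); bytes_per_drt is an assumed budget per
    DRT instance. At least one process is always allowed.
    """
    by_memory = physical_memory_bytes // bytes_per_drt
    return max(1, min(cpu_count, by_memory))


# The machine in this bug: 16 cores, 14GB of RAM.
print(default_child_processes(16, 14 * GB))  # 16 (14GB / 768MB allows 18)
print(default_child_processes(16, 14 * GB, bytes_per_drt=GB))  # 14

# Memory needed to run one DRT per core at 768MB each, in MB.
for cores in (4, 8, 16):
    print(cores, cores * 768)  # prints 4 3072, then 8 6144, then 16 12288
```

Because this cap depends only on installed RAM, it does not change with whatever else is running when the tests start, which is the property comment #4 asks for; the tradeoff is that it will not back off if the machine really is under memory pressure.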

I can pull the stats from all of the Chromium bots if need be.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>652839</commentid>
    <comment_count>11</comment_count>
    <who name="Dirk Pranke">dpranke</who>
    <bug_when>2012-06-19 14:44:48 -0700</bug_when>
    <thetext>This should have been fixed in http://trac.webkit.org/changeset/120738 ; please reopen if you still see issues.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>