Bug 210346

Summary: [GTK][WPE] Replace evil strtok() calls with fscanf() in MemoryPressureMonitor.cpp
Product: WebKit Reporter: Pablo Saavedra <psaavedra>
Component: WebKitGTKAssignee: Pablo Saavedra <psaavedra>
Status: RESOLVED FIXED    
Severity: Normal CC: annulen, aperez, bugs-noreply, cgarcia, clopez, darin
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Linux   
Bug Depends on: 209942    
Bug Blocks: 210345    
Attachments:
Description Flags
patch
none
patch
none
patch none

Description Pablo Saavedra 2020-04-10 10:29:50 PDT
Related to https://bugs.webkit.org/show_bug.cgi?id=209942
See details in https://bugs.webkit.org/show_bug.cgi?id=209942#c18


> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:173
>  // 1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service


These lines look quite suitable to be parsed with fscanf():

  while (!feof(inputFile)) {
      char name[40 + 1];
      char path[PATH_MAX + 1];
      fscanf("%*u:%40[^:]:%" STRINGIFY(PATH_MAX) "[^\n]", name, path);
      if (!strcmp(name, "name=systemd"))
          return CString(path);
  }

  // Not found
  return CString();

With the STRINGIFY() macro being the usual preprocessor trick:

  #define STRINGIFY_EXPANDED(val) #val
  #define STRINGIFY(val) STRINGIFY_EXPANDED(val)

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:217
> +    while (char* line = fgets(buffer, 128, memInfoFile)) {


Idea: instead of going line-by-line and using evil strtok(), take advantage of… fscanf() again!

   memoryTotal = memoryFree = activeFile = inactiveFile = slabReclaimable = notSet;
   while (!feof(memInfoFile)) {
       char token[50 + 1];
       size_t amount;
       if (fscanf(memInfoFile, "%50s%zukB", token, &amount) != 2)
           break;

       if (!strcmp(token, "MemTotal:"))
           memoryTotal = amount;
       else if (!strcmp(token, "MemFree:"))
           memoryFree = amount;
       else ...

      if (memoryTotal != notSet && memoryFree != notSet && activeFile != notSet && inactiveFile != notSet && slabReclaimable != notSet)
          break;
   }

Note how this avoids needing to manually split the input in tokens ourselves: instead we let fscanf() do the hard work of parsing the input and doing numeric conversions.

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:420
> +    return value;


This whole function can be implemented as:

  size_t CGroupMemoryController::getCgroupFileValue(FILE* file)
  {
      size_t value;
      return (file && fscanf(file, "%zu", &value) == 1) ? value : notSet;
  }
Comment 1 Pablo Saavedra 2020-04-12 03:18:16 PDT
Created attachment 396217 [details]
patch
Comment 2 Pablo Saavedra 2020-04-12 03:21:52 PDT
Comment on attachment 396217 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396217&action=review

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:66
>  static size_t lowWatermarkPages(FILE* zoneInfoFile)

See notes in comment #1 from 210345 https://bugs.webkit.org/show_bug.cgi?id=210345#c1. Probably we want just to remove this code because memAvailable is already in /proc/meminfo since Linux 3.14 LTS and this is already EOL since 2016 so there is no reason to maintain this code.
Comment 3 Adrian Perez 2020-04-13 10:06:56 PDT
Comment on attachment 396217 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396217&action=review

I think this is going in the right direction (thanks!) but I have
a few questions and comment, please check them below 😄️

>> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:66
>>  static size_t lowWatermarkPages(FILE* zoneInfoFile)
> 
> See notes in comment #1 from 210345 https://bugs.webkit.org/show_bug.cgi?id=210345#c1. Probably we want just to remove this code because memAvailable is already in /proc/meminfo since Linux 3.14 LTS and this is already EOL since 2016 so there is no reason to maintain this code.

We would like to decide whether we want to keep supporting kernels
older than 3.14 — but I think it's better to play safe and keep the
code for now.

If (or when) we decide to remove it, we can use a new bug for that.

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:77
> +        r = fscanf(zoneInfoFile, " Node %*u, zone %128[^\n]\n", buffer);

Well, 128 bytes sounds a bit too much for the zone names, but at the same time
it's not that much… so let's just leave it as is 😉️

Note that you should use “%127[^\n]” because fscanf() will read at most 127
characters and *then* add the '\0' terminator, which is character number 128.
When using fscanf() I find it clearer to use something like:

   char buffer[128 + 1];  // +1 makes it clearer that we need space for \0
   fscanf(f, "%128s", buffer);

(Actually, you did just this below!)

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:80
> +        r = fscanf(zoneInfoFile, "%s", buffer);

This is unsafe, please indicate a field width corresponding to the
size of the buffer as part of the format string:

  r = fscanf(zoneInfoFile, "%127s", buffer);
                             ^^^

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:81
> +        if (r == 1 && inNormalZone && !strcmp(buffer, "low")) {

How does this work? The “low” keyword is a few tokens *after* the zone name.
Wouldn't you need a nested loop here reading tokens until “low” is seen?

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:168
> +        // int scanResult = fscanf(cgroupControllerFile, "%*u:%40[^:]:%4096[^\n]", name, path);

Please remove this commented statement.
Comment 4 Pablo Saavedra 2020-04-13 23:27:51 PDT
(In reply to Adrian Perez from comment #3)
> Comment on attachment 396217 [details]
> patch
> 

> > Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:81
> > +        if (r == 1 && inNormalZone && !strcmp(buffer, "low")) {
> 
> How does this work? The “low” keyword is a few tokens *after* the zone name.
> Wouldn't you need a nested loop here reading tokens until “low” is seen?
> 

The fscanf() reads the input stream file until the argument list is successfully  filled. In the context of the `while (!feof(zoneInfoFile))` loop:

- the first `fscanf(zoneInfoFile, " Node %*u, zone %128[^\n]\n", buffer);` 
  iterates the `Node` sections.
- Then, when we found a Normal node, we start to read each single  
  `fscanf(zoneInfoFile, "%128s", buffer);` until find the `low` token.
- Finally, we read the next token which is the actual `low` value. 

In this second scanning  of the file I read the content one by one because the format of each row is not consistent (2, 3 or 6 values):

Node 0, zone   Normal
  pages free     27303
        min      20500
        low      24089
        high     27678
        spanned  3414016
        present  3414016
        managed  3337293
        protection: (0, 0, 0, 0, 0)
Comment 5 Pablo Saavedra 2020-04-13 23:32:45 PDT
Created attachment 396384 [details]
patch
Comment 6 Adrian Perez 2020-04-17 08:15:26 PDT
(In reply to Pablo Saavedra from comment #4)
> (In reply to Adrian Perez from comment #3)
> > Comment on attachment 396217 [details]
> > patch
> > 
> 
> > > Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:81
> > > +        if (r == 1 && inNormalZone && !strcmp(buffer, "low")) {
> > 
> > How does this work? The “low” keyword is a few tokens *after* the zone name.
> > Wouldn't you need a nested loop here reading tokens until “low” is seen?
> > 
> 
> The fscanf() reads the input stream file until the argument list is
> successfully  filled. In the context of the `while (!feof(zoneInfoFile))`
> loop:
> 
> - the first `fscanf(zoneInfoFile, " Node %*u, zone %128[^\n]\n", buffer);` 
>   iterates the `Node` sections.
> - Then, when we found a Normal node, we start to read each single  
>   `fscanf(zoneInfoFile, "%128s", buffer);` until find the `low` token.
> - Finally, we read the next token which is the actual `low` value. 
> 
> In this second scanning  of the file I read the content one by one because
> the format of each row is not consistent (2, 3 or 6 values):
> 
> Node 0, zone   Normal
>   pages free     27303
>         min      20500
>         low      24089
>         high     27678
>         spanned  3414016
>         present  3414016
>         managed  3337293
>         protection: (0, 0, 0, 0, 0)

Thanks for the explanation, I get it now, but it's definitely not easy
to understand taking a quick look. Knowing that the “low” keyword must
be always contained inside a “Node” section, I was expecting a nested
loop (the outer one only searching for “Node ... Normal” lines, the
inner one picking the value of the “low“ keyword).

I guess this is fine as-is but I would add a comment explaining how it
works at the beginning of the loop.
Comment 7 Adrian Perez 2020-04-17 08:23:29 PDT
Comment on attachment 396384 [details]
patch

Other than the hardcoded buffer sizes, the parsing logic looks
good, and it's nice that using fscanf() in all this code avoids
the need for temporary memory allocations — which is important
when there isn't much left :)

View in context: https://bugs.webkit.org/attachment.cgi?id=396384&action=review

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:69
> +    char buffer[128 + 1];

Please use a constant for the buffer size:

  #define ZONEINFO_TOKEN_BUFFER_SIZE 128

and then here:

  char buffer[ZONEINFO_TOKEN_BUFFER_SIZE + 1];

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:77
> +        r = fscanf(zoneInfoFile, " Node %*u, zone %128[^\n]\n", buffer);

…then here you can use:

  int r = fscanf(zoneInfoFile, "Node %*u, zone %" STRINGIFY(ZONEINFO_TOKEN_BUFFER_SIZE) "[^\n]\n", buffer);

instead of hardcoding the buffer sizeo in the input format string.

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:80
> +        r = fscanf(zoneInfoFile, "%128s", buffer);

Here you could use fscanf(…, "%" STRINGIFY(ZONEINFO_TOKEN_BUFFER_SIZE) "s", buffer)
instead of hardcoding the token buffer length.

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:166
> +        char name[40 + 1];

Same here, you could define CGROUP_NAME_BUFFER_SIZE.

> Source/WebKit/UIProcess/linux/MemoryPressureMonitor.cpp:190
> +        char token[50 + 1];

And here too: MEMINFO_TOKEN_BUFFER_SIZE.
Comment 8 Pablo Saavedra 2020-04-18 06:33:32 PDT
Created attachment 396845 [details]
patch
Comment 9 EWS 2020-04-18 10:17:17 PDT
Committed r260318: <https://trac.webkit.org/changeset/260318>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 396845 [details].