Bug 174232 - Implement __builtin_clzl for MSVC
Summary: Implement __builtin_clzl for MSVC
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: bmalloc (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Daewoong Jang
URL:
Keywords: InRadar
Depends on:
Blocks: 143310
  Show dependency treegraph
 
Reported: 2017-07-06 19:06 PDT by Daewoong Jang
Modified: 2017-08-01 16:19 PDT (History)
6 users (show)

See Also:


Attachments
patch (1.23 KB, patch)
2017-07-06 19:08 PDT, Daewoong Jang
no flags Details | Formatted Diff | Diff
patch (2.62 KB, patch)
2017-07-10 18:48 PDT, Daewoong Jang
no flags Details | Formatted Diff | Diff
patch-does-not-hurt-gcc-compilation (2.29 KB, patch)
2017-07-12 17:11 PDT, Daewoong Jang
no flags Details | Formatted Diff | Diff
patch (2.23 KB, patch)
2017-07-27 16:31 PDT, Daewoong Jang
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews122 for ios-simulator-wk2 (966.59 KB, application/zip)
2017-07-27 19:01 PDT, Build Bot
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Daewoong Jang 2017-07-06 19:06:39 PDT
Rewrite bmalloc::log2() in a recursive fashion as MSVC does not have __builtin_clzl nor exact equivalent of it.
This change is needed for supporting bmalloc on Windows(https://bugs.webkit.org/show_bug.cgi?id=143310).
Comment 1 Daewoong Jang 2017-07-06 19:08:36 PDT
Created attachment 314788 [details]
patch
Comment 2 Yusuke Suzuki 2017-07-06 19:13:04 PDT
Comment on attachment 314788 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314788&action=review

> Source/bmalloc/bmalloc/Algorithm.h:137
> -    return bitCount<unsigned long>() - 1 - __builtin_clzl(value);
> +    return value < 2 ? 0 : 1 + log2(value / 2);

We should keep the current efficient log2 for non-Windows environment.
If this is not constexpr, in Windows, you can use _BitScanReverse64 / _BitScanReverse. But maybe they are not constexpr.
Comment 3 Daewoong Jang 2017-07-06 21:46:36 PDT
(In reply to Yusuke Suzuki from comment #2)
> Comment on attachment 314788 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=314788&action=review
> 
> > Source/bmalloc/bmalloc/Algorithm.h:137
> > -    return bitCount<unsigned long>() - 1 - __builtin_clzl(value);
> > +    return value < 2 ? 0 : 1 + log2(value / 2);
> 
> We should keep the current efficient log2 for non-Windows environment.
> If this is not constexpr, in Windows, you can use _BitScanReverse64 /
> _BitScanReverse. But maybe they are not constexpr.

_BitScanReverse is not constexpr, so I did it this way. If we have to keep current implementation, then I'll put a compiler guard around the code.

Thank you for reviewing!
Comment 4 Geoffrey Garen 2017-07-07 08:22:35 PDT
I'm curious: Do you have a working version of bmalloc on Windows? If so, have you measured its performance?

It would be helpful to see the full design of the finished work, and to know its performance.

Some of the decisions you've made seem to be compromises that harm performance.

To avoid muddying up the code, I think it's preferable to emulate missing functions on Windows. So, in this case, you could define __builtin_clzl only on windows, and use a recursive function or a tree of branches to compute it.
Comment 5 Daewoong Jang 2017-07-08 02:15:44 PDT
(In reply to Geoffrey Garen from comment #4)
> I'm curious: Do you have a working version of bmalloc on Windows? If so,
> have you measured its performance?
> 
> It would be helpful to see the full design of the finished work, and to know
> its performance.
> 

Of course I have a working version of bmalloc on Windows in my working tree and you can see its performance gain here: https://bugs.webkit.org/show_bug.cgi?id=143310#c28
It's old but bmalloc on Windows gives definite 10%+ performance gain on JavaScript benchmarks run on Windows.

Never mind the cluttered patch you could see on the page above. It won't be cluttered like that in the finished work and won't be upstreamed until you're happy with the change :)

> Some of the decisions you've made seem to be compromises that harm
> performance.
> 
> To avoid muddying up the code, I think it's preferable to emulate missing
> functions on Windows. So, in this case, you could define __builtin_clzl only
> on windows, and use a recursive function or a tree of branches to compute it.

It could be done in that way for this small case, but how about other POSIX functions like pthread/mman APIs? I am thinking of putting an abstraction layer instead of emulation. Will it be OK for you?

Thank you for your comment.
Comment 6 Daewoong Jang 2017-07-10 18:48:00 PDT
Created attachment 315060 [details]
patch
Comment 7 Build Bot 2017-07-10 18:49:54 PDT
Attachment 315060 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:141:  __builtin_clzl is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Daewoong Jang 2017-07-12 17:11:21 PDT
Created attachment 315311 [details]
patch-does-not-hurt-gcc-compilation
Comment 9 Build Bot 2017-07-12 17:13:32 PDT
Attachment 315311 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:152:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Daewoong Jang 2017-07-27 16:31:19 PDT
Created attachment 316578 [details]
patch
Comment 11 Build Bot 2017-07-27 16:33:06 PDT
Attachment 316578 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:152:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Daewoong Jang 2017-07-27 16:33:54 PDT
This patch stays here for too long. Can I get a review?
Comment 13 Build Bot 2017-07-27 19:01:37 PDT
Comment on attachment 316578 [details]
patch

Attachment 316578 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/4199491

New failing tests:
imported/w3c/IndexedDB-private-browsing/idbfactory_open12.html
Comment 14 Build Bot 2017-07-27 19:01:39 PDT
Created attachment 316598 [details]
Archive of layout-test-results from ews122 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews122  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.5
Comment 15 Geoffrey Garen 2017-08-01 08:27:57 PDT
> It could be done in that way for this small case, but how about other POSIX
> functions like pthread/mman APIs? I am thinking of putting an abstraction
> layer instead of emulation. Will it be OK for you?

I think we'll have to consider each Windows API, and its difference from POSIX, individually. For example, VirtualAllocEx is very similar to mmap+madvise.
Comment 16 Geoffrey Garen 2017-08-01 08:28:09 PDT
Comment on attachment 316578 [details]
patch

r=me
Comment 17 WebKit Commit Bot 2017-08-01 08:56:23 PDT
Comment on attachment 316578 [details]
patch

Clearing flags on attachment: 316578

Committed r220097: <http://trac.webkit.org/changeset/220097>
Comment 18 WebKit Commit Bot 2017-08-01 08:56:25 PDT
All reviewed patches have been landed.  Closing bug.
Comment 19 Radar WebKit Bug Importer 2017-08-01 08:56:48 PDT
<rdar://problem/33654780>
Comment 20 Daewoong Jang 2017-08-01 16:19:01 PDT
(In reply to Geoffrey Garen from comment #15)
> > It could be done in that way for this small case, but how about other POSIX
> > functions like pthread/mman APIs? I am thinking of putting an abstraction
> > layer instead of emulation. Will it be OK for you?
> 
> I think we'll have to consider each Windows API, and its difference from
> POSIX, individually. For example, VirtualAllocEx is very similar to
> mmap+madvise.

All right. Thank you for review.