Summary: | Implement __builtin_clzl for MSVC | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Daewoong Jang <daewoong.jang> | ||||||||||||
Component: | bmalloc | Assignee: | Daewoong Jang <daewoong.jang> | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | basuke, buildbot, commit-queue, ggaren, webkit-bug-importer, ysuzuki | ||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||
Hardware: | Unspecified | ||||||||||||||
OS: | Unspecified | ||||||||||||||
Bug Depends on: | |||||||||||||||
Bug Blocks: | 143310 | ||||||||||||||
Attachments: |
|
Description
Daewoong Jang
2017-07-06 19:06:39 PDT
Created attachment 314788 [details]
patch
Comment on attachment 314788 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=314788&action=review > Source/bmalloc/bmalloc/Algorithm.h:137 > - return bitCount<unsigned long>() - 1 - __builtin_clzl(value); > + return value < 2 ? 0 : 1 + log2(value / 2); We should keep the current efficient log2 for non-Windows environment. If this is not constexpr, in Windows, you can use _BitScanReverse64 / _BitScanReverse. But maybe they are not constexpr. (In reply to Yusuke Suzuki from comment #2) > Comment on attachment 314788 [details] > patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=314788&action=review > > > Source/bmalloc/bmalloc/Algorithm.h:137 > > - return bitCount<unsigned long>() - 1 - __builtin_clzl(value); > > + return value < 2 ? 0 : 1 + log2(value / 2); > > We should keep the current efficient log2 for non-Windows environment. > If this is not constexpr, in Windows, you can use _BitScanReverse64 / > _BitScanReverse. But maybe they are not constexpr. _BitScanReverse is not constexpr, so I did it this way. If we have to keep current implementation, then I'll put a compiler guard around the code. Thank you for reviewing! I'm curious: Do you have a working version of bmalloc on Windows? If so, have you measured its performance? It would be helpful to see the full design of the finished work, and to know its performance. Some of the decisions you've made seem to be compromises that harm performance. To avoid muddying up the code, I think it's preferable to emulate missing functions on Windows. So, in this case, you could define __builtin_clzl only on windows, and use a recursive function or a tree of branches to compute it. (In reply to Geoffrey Garen from comment #4) > I'm curious: Do you have a working version of bmalloc on Windows? If so, > have you measured its performance? > > It would be helpful to see the full design of the finished work, and to know > its performance. > Of course I have a working version of bmalloc on Windows in my working tree and you can see its performance gain here: https://bugs.webkit.org/show_bug.cgi?id=143310#c28 It's old but bmalloc on Windows gives definite 10%+ performance gain on JavaScript benchmarks run on Windows. Never mind the cluttered patch you could see on the page above. It won't be cluttered like that in the finished work and won't be upstreamed until you're happy with the change :) > Some of the decisions you've made seem to be compromises that harm > performance. > > To avoid muddying up the code, I think it's preferable to emulate missing > functions on Windows. So, in this case, you could define __builtin_clzl only > on windows, and use a recursive function or a tree of branches to compute it. It could be done in that way for this small case, but how about other POSIX functions like pthread/mman APIs? I am thinking of putting an abstraction layer instead of emulation. Will it be OK for you? Thank you for your comment. Created attachment 315060 [details]
patch
Attachment 315060 [details] did not pass style-queue:
ERROR: Source/bmalloc/bmalloc/Algorithm.h:141: __builtin_clzl is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
Total errors found: 1 in 4 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 315311 [details]
patch-does-not-hurt-gcc-compilation
Attachment 315311 [details] did not pass style-queue:
ERROR: Source/bmalloc/bmalloc/Algorithm.h:152: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5]
Total errors found: 1 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 316578 [details]
patch
Attachment 316578 [details] did not pass style-queue:
ERROR: Source/bmalloc/bmalloc/Algorithm.h:152: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5]
Total errors found: 1 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
This patch stays here for too long. Can I get a review? Comment on attachment 316578 [details] patch Attachment 316578 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/4199491 New failing tests: imported/w3c/IndexedDB-private-browsing/idbfactory_open12.html Created attachment 316598 [details]
Archive of layout-test-results from ews122 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews122 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.5
> It could be done in that way for this small case, but how about other POSIX
> functions like pthread/mman APIs? I am thinking of putting an abstraction
> layer instead of emulation. Will it be OK for you?
I think we'll have to consider each Windows API, and its difference from POSIX, individually. For example, VirtualAllocEx is very similar to mmap+madvise.
Comment on attachment 316578 [details]
patch
r=me
Comment on attachment 316578 [details] patch Clearing flags on attachment: 316578 Committed r220097: <http://trac.webkit.org/changeset/220097> All reviewed patches have been landed. Closing bug. (In reply to Geoffrey Garen from comment #15) > > It could be done in that way for this small case, but how about other POSIX > > functions like pthread/mman APIs? I am thinking of putting an abstraction > > layer instead of emulation. Will it be OK for you? > > I think we'll have to consider each Windows API, and its difference from > POSIX, individually. For example, VirtualAllocEx is very similar to > mmap+madvise. All right. Thank you for review. |