Bug 143310

Summary: Port bmalloc to Windows
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: Web Template FrameworkAssignee: Basuke Suzuki <Basuke.Suzuki>
Status: NEW ---    
Severity: Normal CC: achristensen, annulen, Basuke.Suzuki, benjamin, bfulgham, cdumez, cmarcelo, commit-queue, daewoong.jang, don.olmstead, ews-watchlist, ggaren, gyuyoung.kim, hyungwook.lee, mitz, pvollan, ryuan.choi, sergio, sungmann.cho, ysuzuki
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 255945, 255968, 256011, 173567, 173622, 174232, 175299, 255943, 255944, 255996, 256010, 256314    
Bug Blocks:    
Attachments:
Description Flags
Patch
ggaren: review-
Patch 2
none
Patch 2
none
Patch 2
none
Patch 3
none
Patch 4
none
Patch 4
none
Patch 4
none
Patch 5
ggaren: review-
Updated patch based on r203260.
none
WIP Patch
none
WIP Patch none

Description Alex Christensen 2015-04-01 11:29:01 PDT
bmalloc should be able to work on Windows, with VirtualAlloc/VirtualFree instead of mmap.  FastMalloc used to do something similar.
Comment 1 Daewoong Jang 2015-09-02 20:29:01 PDT
Created attachment 260481 [details]
Patch
Comment 2 WebKit Commit Bot 2015-09-02 20:30:57 PDT
Attachment 260481 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:55:  bitwise_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 25 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Daewoong Jang 2015-09-02 20:39:44 PDT
I left the style error in Algorithm.h unchanged because it was caused by the code block I copied from WTF/StdLibExtras.h. If I should have fixed the style error, please let me know.
Comment 4 Geoffrey Garen 2015-09-08 11:07:13 PDT
Comment on attachment 260481 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260481&action=review

> Source/bmalloc/bmalloc/Algorithm.h:64
> +/*
> +* C++'s idea of a reinterpret_cast lacks sufficient cojones.
> +*/
> +template<typename ToType, typename FromType>
> +inline ToType bitwise_cast(FromType from)
> +{
> +    static_assert(sizeof(FromType) == sizeof(ToType), "bitwise_cast size of FromType and ToType must be equal!");
> +    union {
> +        FromType from;
> +        ToType to;
> +    } u;
> +    u.from = from;
> +    return u.to;
> +}

Is this function as efficient as reinterpret_cast? What does the generated assembly look like?

> Source/bmalloc/bmalloc/Allocator.cpp:81
> +#else
> +        BCRASH();
> +#endif

This doesn't look right. You want the Windows aligned allocation API.

> Source/bmalloc/bmalloc/Allocator.cpp:84
> -    
> +

Please revert this whitespace change.

> Source/bmalloc/bmalloc/Chunk.h:76
>  #if BPLATFORM(IOS)
>      char m_memory[] __attribute__((aligned(16384)));
>      static_assert(vmPageSize == 16384, "vmPageSize and alignment must be same");
> -#else
> +#elif !BOS(WINDOWS)
>      char m_memory[] __attribute__((aligned(4096)));
>      static_assert(vmPageSize == 4096, "vmPageSize and alignment must be same");
> +#else
> +    char* m_memory { mask(reinterpret_cast<char*>(&m_memory + 1) + (vmPageSize - 1), vmPageMask) };
>  #endif

It is bad to clutter the code with #ifdefs. That's a general problem in this patch. We want some better options for abstracting platform differences.
Comment 5 Alex Christensen 2015-09-08 11:17:44 PDT
Please add this to the CMake build, too.  There are several places where I added if (NOT WIN32) with a comment referencing this bug.
Also, I think it would probably be best if we turned bmalloc on for the WinCairo port only at first.
Comment 6 Daewoong Jang 2015-09-08 12:47:48 PDT
Comment on attachment 260481 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260481&action=review

>> Source/bmalloc/bmalloc/Algorithm.h:64
>> +}
> 
> Is this function as efficient as reinterpret_cast? What does the generated assembly look like?

The generated assembly I got from MSVC:

    return reinterpret_cast<uintptr_t>(ptr);
00402760  mov         eax,ecx  

    return bitwise_cast<uintptr_t>(ptr);
00402760  mov         eax,ecx  

I believe we get same results from both of them. I will check on Mac and post the result later.

>> Source/bmalloc/bmalloc/Allocator.cpp:81
>> +#endif
> 
> This doesn't look right. You want the Windows aligned allocation API.

I know, but the equivalent Windows API _aligned_malloc() has a constraint that the memory block allocated by it must be freed by _aligned_free(). And I believe bmalloc::Allocator doesn't treat aligned memory block as special for now since POSIX APIs don't.
In my opinion, it would take more effort to replace BCRASH() with proper implementation and I would love to do that, but I think it is beyond this patch's work. I would appreciate if you give me a suggestion.

>> Source/bmalloc/bmalloc/Allocator.cpp:84
>> +
> 
> Please revert this whitespace change.

OK. I'll do it.

>> Source/bmalloc/bmalloc/Chunk.h:76
>>  #endif
> 
> It is bad to clutter the code with #ifdefs. That's a general problem in this patch. We want some better options for abstracting platform differences.

I totally agree. Could you give me any suggestions for improving code?
Comment 7 Daewoong Jang 2015-09-08 12:52:10 PDT
(In reply to comment #5)
> Please add this to the CMake build, too.  There are several places where I
> added if (NOT WIN32) with a comment referencing this bug.
> Also, I think it would probably be best if we turned bmalloc on for the
> WinCairo port only at first.

Alright I'll update the patch. And actually I didn't turned bmalloc on neither AppleWin nor WinCairo. Do you think it is okay to turn it on WinCairo port by default?
Comment 8 Brent Fulgham 2015-09-08 16:43:29 PDT
(In reply to comment #5)
> Please add this to the CMake build, too.  There are several places where I
> added if (NOT WIN32) with a comment referencing this bug.
> Also, I think it would probably be best if we turned bmalloc on for the
> WinCairo port only at first.

Really? I think it would be better to turn it on across the board. The Apple Windows port is the only one with any meaningful testing, and the only place we are likely to see if anything is going wrong.

If it starts failing, we can always roll it out.
Comment 9 Alex Christensen 2015-09-08 16:59:01 PDT
(In reply to comment #8)
> (In reply to comment #5)
> > Please add this to the CMake build, too.  There are several places where I
> > added if (NOT WIN32) with a comment referencing this bug.
> > Also, I think it would probably be best if we turned bmalloc on for the
> > WinCairo port only at first.
> 
> Really? I think it would be better to turn it on across the board. The Apple
> Windows port is the only one with any meaningful testing, and the only place
> we are likely to see if anything is going wrong.
> 
> If it starts failing, we can always roll it out.
I think all the ports should use it, but we would have to submit a project to build it internally, too.  If you think that won't cause a problem, then let's just use bmalloc everywhere.
Comment 10 Daewoong Jang 2015-09-10 15:31:00 PDT
Comment on attachment 260481 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=260481&action=review

>>> Source/bmalloc/bmalloc/Algorithm.h:64
>>> +}
>> 
>> Is this function as efficient as reinterpret_cast? What does the generated assembly look like?
> 
> The generated assembly I got from MSVC:
> 
>     return reinterpret_cast<uintptr_t>(ptr);
> 00402760  mov         eax,ecx  
> 
>     return bitwise_cast<uintptr_t>(ptr);
> 00402760  mov         eax,ecx  
> 
> I believe we get same results from both of them. I will check on Mac and post the result later.

From XCode:

	pushq	%rbp
Ltmp0:
	.cfi_def_cfa_offset 16
Ltmp1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp

I got the assemply above from code:

    uintptr_t a = reinterpret_cast<uintptr_t>(p);

and

    uintptr_t a = bitwise_cast<uintptr_t>(p);
Comment 11 Daewoong Jang 2015-09-10 16:24:38 PDT
Created attachment 260964 [details]
Patch 2
Comment 12 WebKit Commit Bot 2015-09-10 16:30:23 PDT
Attachment 260964 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Threading.h:59:  The parameter name "threadHandle" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/bmalloc/bmalloc/Threading.h:61:  The parameter name "destructor" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/bmalloc/bmalloc/Threading.h:62:  The parameter name "key" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/bmalloc/bmalloc/Threading.h:63:  The parameter name "key" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/bmalloc/bmalloc/Threading.h:63:  The parameter name "object" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/bmalloc/bmalloc/Algorithm.h:55:  bitwise_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/bmalloc/bmalloc/VMAllocate.h:33:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 7 in 34 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Daewoong Jang 2015-09-10 16:32:33 PDT
Created attachment 260965 [details]
Patch 2
Comment 14 Alex Christensen 2015-09-10 16:35:13 PDT
Have you measured any benchmark improvement with this patch?
Comment 15 Daewoong Jang 2015-09-10 16:37:48 PDT
Created attachment 260968 [details]
Patch 2
Comment 16 Daewoong Jang 2015-09-10 16:40:27 PDT
(In reply to comment #14)
> Have you measured any benchmark improvement with this patch?

For Windows ports? Not yet. Could you give me any leads, to test it myself? I'm also interested about the performance gain bmalloc would give.

And I turned bmalloc on by default for all Windows ports. If you have diffrent idea, please let me know. Thank you.
Comment 17 WebKit Commit Bot 2015-09-10 16:43:32 PDT
Attachment 260968 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:55:  bitwise_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 34 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 18 Alex Christensen 2015-09-10 17:02:30 PDT
(In reply to comment #16)
> (In reply to comment #14)
> > Have you measured any benchmark improvement with this patch?
> 
> For Windows ports? Not yet. Could you give me any leads, to test it myself?
> I'm also interested about the performance gain bmalloc would give.
http://browserbench.org has a dom and a javascript benchmark.
Comment 19 Daewoong Jang 2015-09-10 18:01:40 PDT
Created attachment 260977 [details]
Patch 3

Fixed compilation error.
Comment 20 WebKit Commit Bot 2015-09-10 18:04:55 PDT
Attachment 260977 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:55:  bitwise_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 34 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 Daewoong Jang 2015-09-10 18:39:23 PDT
Created attachment 260982 [details]
Patch 4

Fix compilation error.
Comment 22 WebKit Commit Bot 2015-09-10 18:41:51 PDT
Attachment 260982 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:55:  bitwise_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 34 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 23 Daewoong Jang 2015-09-11 01:39:59 PDT
Created attachment 260995 [details]
Patch 4
Comment 24 Daewoong Jang 2015-09-11 01:42:59 PDT
Created attachment 260996 [details]
Patch 4
Comment 25 WebKit Commit Bot 2015-09-11 01:45:11 PDT
Attachment 260996 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:55:  bitwise_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 35 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 26 Daewoong Jang 2015-09-12 03:44:39 PDT
Created attachment 261053 [details]
Patch 5
Comment 27 WebKit Commit Bot 2015-09-12 03:46:37 PDT
Attachment 261053 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Algorithm.h:55:  bitwise_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 1 in 37 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 28 Daewoong Jang 2015-09-12 04:00:29 PDT
(In reply to comment #18)
> (In reply to comment #16)
> > (In reply to comment #14)
> > > Have you measured any benchmark improvement with this patch?
> > 
> > For Windows ports? Not yet. Could you give me any leads, to test it myself?
> > I'm also interested about the performance gain bmalloc would give.
> http://browserbench.org has a dom and a javascript benchmark.

I ran Jet Stream benchmark on Windows MiniBrowser(AppleWin for x86).

PC: i7-2600 @ 3.4GHz, 16GB ram, Windows 7 64bit OS

Tested three times on both MiniBrowsers with/without bmalloc, in turn:

Result
------
MiniBrowser: w/o bmalloc      | w/ bmalloc
-------------------------------------------------
Set 1:       87.676 +- 2.5816 | 94.044 +- 0.33255
Set 2:       85.505 +- 5.8075 | 93.971 +- 1.3106
Set 3:       86.152 +- 3.3151 | 93.873 +- 2.7638

I couldn't complete speedometer test. They refused to continue running after a short while. I believe recent Windows ports have problem.
Comment 29 Geoffrey Garen 2015-09-16 11:25:00 PDT
Comment on attachment 261053 [details]
Patch 5

There's still far too much platform-specific conditional code in the core logic of bmalloc in this patch. We should strive for core logic that works on all platforms, with helper classes or helper functions that call through to platform-specific APIs. bmalloc on Windows is not worth cluttering up the code this much.

As an aside is it really true that Windows does not support partial decommit of a memory region? I've read that it does.

I don't think we should just copy existing WTF code to do the platform abstraction here. Some of that code is old enough that it deals with platform problems that don't exist anymore.
Comment 30 Daewoong Jang 2015-09-16 14:19:38 PDT
(In reply to comment #29)
> Comment on attachment 261053 [details]
> Patch 5
> 
> There's still far too much platform-specific conditional code in the core
> logic of bmalloc in this patch. We should strive for core logic that works
> on all platforms, with helper classes or helper functions that call through
> to platform-specific APIs. bmalloc on Windows is not worth cluttering up the
> code this much.
> 

I would like to know which ifdefs made you unsatisfied. I looked into my patch again and I think I can remove those ifdefs in AsyncTask.h and SegregatedFreeList.cpp with modifying existing code, but the rest of ifdefs looks necessary to me.

I agree that bmalloc on Windows is not worth making code ugly since nobody makes useful product from Windows port.

> As an aside is it really true that Windows does not support partial decommit
> of a memory region? I've read that it does.
> 

OK, I will look into that matter again.

> I don't think we should just copy existing WTF code to do the platform
> abstraction here. Some of that code is old enough that it deals with
> platform problems that don't exist anymore.

I didn't copy code from WTF. I just adopted class, method and function names form WTF and filled them with new code for Windows after I moved existing code into them.

I agree that 'We should strive for core logic that works on all platforms', but I hardly modified meaningful logic in this patch. What I did was redefinition of helper classes and helper functions.
Comment 31 Brent Fulgham 2015-09-16 14:21:03 PDT
(In reply to comment #30)
> 
> I agree that bmalloc on Windows is not worth making code ugly since nobody
> makes useful product from Windows port.

That's not true. Apple ships products using the Windows port. So do a number of other companies.
Comment 32 Daewoong Jang 2015-09-16 15:41:47 PDT
(In reply to comment #31)
> (In reply to comment #30)
> > 
> > I agree that bmalloc on Windows is not worth making code ugly since nobody
> > makes useful product from Windows port.
> 
> That's not true. Apple ships products using the Windows port. So do a number
> of other companies.

Sorry, my selection of words was inappropriate. I was thinking of something like Safari browser. Of course WebKit Windows ports are useful for various matters.
Comment 33 Alex Christensen 2015-11-02 17:29:23 PST
This has been sitting here for over a month.  It would be nice to have this reviewed, Geoff.
Comment 34 Geoffrey Garen 2015-11-02 21:35:31 PST
I think I did review it?

I understand that I've set a high bar here, and that it's going to be challenging to eliminate as much platform-specific code from core logic as possible, and that it might be frustrating too.

I'm not sure how to guide this work more clearly without writing the patch myself. I might be able to find some time to do that.
Comment 35 Daewoong Jang 2016-09-22 22:52:21 PDT
Created attachment 289663 [details]
Updated patch based on r203260.
Comment 36 Don Olmstead 2020-06-29 12:17:00 PDT
Created attachment 403094 [details]
WIP Patch
Comment 37 Don Olmstead 2020-06-29 12:52:07 PDT
Created attachment 403100 [details]
WIP Patch