Bug 107498 - TRANSFORMATION_MATRIX_USE_X86_64_SSE2 broken for 64-bit Windows builds
Summary: TRANSFORMATION_MATRIX_USE_X86_64_SSE2 broken for 64-bit Windows builds
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Justin Schuh
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-21 20:46 PST by Justin Schuh
Modified: 2013-01-22 13:21 PST (History)
3 users (show)

See Also:


Attachments
Patch for discussion. (1.63 KB, patch)
2013-01-21 20:55 PST, Justin Schuh
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Schuh 2013-01-21 20:46:17 PST
Looks like a simple bug in this conditional macro:

72   #if CPU(X86_64) && !PLATFORM(WINDOWS)
73   #define TRANSFORMATION_MATRIX_USE_X86_64_SSE2
74   #endif

The problem is that !PLATFORM(WINDOWS) doesn't do anything. So TRANSFORMATION_MATRIX_USE_X86_64_SSE2 gets defined but the compiler explodes on the alignment specifier. I assume it was meant to be PLATFORM(WIN), but COMPILER(MSVC) is probably more correct.

I could just fix the macro to exclude Windows, but the SSE intrinsics are all supported on Win64. So, is there any reason I can't just add an #if with a __declspec to set the correct alignment and enable this for Win64? (I ask, because at the moment I'm just a few dependencies short of a build I can actually test with on Win64.)
Comment 1 Justin Schuh 2013-01-21 20:55:52 PST
Created attachment 183881 [details]
Patch for discussion.
Comment 2 Benjamin Poulain 2013-01-21 22:00:53 PST
I am sorry. I intended to use OS(WINDOWS).

We cannot enable that optimization as is for Windows because of alignment issues. Neither the allocator, nor the stack alignment gives us the right garanties for alignment.

Stricto sensu, it is not correct on Unix either without a custom new() with an aligned allocation. But in practice we know the ABI and how our allocator works on the platforms we support.
Comment 3 Justin Schuh 2013-01-21 22:20:30 PST
Stack and struct/class member alignment can be guaranteed with __declspec(align), as in the patch I posted. And the 64-bit Windows heap guarantees 16-byte alignment, as does Chromium's 64-bit TCMalloc port.

That said, you're right that it would still be relying on quirks rather than explicit guarantees. So, if you're not comfortable with me enabling it for any Windows 64-bit build, do you have any objections to me doing so for just the 64-bit Windows Chromium port? (That is, assuming @jamesr isn't aware of something preventing this for 64-bit Chromium ports.)
Comment 4 James Robinson 2013-01-21 22:37:23 PST
On Chromium Windows the allocator can be selected at runtime with an environment variable. Last I looked, we supported tcmalloc, jemalloc, and winheap at least. Is this true for the 64 bit build? Do we know the alignment properties of all of the possible allocators?
Comment 5 Benjamin Poulain 2013-01-21 23:02:52 PST
(In reply to comment #3)
> Stack and struct/class member alignment can be guaranteed with __declspec(align), as in the patch I posted.

This is not true on x86, but according to http://msdn.microsoft.com/en-us/library/aa290049(v=vs.71).aspx, you are right for x86_64.

> And the 64-bit Windows heap guarantees 16-byte alignment, as does Chromium's 64-bit TCMalloc port.
> 
> That said, you're right that it would still be relying on quirks rather than explicit guarantees. So, if you're not comfortable with me enabling it for any Windows 64-bit build, do you have any objections to me doing so for just the 64-bit Windows Chromium port? (That is, assuming @jamesr isn't aware of something preventing this for 64-bit Chromium ports.)

I have no objection against enabling this on Windows. I just don't know enough about Windows x86_64 to r+.

If you enable this for Windows, can you introduce a new macro to do __attribute__((aligned (16))) || __declspec(align(16))?
Comment 6 James Robinson 2013-01-21 23:45:27 PST
> 
> > And the 64-bit Windows heap guarantees 16-byte alignment, as does Chromium's 64-bit TCMalloc port.
> > 
> > That said, you're right that it would still be relying on quirks rather than explicit guarantees. So, if you're not comfortable with me enabling it for any Windows 64-bit build, do you have any objections to me doing so for just the 64-bit Windows Chromium port? (That is, assuming @jamesr isn't aware of something preventing this for 64-bit Chromium ports.)
> 
> I have no objection against enabling this on Windows. I just don't know enough about Windows x86_64 to r+.
> 
> If you enable this for Windows, can you introduce a new macro to do __attribute__((aligned (16))) || __declspec(align(16))?

__attribute goes after the type, __declspec goes before. You can write a function-like macro for members and locals, but it won't work for type declarations.I think standard practice is to have both pre and post macros.
Comment 7 Justin Schuh 2013-01-22 09:27:47 PST
(In reply to comment #3)
> I have no objection against enabling this on Windows. I just don't know enough about Windows x86_64 to r+.

Gotcha. If I do propose turning it on I'll make sure to find an appropriate reviewer. Sadly, I don't think anyone other than myself is actually working a Win64 port, so I doubt I can get *that* appropriate of a reviewer.


(In reply to comment #4)
> On Chromium Windows the allocator can be selected at runtime with an environment variable.

Yup, but on win64 we don't support custom allocators yet so it's winheap only. That said, the only thing I'm aware of us using the allocator switching for is winheap under instrumentation, so we're basically safe right now.

> Last I looked, we supported tcmalloc, jemalloc, and winheap at least. Is this true for the 64 bit build? Do we know the alignment properties of all of the possible allocators?

You forgot Windows LFH. :) And yes, they all have a minimum 16-byte alignment on 64-bit. After we get a basic build up and running we'll eventually enable the custom allocators and test each, just to be sure.


(In reply to comment #6)
(In reply to comment #5)
> > If you enable this for Windows, can you introduce a new macro to do __attribute__((aligned (16))) || __declspec(align(16))?
> 
> __attribute goes after the type, __declspec goes before. You can write a function-like macro for members and locals, but it won't work for type declarations.I think standard practice is to have both pre and post macros.

So, add something like WTF_ALIGNED_TYPE_START(align) and WTF_ALIGNED_TYPE_END(align) to Alignment.h? Feels a bit horsey, but it's probably a better solution than anything more clever.
Comment 8 James Robinson 2013-01-22 12:17:33 PST
Comment on attachment 183881 [details]
Patch for discussion.

I think this is fine. We can add a macro when we get a few more callers - at this point I think it's a bit premature.
Comment 9 Benjamin Poulain 2013-01-22 12:39:50 PST
I was thinking of something like:

#define WTF_ALIGNED(declaration, alignment) \
    declaration __attribute__((aligned (alignment))); 

> I think this is fine. We can add a macro when we get a few more callers - at this point I think it's a bit premature.

Fair enough, I don't have a problem with that.
Comment 10 Justin Schuh 2013-01-22 12:43:37 PST
Since the verdict seems to be that this patch is fine, I'm just going to cq+.
Comment 11 WebKit Review Bot 2013-01-22 13:21:34 PST
Comment on attachment 183881 [details]
Patch for discussion.

Clearing flags on attachment: 183881

Committed r140455: <http://trac.webkit.org/changeset/140455>
Comment 12 WebKit Review Bot 2013-01-22 13:21:37 PST
All reviewed patches have been landed.  Closing bug.