Bug 164588

Summary: [JSC] Build broken for 32-bit x86 after r208306 with GCC 4.9
Product: WebKit Reporter: Carlos Alberto Lopez Perez <clopez>
Component: JavaScriptCoreAssignee: Carlos Alberto Lopez Perez <clopez>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, bugs-noreply, cdumez, cgarcia, cmarcelo, commit-queue, dbates, fpizlo, ggaren, gustavo, keith_miller, mark.lam, mcatanzaro, ossy, ysuzuki
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=163562
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Carlos Alberto Lopez Perez 2016-11-10 05:45:38 PST
Building WebKitGTK+ (and I guess any other Linux port) after r208306 <http://trac.webkit.org/r208306> causes this:

g++-4.9  [.....] -o Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/heap/Heap.cpp.o -c ../../Source/JavaScriptCore/heap/Heap.cpp
In file included from ../../Source/WTF/wtf/Lock.h:29:0,
                 from ../../Source/WTF/wtf/HashTable.h:33,
                 from ../../Source/WTF/wtf/HashMap.h:25,
                 from ../../Source/JavaScriptCore/runtime/JSCJSValue.h:33,
                 from ../../Source/JavaScriptCore/heap/HandleTypes.h:28,
                 from ../../Source/JavaScriptCore/heap/Handle.h:28,
                 from ../../Source/JavaScriptCore/heap/HandleSet.h:28,
                 from ../../Source/JavaScriptCore/heap/Heap.h:27,
                 from ../../Source/JavaScriptCore/heap/Heap.cpp:22:
../../Source/WTF/wtf/Atomics.h: In member function 'bool JSC::Heap::handleGCDidJIT(unsigned int)':
../../Source/WTF/wtf/Atomics.h:245:20: error: inconsistent operand constraints in an 'asm'
         : "memory");
                    ^


https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/65378/steps/compile-webkit/logs/stdio/text


This is the function causing the issue:

inline void x86_cpuid()
{
#if OS(WINDOWS)
    int info[4];
    __cpuid(info, 0);
#else
    intptr_t a = 0, b, c, d;
    asm volatile(
        "cpuid"
        : "+a"(a), "=b"(b), "=c"(c), "=d"(d)
        :
        : "memory");
#endif
}


Seems GCC 4.9 doesn't like it on 32-bit


The build failure is not reproducible with newer GCC (tested with GCC 5.3.1 and built fine)
Comment 1 Carlos Alberto Lopez Perez 2016-11-10 06:02:42 PST
And this seems to be the right answer: http://stackoverflow.com/a/28976166

Removing -fPIC for that file from the compiler arguments makes it build
Comment 2 Michael Catanzaro 2016-11-10 06:55:31 PST
Does it work, though??? Is it really possible to have a shared library with position-dependent code? I do not think removing -fPIC could be right...?
Comment 3 Carlos Alberto Lopez Perez 2016-11-10 08:35:42 PST
(In reply to comment #2)
> Does it work, though??? Is it really possible to have a shared library with
> position-dependent code? I do not think removing -fPIC could be right...?

What I tried was just removing the fPIC flag from JavaScriptCore/heap/Heap.cpp which is (right now) the only user of x86_cpuid() via crossModifyingCodeFence().

Doing this with cmake seems difficult (not even sure if its possible), because the fPIC argument gets passed automatically for any library via ${CMAKE_SHARED_LIBRARY_CXX_FLAGS} (for the whole library and not for a single file).

So I managed to get it build by manually editing the ninja recipes generated to remove the fPIC variable from JavaScriptCore/heap/Heap.cpp

It seems to work, but its hard to tell if something is broke as I don't even know what is the purpose of this crossModifyingCodeFence() thing or when its used.

And removing -fPIC for the whole JavaScriptCore library seems not something we want to do in any case.

I see crossModifyingCodeFence is defined as std::atomic_thread_fence(std::memory_order_seq_cst) for non x86 cpus. Maybe that could work also for an x86 cpu?.


Another option is to modify that assembly to make it happy with gcc-4.9 on x86-32 bits. Here seems to be some interesting notes about this: http://sam.zoy.org/blog/2007-04-13-shlib-with-non-pic-code-have-inline-assembly-and-pic-mix-well


But is hard to try to modify that without knowing what is x86_cpuid() actually doing, how to test it, or what is the purpose of crossModifyingCodeFence().


Filip could you advice us what to do here to fix this?
Comment 4 Carlos Alberto Lopez Perez 2016-11-10 10:42:16 PST
Ok.. i think now I understand better this.

The purpose of executing the x86 assembly instruction cpuid on crossModifyingCodeFence() is just to force any modifications to registers, memory and flags to be completed and to drain all buffered writes to memory before the next instruction is fetched and executed.

The value returned by cpuid is not used at all here. The idea is just to force this sync() on the cpu registers and memory.. right?

Reference: https://jpassing.com/2015/01/19/runtime-code-modification-explained-part-2-cache-coherency-issues/
Comment 5 Carlos Alberto Lopez Perez 2016-11-10 11:38:58 PST
Created attachment 294391 [details]
Patch
Comment 6 Michael Catanzaro 2016-11-10 12:21:01 PST
Comment on attachment 294391 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=294391&action=review

> Source/WTF/wtf/Atomics.h:239
> +#elif CPU(X86) && defined(__PIC__) && __GNUC__ < 5

It will be a compiler warning on other compilers. Best check if it's defined before using it:

&& defined(__GNUC__) && __GNUC__ < 5

Do we want to run this code when using Clang? If not, then use this instead to filter out Clang:

&& COMPILER(GCC) && __GNUC__ < 5
Comment 7 Carlos Alberto Lopez Perez 2016-11-10 13:12:55 PST
Created attachment 294401 [details]
Patch

Address Michael comments
Comment 8 Carlos Alberto Lopez Perez 2016-11-15 07:33:45 PST
Ping reviewers? The GTK 32-bit buildbot has been red for several days already due to this <https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release>
Comment 9 Michael Catanzaro 2016-11-15 08:35:01 PST
Comment on attachment 294401 [details]
Patch

I'd rather another reviewer look at it because I do not understand the code, but here's your rubber stamp.
Comment 10 Michael Catanzaro 2016-11-15 08:36:00 PST
I'm suspicious that the code is only needed when using GCC.
Comment 11 Filip Pizlo 2016-11-15 09:31:36 PST
Comment on attachment 294401 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=294401&action=review

> Source/WTF/wtf/Atomics.h:248
> +#elif CPU(X86) && defined(__PIC__) && COMPILER(GCC) && __GNUC__ < 5
> +    intptr_t a = 0, b, c, d;
> +    asm volatile(
> +        "pushl %%ebx;" // save %ebx
> +        "cpuid;"
> +        "movl %%ebx, %0;" // save what cpuid put in %ebx into the first output (b)
> +        "popl %%ebx;" // restore the old %ebx value
> +        : "=r"(b), "+a"(a), "=c"(c), "=d"(d)
> +        :
> +        : "memory");

You should just make this conditional on CPU(X86).  Because cpuid is so expensive, we don't use this on code paths where the extra push/pop would be expensive.  It's better to simplify the conditional just for sanity.

You don't need to movl %%ebx, %0.  We never use the results of a, b, c, d.  The only point of those variables is as a way to tell the compiler that cpuid clobbers ax, bx, cx, and dx.  But it seems that this version of GCC doesn't grok our clobbering of bx.

So, I suggest something like this for x86-32:

intptr_t a = 0, c, d;
asm volatile(
    "pushl %%ebx\n\t"
    "cpuid\n\t"
    "popl %%ebx\n\t"
    : "+a"(a), "=c"(c), "=d"(d)
    :
    : "memory");

So, we still tell the compiler that this reads and writes ax, writes cx, and writes dx, but we don't tell the compiler about bx, because we know that the compiler might not understand this and because we preserve its value using push/pop.

r- because I think that the movl is unnecessary.
Comment 12 Carlos Alberto Lopez Perez 2016-11-15 12:10:14 PST
(In reply to comment #11)
> Comment on attachment 294401 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=294401&action=review
> 
> > Source/WTF/wtf/Atomics.h:248
> > +#elif CPU(X86) && defined(__PIC__) && COMPILER(GCC) && __GNUC__ < 5
> > +    intptr_t a = 0, b, c, d;
> > +    asm volatile(
> > +        "pushl %%ebx;" // save %ebx
> > +        "cpuid;"
> > +        "movl %%ebx, %0;" // save what cpuid put in %ebx into the first output (b)
> > +        "popl %%ebx;" // restore the old %ebx value
> > +        : "=r"(b), "+a"(a), "=c"(c), "=d"(d)
> > +        :
> > +        : "memory");
> 
> You should just make this conditional on CPU(X86).  Because cpuid is so
> expensive, we don't use this on code paths where the extra push/pop would be
> expensive.  It's better to simplify the conditional just for sanity.
> 
> You don't need to movl %%ebx, %0.  We never use the results of a, b, c, d. 
> The only point of those variables is as a way to tell the compiler that
> cpuid clobbers ax, bx, cx, and dx.  But it seems that this version of GCC
> doesn't grok our clobbering of bx.
> 
> So, I suggest something like this for x86-32:
> 
> intptr_t a = 0, c, d;
> asm volatile(
>     "pushl %%ebx\n\t"
>     "cpuid\n\t"
>     "popl %%ebx\n\t"
>     : "+a"(a), "=c"(c), "=d"(d)
>     :
>     : "memory");
> 
> So, we still tell the compiler that this reads and writes ax, writes cx, and
> writes dx, but we don't tell the compiler about bx, because we know that the
> compiler might not understand this and because we preserve its value using
> push/pop.
> 
> r- because I think that the movl is unnecessary.

The movl is necessary if you actually want to read what cpuid has returned, but since we don't care about that, we can just ignore it and avoid the movl.

However, if you are concerned about the extra movl and about having more if-defs than necessary, why just don't use this both unconditionally for x86 and x64?

I have checked what the compiler does, and doing it how you suggests avoids 2 movl call on x64 http://sprunge.us/SVKR (and 1 in x86)
Comment 13 Filip Pizlo 2016-11-15 12:13:56 PST
(In reply to comment #12)
> (In reply to comment #11)
> > Comment on attachment 294401 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=294401&action=review
> > 
> > > Source/WTF/wtf/Atomics.h:248
> > > +#elif CPU(X86) && defined(__PIC__) && COMPILER(GCC) && __GNUC__ < 5
> > > +    intptr_t a = 0, b, c, d;
> > > +    asm volatile(
> > > +        "pushl %%ebx;" // save %ebx
> > > +        "cpuid;"
> > > +        "movl %%ebx, %0;" // save what cpuid put in %ebx into the first output (b)
> > > +        "popl %%ebx;" // restore the old %ebx value
> > > +        : "=r"(b), "+a"(a), "=c"(c), "=d"(d)
> > > +        :
> > > +        : "memory");
> > 
> > You should just make this conditional on CPU(X86).  Because cpuid is so
> > expensive, we don't use this on code paths where the extra push/pop would be
> > expensive.  It's better to simplify the conditional just for sanity.
> > 
> > You don't need to movl %%ebx, %0.  We never use the results of a, b, c, d. 
> > The only point of those variables is as a way to tell the compiler that
> > cpuid clobbers ax, bx, cx, and dx.  But it seems that this version of GCC
> > doesn't grok our clobbering of bx.
> > 
> > So, I suggest something like this for x86-32:
> > 
> > intptr_t a = 0, c, d;
> > asm volatile(
> >     "pushl %%ebx\n\t"
> >     "cpuid\n\t"
> >     "popl %%ebx\n\t"
> >     : "+a"(a), "=c"(c), "=d"(d)
> >     :
> >     : "memory");
> > 
> > So, we still tell the compiler that this reads and writes ax, writes cx, and
> > writes dx, but we don't tell the compiler about bx, because we know that the
> > compiler might not understand this and because we preserve its value using
> > push/pop.
> > 
> > r- because I think that the movl is unnecessary.
> 
> The movl is necessary if you actually want to read what cpuid has returned,
> but since we don't care about that, we can just ignore it and avoid the movl.
> 
> However, if you are concerned about the extra movl and about having more
> if-defs than necessary, why just don't use this both unconditionally for x86
> and x64?

I'm happy with that, but it gets awkward when telling the compiler that you want to push ebx.  On x86-32 you would say pushl ebx, while on x86-64 you would say pushq rbx.  If you can find a clever way around this then I'm happy with this code being shared.  That would be the best.

> 
> I have checked what the compiler does, and doing it how you suggests avoids
> 2 movl call on x64 http://sprunge.us/SVKR (and 1 in x86)
Comment 14 Carlos Alberto Lopez Perez 2016-11-15 12:51:31 PST
(In reply to comment #13)
> > 
> > The movl is necessary if you actually want to read what cpuid has returned,
> > but since we don't care about that, we can just ignore it and avoid the movl.
> > 
> > However, if you are concerned about the extra movl and about having more
> > if-defs than necessary, why just don't use this both unconditionally for x86
> > and x64?
> 
> I'm happy with that, but it gets awkward when telling the compiler that you
> want to push ebx.  On x86-32 you would say pushl ebx, while on x86-64 you
> would say pushq rbx.  If you can find a clever way around this then I'm
> happy with this code being shared.  That would be the best.
> 

Mmmm, good point.

I was overlooking that in 64-bit the instructions and the registers are different :|

So, I guess that using this for x86 only is the best thing I can come with.
Comment 15 Carlos Alberto Lopez Perez 2016-11-15 13:09:21 PST
Created attachment 294870 [details]
Patch

Avoid a movl call and make it conditional for CPU(X86)
Comment 16 Carlos Alberto Lopez Perez 2016-11-16 12:03:06 PST
Filip please, could you review this patch? The GTK 32-bit buildbot continues to be broken due to this.


I understand it should be quick to review. Thanks
Comment 17 Mark Lam 2016-11-16 12:50:01 PST
Comment on attachment 294870 [details]
Patch

r=me because Filip has already reviewed this previously and suggested this specific fix.
Comment 18 Carlos Alberto Lopez Perez 2016-11-16 12:59:46 PST
(In reply to comment #17)
> Comment on attachment 294870 [details]
> Patch
> 
> r=me because Filip has already reviewed this previously and suggested this
> specific fix.

Thanks!
Comment 19 Carlos Alberto Lopez Perez 2016-11-16 13:01:05 PST
Comment on attachment 294870 [details]
Patch

Clearing flags on attachment: 294870

Committed r208806: <http://trac.webkit.org/changeset/208806>
Comment 20 Carlos Alberto Lopez Perez 2016-11-16 13:01:16 PST
All reviewed patches have been landed.  Closing bug.