Compiler Bug
|
|
Thread rating:  |
Ben Weiss - 27 Apr 2005 02:49 GMT I came across the following bug in the CodeWarrior PPC compiler, took a long time to isolate... I've submitted it to Metrowerks, but heard only silence, which is unsettling. (Ron, what's going on over there?)
Try compiling/disassembling the following:
#pragma altivec_vrsave off #pragma scheduling once #pragma optimization_level 1
int test (vector unsigned int x) { vector unsigned int buf[1]; buf[0] = x; int a = ((int*)buf)[0]; int b = ((int*)buf)[1]; return a + b; }
Note that the compiler incorrectly places the first dependent load BEFORE the store, so 'a' ends up with a garbage value.
Frightening bug, worth watching out for.
Ben
Paul Russell - 27 Apr 2005 09:52 GMT > I came across the following bug in the CodeWarrior PPC compiler, took a > long time to isolate... I've submitted it to Metrowerks, but heard only [quoted text clipped - 20 lines] > > Frightening bug, worth watching out for. Frightening code too - you're breaking the C aliassing rules with the above type-punning, so I'm not sure if it's so much a compiler bug as a coding bug.
FWIW I've seen similar problems with gcc 3.3 recently in some code written by a junior colleague - after I persuaded the miscreant to get rid of the type puns and use unions instead the code generation problems went away.
Paul
larry@skytag.com - 27 Apr 2005 12:41 GMT Amen. I've never understood why so many people want to use these kinds of casts. Such casts are intended for only one purpose: To blind the compiler. Sometimes its necessary, but often there are more robust ways to accomplish what you need to do. Furthermore, they make for a maintence nightmare *because* they blind the compiler. You probably won't see a problem in a five-line function, but what if this were a much longer function and the declaration of buf was changed at some later date to a data type of a different size? If the person making the change didn't update that casting code it would happily ignore the change and almost certainly not do what it was written to do.
Ben Weiss - 27 Apr 2005 21:03 GMT > > I came across the following bug in the CodeWarrior PPC compiler, took a > > long time to isolate... I've submitted it to Metrowerks, but heard only [quoted text clipped - 31 lines] > > Paul Sure, unions (or even volatile declarations) would probably solve the problem, and are the pedantically correct way to do this, but I was under the impression that the compiler was still not allowed to rearrange the order of stores/loads like this. Could it be that the compiler isn't interpreting buf[0] = x; as a store, since it's on the stack?
The problem still occurs with:
vector unsigned int buf[1] = { v }; int a = ((int*)&buf[0])[0]; int b = ((int*)&buf[0])[1];
where the buffer contents are explicitly initialized in the declaration, yet the compiler uses the variable before initializing it. Is this code still illegal?
FWIW, I've seen code like this in countless places, and always presumed it to be legal and unambiguous. (Do you have a reference for where in the C spec it's stated not to be?)
It's also surprising that the problem manifests itself only at LOW optimization settings; I could understand how an aggressive optimizer might try to rearrange loads and stores, but it's less understandable when the optimizer is turned off.
Finally, the problem STILL occurs with:
int a = ((int*)&v)[0]; int b = ((int*)&v)[1]; return a + b;
Is this also illegal??
Ben
Paul Russell - 27 Apr 2005 23:05 GMT > Sure, unions (or even volatile declarations) would probably solve the > problem, and are the pedantically correct way to do this, but I was > under the impression that the compiler was still not allowed to > rearrange the order of stores/loads like this. Could it be that the > compiler isn't interpreting buf[0] = x; as a store, since it's on the > stack? I'm not a language lawyer, but from experience I can tell you that type punning is generally a Bad Thing. Some compilers will give you appropriate warnings about it but most won't (probably because it's so prevalent). As mentioned, I've recently seen code generation problems with gcc 3.3 caused by type punning (I think this was actually very similar to your CodeWarrior example - casting between pointer to vector and pointer to int) so whether it's strictly correct for a compiler to reorder loads and stores around type-punned pointers or not, the bottom line is that it can happen. Even if you can persuade MetroWerks to (a) accept that this really is a bug and (b) fix it, you may well get bitten again when you switch to another compiler.
Regards,
Paul
Ben Weiss - 27 Apr 2005 22:41 GMT > Frightening code too - you're breaking the C aliassing rules with the > above type-punning, so I'm not sure if it's so much a compiler bug as a [quoted text clipped - 6 lines] > > Paul OK, I rewrote it with unions, and it STILL breaks. You can't tell me this isn't a compiler bug.
#pragma scheduling once #pragma optimization_level 1 #pragma altivec_vrsave off
static int test(vector unsigned int v) { struct u32x4 { unsigned int a, b, c, d; }; union vu32_u32x4 { vector unsigned int v; u32x4 i; }; vu32_u32x4 u; u.v = v; int a = u.i.a; int b = u.i.b; return a + b; }
compiles to:
Hunk: Kind=HUNK_LOCAL_CODE Align=16 Class=PR Name=".test__FXUi"(256) Size=72 00000000: 542C073E clrlwi r12,SP,28 00000004: 218CFFB0 subfic r12,r12,-80 00000008: 7C21616E stwux SP,SP,r12 0000000C: 38010040 addi r0,SP,64 00000010: 80610040 lwz r3,64(SP) 00000014: 7C4001CE stvx vr2,r0,r0 00000018: 80010044 lwz r0,68(SP) 0000001C: 7C630214 add r3,r3,r0 00000020: 80210000 lwz SP,0(SP) 00000024: 4E800020 blr
Ben
Paul Russell - 28 Apr 2005 07:30 GMT > OK, I rewrote it with unions, and it STILL breaks. You can't tell me > this isn't a compiler bug. [quoted text clipped - 36 lines] > 00000020: 80210000 lwz SP,0(SP) > 00000024: 4E800020 blr Yes, I'll give you that one - that does indeed look like a bug.
BTW, probably irrelevant, but I'm curious as to why you would disable VRSAVE ? This is not normally something you would ever want to do.
Regards,
Paul
Ben Weiss - 28 Apr 2005 09:32 GMT > > OK, I rewrote it with unions, and it STILL breaks. You can't tell me > > this isn't a compiler bug. [quoted text clipped - 45 lines] > > Paul Two reasons.
First, it makes the disassembled code shorter and easier to read, particularly for small functions like this. Compare the same function compiled with VRSAVE ON (note the bug is still there):
Hunk: Kind=HUNK_LOCAL_CODE Align=16 Class=PR Name=".test__FXUi"(205) Size=96 00000000: 7C0042A6 mfvrsave r0 00000004: 542C073E clrlwi r12,SP,28 00000008: 218CFFA0 subfic r12,r12,-96 0000000C: 9001FFFC stw r0,-4(SP) 00000010: 64002000 oris r0,r0,$2000 00000014: 7C21616E stwux SP,SP,r12 00000018: 7C0043A6 mtvrsave r0 0000001C: 38010040 addi r0,SP,64 00000020: 80610040 lwz r3,64(SP) 00000024: 7C4001CE stvx vr2,r0,r0 00000028: 80010044 lwz r0,68(SP) 0000002C: 7C630214 add r3,r3,r0 00000030: 80210000 lwz SP,0(SP) 00000034: 8161FFFC lwz r11,-4(SP) 00000038: 7D6043A6 mtvrsave r11 0000003C: 4E800020 blr
Second, this was inside an altivec-intensive and performance-critical thread, so most of the vector registers (typically all of them) are in use when the context is switched. Disabling VRSAVE reduces the code size and function overhead as shown above, and seems to outweigh the slight (if any) slowdown while context-switching. (The thread entrypoint is wrapped with #pragma vrsave_allon.)
The rest of my code uses VRSAVE normally, of course.
Regarding the bug, is there a better/more portable way than unions or casting to accomplish the same task? For now I'm just declaring 'buf' as volatile and it seems to work, but yuck. (By the way, I had optimization level set to 1 because this was the debug build, and scheduling once because the second scheduling pass was doing its own ghastly things to the code.)
The ultimate purpose of the altivec-to-scalar conversion is to extract aligned bytes from the altivec register and write them into possibly-misaligned, possibly-discontiguous pixels of an image. (e.g. the green channnel of RGBRGB data). The obvious way was to transfer the data to scalar in 32-bit chunks, and dice/store from there. Technically I could do it with a bunch of splats and vec_ste's; possibly worth benchmarking to see if that's more efficient?
Thanks for your help and comments; much appreciated.
Ben
Paul Russell - 28 Apr 2005 09:49 GMT >>BTW, probably irrelevant, but I'm curious as to why you would disable >>VRSAVE ? This is not normally something you would ever want to do. [quoted text clipped - 32 lines] > > The rest of my code uses VRSAVE normally, of course. OK, thanks for the explanation. The reason that I asked is that I've encountered cases of people turning off VRASVE without actually knowing what it did, but evidently this doesn't apply in your case.
> Regarding the bug, is there a better/more portable way than unions or > casting to accomplish the same task? For now I'm just declaring 'buf' as [quoted text clipped - 10 lines] > I could do it with a bunch of splats and vec_ste's; possibly worth > benchmarking to see if that's more efficient? Generally I use unions or vec_ste to get scalar values from vectors, depending on the context. In the other direction you can use unions or vec_lde of course.
As for your particular case, I would tend to try and stay in the vector unit if you can. The latency of both loads/stores (even to/from L1) and of AltiVec instructions thmeselves is on the increase (particularly in the G5) so moving data between the vector unit and the GPR's is getting increasingly expensive. It may look like more work in the code but it's probably going to be more efficient in the long run.
Regards,
Paul
P.S. If you're doing a lot of this kind of thing you might want to check out the AltiVec mailing list at <www.simdtech.org/altivec>. Low bandwidth, high signal-to-noise ratio and some very knowledgeable contributors.
Paul Russell - 28 Apr 2005 09:54 GMT > First, it makes the disassembled code shorter and easier to read, > particularly for small functions like this. Compare the same function [quoted text clipped - 6 lines] > (if any) slowdown while context-switching. (The thread entrypoint is > wrapped with #pragma vrsave_allon.) One other thing: for small performance-critical routines like this you should probably just use inlining - this not only gets rid of the VRSAVE but it also removes the function call overhead (of course) plus it gives the compiler some additional optimisation opportunities.
Regards,
Paul
Alwyn - 28 Apr 2005 07:42 GMT > FWIW I've seen similar problems with gcc 3.3 recently in some code > written by a junior colleague - after I persuaded the miscreant to get > rid of the type puns and use unions instead the code generation problems > went away. But in standard C, a 'union' may only contain a value of one type at a time; you cannot portably use 'union's to reinterpret data of one type as data of another type. If you must do this, it is generally better touse casts.
Of course, I'm not disputing what you said above about GCC 3.3, but I would say that any improvement that was seen was implementation-specific.
Alwyn
Paul Russell - 28 Apr 2005 08:24 GMT > But in standard C, a 'union' may only contain a value of one type at a > time; you cannot portably use 'union's to reinterpret data of one type as [quoted text clipped - 3 lines] > Of course, I'm not disputing what you said above about GCC 3.3, but I > would say that any improvement that was seen was implementation-specific. You probably already know this, but it's important to distinguish between legitimate casts and the use of casts with fundamentally different pointer types to perform "type punning", e.g.
float f = 1.0; int i = *(int *)&f; // put bit representation of 1.0f into an int
In the OP's case you can't cast between an AltiVec vector and an int, and unfortunately Motorola rather botched the syntax of the AltiVec extensions, so he resorted to type punning to extract an element of an AltiVec vector into a scalar int. Unfortunately type punning breaks aliasing rules (at least in C89 onwards ?) so there can be unpleasant consequences.
But yes, you're right. using unions to achieve the same thing is still not 100% portable, but at least it doesn't have the potential for undefined behaviour (beyond the actual representation of the data within the union) that type punning does, and is therefore to be preferred.
Regards,
Paul
Alwyn - 28 Apr 2005 08:35 GMT > But yes, you're right. using unions to achieve the same thing is still > not 100% portable, but at least it doesn't have the potential for > undefined behaviour (beyond the actual representation of the data within > the union) that type punning does, and is therefore to be preferred. I haven't got the C standard to hand, but it is my recollection that accessing the same data in a union as two different types is undefined behaviour. So far as standard C is concerned, once undefined behaviour occurs, _anything_ could happen.
Alwyn
Alwyn - 28 Apr 2005 08:48 GMT > I haven't got the C standard to hand, but it is my recollection that > accessing the same data in a union as two different types is undefined > behaviour. So far as standard C is concerned, once undefined behaviour > occurs, _anything_ could happen. According to this page: 'The Standard says that if you do this, the behaviour is implementation defined (not undefined).' So, in the absence of a higher authority, I will gladly admit you may well be right. <http://publications.gbdirect.co.uk/c_book/chapter6/unions.html>
Alwyn
Paul Russell - 28 Apr 2005 09:24 GMT > According to this page: 'The Standard says that if you do this, the > behaviour is implementation defined (not undefined).' So, in the absence > of a higher authority, I will gladly admit you may well be right. > <http://publications.gbdirect.co.uk/c_book/chapter6/unions.html> Thanks for the link - I did some Googling myself and it seems that the issue of aliasing and unions is a thorny one that the standards committees have been having a hard time with. It's a trade-off between making life easier for the programmer and making life easier for the compiler (optimiser). It would be interesting to look at the C99 standard and see if this ever got resolved satisfactorily.
Regards,
Paul
|
|
|