Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
General
GeneralPortable MacsHardwareNetworking
Applications
Mac ApplicationsEudoraFirefox / MozillaInternet ExplorerOutlook ExpressMS OfficeEntourageExcelPowerPointWordVirtual PCMedia PlayerOther MS Products
Programming
Mac ProgrammingCodeWarriorPerl
Country Specific
Australian Mac GroupUK Mac Group

Mac Forum / Programming / CodeWarrior / April 2005



Tip: Looking for answers? Try searching our database.

Compiler Bug

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Ben Weiss - 27 Apr 2005 02:49 GMT
I came across the following bug in the CodeWarrior PPC compiler, took a
long time to isolate... I've submitted it to Metrowerks, but heard only
silence, which is unsettling. (Ron, what's going on over there?)

Try compiling/disassembling the following:

#pragma altivec_vrsave off
#pragma scheduling once
#pragma optimization_level 1

int test (vector unsigned int x) {
  vector unsigned int buf[1];
  buf[0] = x;
 
  int a = ((int*)buf)[0];
  int b = ((int*)buf)[1];
 
  return a + b;
  }

Note that the compiler incorrectly places the first dependent load
BEFORE the store, so 'a' ends up with a garbage value.

Frightening bug, worth watching out for.

Ben
Paul Russell - 27 Apr 2005 09:52 GMT
> I came across the following bug in the CodeWarrior PPC compiler, took a
> long time to isolate... I've submitted it to Metrowerks, but heard only
[quoted text clipped - 20 lines]
>
> Frightening bug, worth watching out for.

Frightening code too - you're breaking the C aliassing rules with the
above type-punning, so I'm not sure if it's so much a compiler bug as a
coding bug.

FWIW I've seen similar problems with gcc 3.3 recently in some code
written by a junior colleague - after I persuaded the miscreant to get
rid of the type puns and use unions instead the code generation problems
went away.

Paul
larry@skytag.com - 27 Apr 2005 12:41 GMT
Amen. I've never understood why so many people want to use these kinds
of casts. Such casts are intended for only one purpose: To blind the
compiler. Sometimes its necessary, but often there are more robust ways
to accomplish what  you need to do. Furthermore, they make for a
maintence nightmare *because* they blind the compiler. You probably
won't see a problem in a five-line function, but what if this were a
much longer function and the declaration of buf was changed at some
later date to a data type of a different size? If the person making the
change didn't update that casting code it would happily ignore the
change and almost certainly not do what it was written to do.
Ben Weiss - 27 Apr 2005 21:03 GMT
> > I came across the following bug in the CodeWarrior PPC compiler, took a
> > long time to isolate... I've submitted it to Metrowerks, but heard only
[quoted text clipped - 31 lines]
>
> Paul

Sure, unions (or even volatile declarations) would probably solve the
problem, and are the pedantically correct way to do this, but I was
under the impression that the compiler was still not allowed to
rearrange the order of stores/loads like this.  Could it be that the
compiler isn't interpreting buf[0] = x; as a store, since it's on the
stack?

The problem still occurs with:

vector unsigned int buf[1] = { v };
 
int a = ((int*)&buf[0])[0];
int b = ((int*)&buf[0])[1];

where the buffer contents are explicitly initialized in the declaration,
yet the compiler uses the variable before initializing it. Is this code
still illegal?

FWIW, I've seen code like this in countless places, and always presumed
it to be legal and unambiguous. (Do you have a reference for where in
the C spec it's stated not to be?)

It's also surprising that the problem manifests itself only at LOW
optimization settings; I could understand how an aggressive optimizer
might try to rearrange loads and stores, but it's less understandable
when the optimizer is turned off.

Finally, the problem STILL occurs with:

int a = ((int*)&v)[0];
int b = ((int*)&v)[1];
 
return a + b;

Is this also illegal??

Ben
Paul Russell - 27 Apr 2005 23:05 GMT
> Sure, unions (or even volatile declarations) would probably solve the
> problem, and are the pedantically correct way to do this, but I was
> under the impression that the compiler was still not allowed to
> rearrange the order of stores/loads like this.  Could it be that the
> compiler isn't interpreting buf[0] = x; as a store, since it's on the
> stack?

I'm not a language lawyer, but from experience I can tell you that type
punning is generally a Bad Thing. Some compilers will give you
appropriate warnings about it but most won't (probably because it's so
prevalent). As mentioned, I've recently seen code generation problems
with gcc 3.3 caused by type punning (I think this was actually very
similar to your CodeWarrior example - casting between pointer to vector
and pointer to int) so whether it's strictly correct for a compiler to
reorder loads and stores around type-punned pointers or not, the bottom
line is that it can happen. Even if you can persuade MetroWerks to (a)
accept that this really is a bug and (b) fix it, you may well get bitten
again when you switch to another compiler.

Regards,

Paul
Ben Weiss - 27 Apr 2005 22:41 GMT
> Frightening code too - you're breaking the C aliassing rules with the
> above type-punning, so I'm not sure if it's so much a compiler bug as a
[quoted text clipped - 6 lines]
>
> Paul

OK, I rewrote it with unions, and it STILL breaks.  You can't tell me
this isn't a compiler bug.

#pragma scheduling once
#pragma optimization_level 1
#pragma altivec_vrsave off

static int test(vector unsigned int v) {
  struct u32x4 {
     unsigned int a, b, c, d;
     };
 
  union vu32_u32x4 {
     vector unsigned int v;
     u32x4 i;
     };
 
  vu32_u32x4 u;
 
  u.v = v;
  int a = u.i.a;
  int b = u.i.b;
 
  return a + b;
  }

compiles to:

Hunk: Kind=HUNK_LOCAL_CODE     Align=16  Class=PR  
Name=".test__FXUi"(256)  Size=72
00000000: 542C073E  clrlwi     r12,SP,28
00000004: 218CFFB0  subfic     r12,r12,-80
00000008: 7C21616E  stwux      SP,SP,r12
0000000C: 38010040  addi       r0,SP,64
00000010: 80610040  lwz        r3,64(SP)
00000014: 7C4001CE  stvx       vr2,r0,r0
00000018: 80010044  lwz        r0,68(SP)
0000001C: 7C630214  add        r3,r3,r0
00000020: 80210000  lwz        SP,0(SP)
00000024: 4E800020  blr

Ben
Paul Russell - 28 Apr 2005 07:30 GMT
> OK, I rewrote it with unions, and it STILL breaks.  You can't tell me
> this isn't a compiler bug.
[quoted text clipped - 36 lines]
> 00000020: 80210000  lwz        SP,0(SP)
> 00000024: 4E800020  blr

Yes, I'll give you that one - that does indeed look like a bug.

BTW, probably irrelevant, but I'm curious as to why you would disable
VRSAVE ? This is not normally something you would ever want to do.

Regards,

Paul
Ben Weiss - 28 Apr 2005 09:32 GMT
> > OK, I rewrote it with unions, and it STILL breaks.  You can't tell me
> > this isn't a compiler bug.
[quoted text clipped - 45 lines]
>
> Paul

Two reasons.

First, it makes the disassembled code shorter and easier to read,
particularly for small functions like this. Compare the same function
compiled with VRSAVE ON (note the bug is still there):

Hunk: Kind=HUNK_LOCAL_CODE     Align=16  Class=PR  
Name=".test__FXUi"(205)  Size=96
00000000: 7C0042A6  mfvrsave   r0
00000004: 542C073E  clrlwi     r12,SP,28
00000008: 218CFFA0  subfic     r12,r12,-96
0000000C: 9001FFFC  stw        r0,-4(SP)
00000010: 64002000  oris       r0,r0,$2000
00000014: 7C21616E  stwux      SP,SP,r12
00000018: 7C0043A6  mtvrsave   r0
0000001C: 38010040  addi       r0,SP,64
00000020: 80610040  lwz        r3,64(SP)
00000024: 7C4001CE  stvx       vr2,r0,r0
00000028: 80010044  lwz        r0,68(SP)
0000002C: 7C630214  add        r3,r3,r0
00000030: 80210000  lwz        SP,0(SP)
00000034: 8161FFFC  lwz        r11,-4(SP)
00000038: 7D6043A6  mtvrsave   r11
0000003C: 4E800020  blr

Second, this was inside an altivec-intensive and performance-critical
thread, so most of the vector registers (typically all of them) are in
use when the context is switched. Disabling VRSAVE reduces the code size
and function overhead as shown above, and seems to outweigh the slight
(if any) slowdown while context-switching. (The thread entrypoint is
wrapped with #pragma vrsave_allon.)

The rest of my code uses VRSAVE normally, of course.

Regarding the bug, is there a better/more portable way than unions or
casting to accomplish the same task? For now I'm just declaring 'buf' as
volatile and it seems to work, but yuck. (By the way, I had optimization
level set to 1 because this was the debug build, and scheduling once
because the second scheduling pass was doing its own ghastly things to
the code.)

The ultimate purpose of the altivec-to-scalar conversion is to extract
aligned bytes from the altivec register and write them into
possibly-misaligned, possibly-discontiguous pixels of an image. (e.g.
the green channnel of RGBRGB data). The obvious way was to transfer the
data to scalar in 32-bit chunks, and dice/store from there. Technically
I could do it with a bunch of splats and vec_ste's; possibly worth
benchmarking to see if that's more efficient?

Thanks for your help and comments; much appreciated.

Ben
Paul Russell - 28 Apr 2005 09:49 GMT
>>BTW, probably irrelevant, but I'm curious as to why you would disable
>>VRSAVE ? This is not normally something you would ever want to do.
[quoted text clipped - 32 lines]
>
> The rest of my code uses VRSAVE normally, of course.

OK, thanks for the explanation. The reason that I asked is that I've
encountered cases of people turning off VRASVE without actually knowing
what it did, but evidently this doesn't apply in your case.

> Regarding the bug, is there a better/more portable way than unions or
> casting to accomplish the same task? For now I'm just declaring 'buf' as
[quoted text clipped - 10 lines]
> I could do it with a bunch of splats and vec_ste's; possibly worth
> benchmarking to see if that's more efficient?

Generally I use unions or vec_ste to get scalar values from vectors,
depending on the context. In the other direction you can use unions or
vec_lde of course.

As for your particular case, I would tend to try and stay in the vector
unit if you can. The latency of both loads/stores (even to/from L1) and
of AltiVec instructions thmeselves is on the increase (particularly in
the G5) so moving data between the vector unit and the GPR's is getting
increasingly expensive. It may look like more work in the code but it's
probably going to be more efficient in the long run.

Regards,

Paul

P.S. If you're doing a lot of this kind of thing you might want to check
out the AltiVec mailing list at <www.simdtech.org/altivec>. Low
bandwidth, high signal-to-noise ratio and some very knowledgeable
contributors.
Paul Russell - 28 Apr 2005 09:54 GMT
> First, it makes the disassembled code shorter and easier to read,
> particularly for small functions like this. Compare the same function
[quoted text clipped - 6 lines]
> (if any) slowdown while context-switching. (The thread entrypoint is
> wrapped with #pragma vrsave_allon.)

One other thing: for small performance-critical routines like this you
should probably just use inlining - this not only gets rid of the VRSAVE
but it also removes the function call overhead (of course) plus it gives
the compiler some additional optimisation opportunities.

Regards,

Paul
Alwyn - 28 Apr 2005 07:42 GMT
> FWIW I've seen similar problems with gcc 3.3 recently in some code
> written by a junior colleague - after I persuaded the miscreant to get
> rid of the type puns and use unions instead the code generation problems
> went away.

But in standard C, a 'union' may only contain a value of one type at a
time; you cannot portably use 'union's to reinterpret data of one type as
data of another type. If you must do this, it is generally better touse
casts.

Of course, I'm not disputing what you said above about GCC 3.3, but I
would say that any improvement that was seen was implementation-specific.

Alwyn
Paul Russell - 28 Apr 2005 08:24 GMT
> But in standard C, a 'union' may only contain a value of one type at a
> time; you cannot portably use 'union's to reinterpret data of one type as
[quoted text clipped - 3 lines]
> Of course, I'm not disputing what you said above about GCC 3.3, but I
> would say that any improvement that was seen was implementation-specific.

You probably already know this, but it's important to distinguish
between legitimate casts and the use of casts with fundamentally
different pointer types to perform "type punning", e.g.

float f = 1.0;
int i = *(int *)&f; // put bit representation of 1.0f into an int

In the OP's case you can't cast between an AltiVec vector and an int,
and unfortunately Motorola rather botched the syntax of the AltiVec
extensions, so he resorted to type punning to extract an element of an
AltiVec vector into a scalar int. Unfortunately type punning breaks
aliasing rules (at least in C89 onwards ?) so there can be unpleasant
consequences.

But yes, you're right. using unions to achieve the same thing is still
not 100% portable, but at least it doesn't have the potential for
undefined behaviour (beyond the actual representation of the data within
the union) that type punning does, and is therefore to be preferred.

Regards,

Paul
Alwyn - 28 Apr 2005 08:35 GMT
> But yes, you're right. using unions to achieve the same thing is still
> not 100% portable, but at least it doesn't have the potential for
> undefined behaviour (beyond the actual representation of the data within
> the union) that type punning does, and is therefore to be preferred.

I haven't got the C standard to hand, but it is my recollection that
accessing the same data in a union as two different types is undefined
behaviour. So far as standard C is concerned, once undefined behaviour
occurs, _anything_ could happen.

Alwyn
Alwyn - 28 Apr 2005 08:48 GMT
> I haven't got the C standard to hand, but it is my recollection that
> accessing the same data in a union as two different types is undefined
> behaviour. So far as standard C is concerned, once undefined behaviour
> occurs, _anything_ could happen.

According to this page: 'The Standard says that if you do this, the
behaviour is implementation defined (not undefined).' So, in the absence
of a higher authority, I will gladly admit you may well be right.
<http://publications.gbdirect.co.uk/c_book/chapter6/unions.html>

Alwyn
Paul Russell - 28 Apr 2005 09:24 GMT
> According to this page: 'The Standard says that if you do this, the
> behaviour is implementation defined (not undefined).' So, in the absence
> of a higher authority, I will gladly admit you may well be right.
> <http://publications.gbdirect.co.uk/c_book/chapter6/unions.html>

Thanks for the link - I did some Googling myself and it seems that the
issue of aliasing and unions is a thorny one that the standards
committees have been having a hard time with. It's a trade-off between
making life easier for the programmer and making life easier for the
compiler (optimiser). It would be interesting to look at the C99
standard and see if this ever got resolved satisfactorily.

Regards,

Paul
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.