In article
<martin.taylor-41ECD4.10433523012004@newstrial.btopenworld.com>,
Martin,
Howard, who is the expert on this, is away skiing this week. If you can
write to me, or post here again next Monday, he'll be able to help. I
believe we fixed some things related to locking in CW 9, and those
changes may improve performance too.
Ron
>I am working on a cross-platform project which makes extensive use of
>the Boost shared_ptr. A while back, we made the application threaded
>Thanks,
>Martin

Metrowerks, maker of CodeWarrior - "Software Starts Here"
Ron Liechty - MWRon@metrowerks.com - <http://www.metrowerks.com>
In article
<martin.taylor-41ECD4.10433523012004@newstrial.btopenworld.com>,
> I am working on a cross-platform project which makes extensive use of
> the Boost shared_ptr. A while back, we made the application threaded
> Can anyone help? If so, I would be happy to submit the code to boost
> for the benefit of all.
I will take you up on that offer! :-)
I took a look at boost/detail/lwm_gcc.hpp, and I'm not convinced that it
is reliable, at least not the scoped_lock constructor:
    explicit scoped_lock(lightweight_mutex & m): m_(m)
    {
        while( !__exchange_and_add(&m_.a_, -1) )
        {
            // m_.a_ == -1 !!!
            __atomic_add(&m_.a_, 1);
            sched_yield();
        }
    }
The design of this mutex is that a value of 0 means locked, and other
values mean unlocked. Say that the mutex is locked by thread A. Thread
B executes __exchange_and_add, which returns 0 but still decrements the
mutex to -1. Thread B enters the loop to wait, as it should. But thread
B is interrupted by thread C before it can increment the mutex back to
its proper locked state (0). Thread C executes __exchange_and_add, which
decrements the mutex to -2 and returns -1. Because -1 is nonzero, thread
C now believes that it has acquired the mutex, even though thread A
still holds it. Chaos ensues.
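The interleaving above can be replayed deterministically on a single
thread. This is a hypothetical illustration, not Boost code:
exchange_and_add is a non-atomic stand-in for GCC's __exchange_and_add,
and RaceOutcome/replay_race are names invented for this sketch. It
starts from the state where thread A already holds the lock (value 0)
and applies B's and C's decrements in the order described.

```cpp
// Non-atomic stand-in for GCC's __exchange_and_add(p, val): adds val
// to *p and returns the previous value. Atomicity does not matter here
// because the interleaving is replayed step by step on one thread.
static int exchange_and_add(int* p, int val)
{
    int old = *p;
    *p += val;
    return old;
}

struct RaceOutcome
{
    bool b_owns;      // did thread B believe it acquired the lock?
    bool c_owns;      // did thread C believe it acquired the lock?
    int final_value;  // lock word after C's attempt
};

RaceOutcome replay_race()
{
    int a = 0;  // thread A already holds the lock (0 == locked)

    // Thread B: the previous value is 0, so B correctly sees "locked"
    // and enters the wait loop; the word is now -1.
    bool b_owns = exchange_and_add(&a, -1) != 0;

    // B is preempted here, before its __atomic_add(&a, 1) restores 0.

    // Thread C: the previous value is -1, which is nonzero, so C leaves
    // the while loop believing it owns the lock while A still holds it.
    bool c_owns = exchange_and_add(&a, -1) != 0;

    return RaceOutcome{b_owns, c_owns, a};
}
```

Calling replay_race() yields b_owns == false, c_owns == true and a lock
word of -2: two threads (A and C) both believe they hold the mutex.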
Therefore, instead of writing CodeWarrior PPC versions of
__exchange_and_add and __atomic_add as you requested, I've written a
CodeWarrior PPC version of lightweight_mutex based on the lwarx/stwcx.
instructions. The code posted below appears to execute approximately
8 times faster than a pthreads mutex for an uncontested lock/unlock
cycle.
Please let us know if this works for you, and if not, perhaps we can fix
whatever went wrong. I would also like to publicly thank Bob Campbell
of Metrowerks who helped me review the PPC assembly.
Hoping to see you over at boost soon! :-)
-Howard
Metrowerks
#include <sched.h>
namespace boost
{
namespace detail
{
class lightweight_mutex
{
private:
    volatile int a_;    // 0 == unlocked, 1 == locked
    lightweight_mutex(lightweight_mutex const &);
    lightweight_mutex & operator=(lightweight_mutex const &);
public:
    lightweight_mutex(): a_(0)
    {
    }
    class scoped_lock;
    friend class scoped_lock;
    class scoped_lock
    {
    private:
        lightweight_mutex & m_;
        scoped_lock(scoped_lock const &);
        scoped_lock & operator=(scoped_lock const &);
    public:
        explicit scoped_lock(lightweight_mutex & m): m_(m)
        {
            register volatile int* p = &m_.a_;
            register int f;
            register int one = 1;
            asm
            {
            loop:
                lwarx f, 0, p       // load the lock word and reserve it
                cmpwi f, 0
                bne- yield          // already locked: give up the CPU
                stwcx. one, 0, p    // store 1 if the reservation still holds
                beq+ done           // store succeeded: lock acquired
            }
        yield:                      // reached if locked or if stwcx. failed
            sched_yield();
            goto loop;
        done: ;
        }
        ~scoped_lock()
        {
            m_.a_ = 0;              // release the lock
        }
    };
};
}
}
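For readers without CodeWarrior's inline assembler, the same protocol
(0 means unlocked, 1 means locked; acquire by storing 1 only if the word
is still 0; release by storing 0) can be sketched in standard C++11.
This is a modern, hypothetical analogue, not the code from this thread
and not Boost's API: spin_mutex and increment_concurrently are made-up
names, and std::atomic did not exist in CW 8/9. On multiprocessor
PowerPC systems such a lock is usually also paired with barriers
(typically isync after acquiring, and sync or lwsync before the
releasing store), which the asm above does not include; the sketch makes
that requirement explicit with acquire/release memory ordering.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical portable analogue of the PPC lightweight_mutex:
// the lock word is 0 when unlocked and 1 when locked.
class spin_mutex
{
public:
    spin_mutex() : a_(0) {}
    spin_mutex(spin_mutex const&) = delete;
    spin_mutex& operator=(spin_mutex const&) = delete;

    class scoped_lock
    {
    public:
        explicit scoped_lock(spin_mutex& m) : m_(m)
        {
            for (;;)
            {
                int expected = 0;
                // Store 1 only if the word is still 0 (the job done by
                // lwarx/cmpwi/stwcx.). The weak form may fail spuriously,
                // much as stwcx. fails when the reservation is lost.
                if (m_.a_.compare_exchange_weak(expected, 1,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed))
                    return;
                // Locked, or a spurious failure: yield and retry, as the
                // asm version falls through to sched_yield().
                std::this_thread::yield();
            }
        }

        ~scoped_lock()
        {
            // Release: the next acquirer sees every write made under the lock.
            m_.a_.store(0, std::memory_order_release);
        }

        scoped_lock(scoped_lock const&) = delete;
        scoped_lock& operator=(scoped_lock const&) = delete;

    private:
        spin_mutex& m_;
    };

private:
    std::atomic<int> a_;
};

// Usage: several threads increment a shared counter under the lock.
long increment_concurrently(int threads, int iterations)
{
    spin_mutex m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([&m, &counter, iterations] {
            for (int i = 0; i < iterations; ++i)
            {
                spin_mutex::scoped_lock lock(m);
                ++counter;
            }
        });
    }
    for (std::thread& th : pool)
        th.join();
    return counter;
}
```

If the lock works, increment_concurrently(4, 10000) returns exactly
40000; without it, lost updates on the unprotected counter would
typically make the result smaller.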
Martin Taylor - 26 Jan 2004 12:13 GMT
Hi Howard
Thanks for such a fast response, especially considering you had just got
back from holiday!
However, I did have a couple of problems building the code you
suggested, so I would like to propose the following modification:
namespace boost
{
namespace detail
{
class lightweight_mutex
{
private:
    volatile int a_;
    lightweight_mutex(lightweight_mutex const &);
    lightweight_mutex & operator=(lightweight_mutex const &);
public:
    lightweight_mutex(): a_(0)
    {
    }
    class scoped_lock;
    friend class scoped_lock;
    class scoped_lock
    {
    private:
        lightweight_mutex & m_;
        scoped_lock(scoped_lock const &);
        scoped_lock & operator=(scoped_lock const &);
    public:
        explicit scoped_lock(lightweight_mutex & m);
        ~scoped_lock()
        {
            m_.a_ = 0;
        }
    };
};
inline
lightweight_mutex::scoped_lock::scoped_lock(lightweight_mutex & m): m_(m)
{
    register volatile int *p = &m_.a_;
    register int f;
    register int one = 1;
    asm
    {
    loop:
        lwarx f, 0, p
        cmpwi f, 0
        bne- yield
        stwcx. one, 0, p
        beq+ done
        b loop // not sure if this is needed
    yield:
        stwcx. f, 0, p
        b loop
    }
done: ;
}
} // namespace detail
} // namespace boost
I moved the body of the constructor out of the class so that it would
compile with CW8. Also, sched_yield is not readily available to CFM
Carbon apps (which ours is), so I changed the loop to simply keep
retrying until it succeeds.
Do you think this would be an acceptable alternative?
Thanks
Martin
In article
<hinnant-1B4EA8.16090623012004@syrcnyrdrs-03-ge0.nyroc.rr.com>,
> In article
> <martin.taylor-41ECD4.10433523012004@newstrial.btopenworld.com>,
Howard Hinnant - 26 Jan 2004 13:39 GMT
In article
<martin.taylor-0F302A.12130526012004@newstrial.btopenworld.com>,
> inline
> lightweight_mutex::scoped_lock::scoped_lock(lightweight_mutex & m): m_(m)
> succeeds.
> Would this be an acceptable alternative do you think?
Hi Martin,
I suspect that on OS 10 the unconditional branch would work, though
perhaps not ideally. If you run it on OS 9, however, I think the lack of
a yield to the OS could lead to an infinite loop. Can you substitute a
call to MPYield() for CFM builds?
inline
lightweight_mutex::scoped_lock::scoped_lock(lightweight_mutex & m)
    : m_(m)
{
    register volatile int* p = &m_.a_;
    register int f;
    register int one = 1;
    asm
    {
    loop:
        lwarx f, 0, p
        cmpwi f, 0
        bne- yield
        stwcx. one, 0, p
        beq+ done
    }
yield:
    MPYield();
    goto loop;
done: ;
}
You may need to include <Multiprocessing.h> for that.
-Howard