[A5] simple multithread code acts weird
rocket1

hi :)
so, i'm doing this:

//thread 1:
al_lock_mutex(job);
al_start_thread(thread2); 
while(/* some condition */){
  al_wait_cond(sig,job); 
}

//thread 2:
while(1){
  al_lock_mutex(job); 
  /* work... */
  al_signal_cond(sig);
  al_unlock_mutex(job);
}

i've read the pthreads docs and googled for quite a few hours now. as far as i understand:
-thread1 gets the mutex, thread2 starts and waits.
-when i call wait_cond, thread1 starts waiting and unlocks the mutex, so thread2 can do some work. this apparently is atomic.
-now thread2 signals and unlocks. once both have happened, the wait in thread1 wakes up and re-acquires the mutex. this last step of wait_cond should be atomic as well, but apparently is not.

what happens is that thread2 gets the mutex "while" thread1 is waking up.

if i add a small al_rest delay after the wait_cond, i get this behavior 100% of the time. if i add the delay after thread2 unlocks instead, thread1 gets the mutex first and works like i want it to :p

where am i being stupid? :D

Elias
rocket1 said:

what happens is that thread2 gets the mutex "while" thread1 is waking up.

What do you mean? Can you extend the example with some printfs and then say what you expect it to print but what it prints instead?

axilmar

The problem is that thread2 sometimes acquires the mutex before thread1 acquires the mutex. You can picture thread2 like this:

//thread2
al_lock_mutex(job); 
/* work... */
al_signal_cond(sig);
al_unlock_mutex(job);
al_lock_mutex(job); 
/* work... */
al_signal_cond(sig);
al_unlock_mutex(job);
...

As you can see, thread2 unlocks the mutex and then immediately locks it. If you put
al_rest after al_unlock_mutex, like this:

//thread2
al_lock_mutex(job); 
/* work... */
al_signal_cond(sig);
al_unlock_mutex(job);
al_rest(1);
al_lock_mutex(job); 
/* work... */
al_signal_cond(sig);
al_unlock_mutex(job);
al_rest(1);
...

then your code works because thread1 is given a chance to do its work.

In order to achieve what you want, you need a semaphore: the worker thread increments the semaphore, and the waiting thread decrements it, blocking whenever the count is zero until the worker increments it again.

rocket1

thanks for the reply

let me do the steps from the start:

t1 locks
t2 waits on lock
t1 waits on cond and unlocks

t2 now can lock
t2 does some work :)
t2 signals
t2 unlocks

t1 does not react
t2 locks again (it's the first line in the while loop)
t2 signals again
t2 unlocks again

t2 locks again (same thing..)

now t2 apparently takes control forever and t1 still hasn't done anything. it's stuck in wait_cond limbo :D
if i add a delay in t2 after it unlocks, it looks like i'm giving wait_cond in t1 the time to get the mutex and exit the limbo :)

for obvious reasons, adding a delay should not be necessary..

axilmar

There is no guarantee that the thread woken up by al_signal_cond will run immediately. What happens under the hood is that the operating system puts the thread in a queue for execution. In the meantime, the other thread gets another execution slot from the operating system and locks the mutex.

You need a semaphore.

Slartibartfast
rocket1 said:

t1 does not react

It isn't supposed to.
Signaling an event/cond does not mean that the waiting thread immediately gets to run; it only means the OS is now able to schedule it again.
Also, when t2 signals/unlocks it does not give up its time to run - it continues running until the OS tells it to stop. If you add al_rest(0), however, t2 tells the OS to take back its remaining run-time and schedule it to run again later. This is why adding a rest in t2 seems to help: t1 gets a chance to be scheduled after t2.

rocket1

thanks for the replies :D
exactly as some of you suggested, i was expecting the wait_cond to get back the mutex atomically.

kazzmir

wait_cond will atomically lock the mutex before continuing. The issue is that thread 1 never gets to run, because whenever it's woken up the mutex is most likely already locked again by t2.

Thread #607730. Printed from Allegro.cc