Thread Rating:
  • 0 Votes - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Race condition in Timer::FireAt
09-01-2015, 07:52 AM (This post was last modified: 09-01-2015 07:52 AM by simoncn.)
Post: #1
Race condition in Timer::FireAt
A user has reported an ohNet crash that turns out to be caused by a race condition in Timer::FireAt.

This method can be called by multiple device threads in rapid succession when a subscription is being renewed. If a thread switch occurs after iMgr.Remove and before iMgr.Add, the following can happen:

1) Thread A removes the Timer object from the queue
2) Thread B removes the same Timer object from the queue. This is allowed by the code in QueueSortedBase::DoRemove (line 204).
3) Thread B modifies this Timer object and adds it back to the queue
4) Thread A modifies this Timer object and adds it back to the queue. This is not allowed by the code in QueueSortedBase::DoAdd (lines 177-179).

The result is an assertion failure in line 179 of Queue.cpp.

I think there are two possible fixes for the problem:

a) Protect the code in Timer::FireAt with a new mutex
b) Modify QueueSortedBase::DoAdd to not throw an assertion error if the added item is already in the queue

I think a) is more robust and I presume the overhead of adding a mutex is acceptable. If you agree, I would be happy to submit a patch for this change.
Find all posts by this user
12-01-2015, 03:41 PM
Post: #2
RE: Race condition in Timer::FireAt
(09-01-2015 07:52 AM)simoncn Wrote:  I think there are two possible fixes for the problem:

a) Protect the code in Timer::FireAt with a new mutex
b) Modify QueueSortedBase::DoAdd to not throw an assertion error if the added item is already in the queue

Thanks for the report/diagnosis.

I've gone with a variant of your approach (a) above - Timer::FireAt() now calls onto TimerManager::FireAt() which is protected by a mutex. This is committed locally so should be on github this evening.
Find all posts by this user
15-01-2015, 09:06 PM
Post: #3
RE: Race condition in Timer::FireAt
(12-01-2015 03:41 PM)simonc Wrote:  Thanks for the report/diagnosis.

I've gone with a variant of your approach (a) above - Timer::FireAt() now calls onto TimerManager::FireAt() which is protected by a mutex. This is committed locally so should be on github this evening.

Thanks for this. I've released a MinimServer update with this fix and it seems to be working well so far.
Find all posts by this user


Forum Jump: