Sunday, 28 July 2013

Multi-Threaded Code Is Hard

I know that title probably states the obvious to anyone who has tried it, but it bears repeating.  I have written other articles that touch on the same theme in different ways.

Almost a month ago I thought I had nearly finished changing the physics to get it ready for networking, with just a few peculiarities left to sort out.  Little did I know how long some of those 'peculiarities' would take to find.



Today I have finally confirmed that one of the hardest problems to find was, yet again, caused by multi-threading.  When the player shot at another character, sometimes (but not always) the shot would fire and the muzzle flash would display, but no projectile trail was shown and the enemy was not damaged.

After lots of debug code and head scratching, I narrowed it down to the line-of-sight intersection test against the enemy's bounding spheres, which was returning a distance of zero from the muzzle of the shooting weapon.
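For readers who haven't written one, here is a minimal ray/sphere intersection sketch. It is in Python for brevity (the game itself uses XNA), and the function name `ray_sphere_distance` is hypothetical. It also assumes a common convention: when the ray starts inside the sphere, the reported distance is 0. I am not claiming this is exactly what XNA does. It simply shows one way a distance of zero can come back: the muzzle ends up inside a sphere whose centre is not where it should be.

```python
import math

def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a ray to the near surface of a sphere, or None on a miss.

    Assumes `direction` is normalised. Hypothetical convention: if the ray
    starts inside the sphere, report a distance of 0.
    """
    # Vector from the ray origin (the muzzle) to the sphere centre.
    oc = [c - o for c, o in zip(center, origin)]
    dist_sq = sum(v * v for v in oc)
    r_sq = radius * radius
    if dist_sq <= r_sq:
        return 0.0  # the origin is already inside the sphere
    # Distance along the ray to the point closest to the sphere centre.
    t_mid = sum(v * d for v, d in zip(oc, direction))
    if t_mid < 0:
        return None  # the sphere is behind the ray origin
    # Squared distance from the sphere centre to that closest point.
    closest_sq = dist_sq - t_mid * t_mid
    if closest_sq > r_sq:
        return None  # the ray passes beside the sphere
    # Step back from the closest point to where the ray enters the sphere.
    return t_mid - math.sqrt(r_sq - closest_sq)

print(ray_sphere_distance((0, 0, 0), (1, 0, 0), (10, 0, 0), 1.0))   # 9.0
print(ray_sphere_distance((0, 0, 0), (1, 0, 0), (0.5, 0, 0), 1.0))  # 0.0
```

With a correct sphere position, the second case should be impossible for a target standing well away from the shooter. That is why a zero result points at bad input data rather than bad maths.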



Everything in my head, all the maths I did and all the debug code said this was an impossible situation, but there it was in my output window: length zero to the target and the shot in reverse!  Aaaaah!

I got so frustrated that I started looking at Unity3D and considered rewriting from scratch, targeting the Xbox One instead of the Xbox 360.  After installing Unity3D and taking a quick look at it, I decided that all I would be doing was swapping one set of frustrating problems for a different one.

The break from XNA gave me some time to think, and I eventually noticed that the spheres I test for intersection are updated in a different thread from the one where I calculate collisions.  I am not sure how to prove this, but my understanding is that each individual element of a sphere's position is updated atomically (in one operation), so no single value can be half-written.  However, each position vector contains three floating-point numbers, so the collision thread could read one of them after it has been updated while the other two still hold their old values.
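Real threads make this kind of bug intermittent, so here is a deterministic Python simulation of the interleaving described above. A generator stands in for the update thread and pauses after each component write, which lets the "collision thread" read in between. The names `SpherePosition`, `move_one_component_at_a_time` and `read` are hypothetical and only illustrate the idea. They are not the game's code.

```python
class SpherePosition:
    """Hypothetical sphere centre stored as three separate floats."""
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def move_one_component_at_a_time(pos, new):
    """Update thread: each write is a separate step; `yield` marks a point
    where the other thread could be scheduled."""
    pos.x = new[0]
    yield
    pos.y = new[1]
    yield
    pos.z = new[2]
    yield

def read(pos):
    """Collision thread: copies the three components."""
    return (pos.x, pos.y, pos.z)

pos = SpherePosition(0.0, 0.0, 0.0)
writer = move_one_component_at_a_time(pos, (5.0, 5.0, 5.0))
next(writer)          # the update thread has written x only...
snapshot = read(pos)  # ...when the collision thread reads the vector
print(snapshot)       # (5.0, 0.0, 0.0): a position that never really existed
```

Each float on its own is valid, but the combination is a "torn" vector. That is enough to put a sphere somewhere the character never was.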

Anyway, I've changed the code to lock the changes to the spheres and I can no longer reproduce the problem.  At last :-)
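For anyone wanting to try the same kind of fix, here is a minimal Python sketch of one way to apply it, using `threading.Lock`. The class and method names (`LockedSphere`, `move_to`, `snapshot`) are hypothetical. The important assumption is that the collision code takes the *same* lock when it reads. Locking only the writer would not stop a reader from seeing a half-finished update.

```python
import threading

class LockedSphere:
    """Hypothetical bounding sphere whose centre is shared between threads."""

    def __init__(self, center, radius):
        self._lock = threading.Lock()
        self._center = list(center)
        self._radius = radius

    def move_to(self, new_center):
        # Update thread: write all three components while holding the lock.
        with self._lock:
            for i in range(3):
                self._center[i] = new_center[i]

    def snapshot(self):
        # Collision thread: copy centre and radius under the same lock,
        # then run the intersection test on the copy outside the lock.
        with self._lock:
            return tuple(self._center), self._radius


def demo(iterations=10000):
    """Count reads that see a centre with mismatched components."""
    sphere = LockedSphere((0.0, 0.0, 0.0), 1.0)
    torn = 0

    def updater():
        for n in range(iterations):
            v = float(n)
            sphere.move_to((v, v, v))  # every valid centre has x == y == z

    t = threading.Thread(target=updater)
    t.start()
    for _ in range(iterations):
        (x, y, z), _radius = sphere.snapshot()
        if not (x == y == z):
            torn += 1
    t.join()
    return torn

print(demo())  # 0: the reader never sees a half-updated centre
```

Copying the data under the lock and testing the copy keeps the lock held only briefly, so the update thread is not blocked for the whole intersection test.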



Just when you thought this article was over... sorry, moments before writing this up I came across another error, this time with particles.  At least this time I could see immediately that it is also caused by multi-threading.

My last word on this today: that particle code has been unchanged for a very long time, but threading problems can hit you at any time and may not be easily reproducible!