The pleasures and pitfalls of real-time
I started this article intending to make some simple, common-sense observations about real-time software. I guess I was even hoping for some insights. That’s because I have never found real-time software at all difficult – armed with a test-and-set instruction at the processor level, or a semaphore at the operating system level.
It always seemed to me that identifying resource conflicts was usually simple, and setting then releasing locks before and after the critical sections of code was no big deal. No need to mask interrupts - I’ve got it covered!
But as I trawled through some recent research findings, I realized it would be very misleading to assume my experience approaches anything like the general case. With hindsight, I can see that the real-time parts of systems that I have been involved with have been relatively linear and simple. The problems I have had to solve are not very different from the signaling needed to allow trains going in opposite directions to share a single-track section of the line.
Synchronization problems come in many more flavors than this. A PhD thesis I looked at required over 300 pages to explain the scheduling and synchronization needed to ensure completion, before a defined completion time, of a set of software tasks communicating using shared memory and executing in parallel in a multi-core processor.
So I set myself a new objective for this article. What aspects of this sort of problem are visible and should be handled by development team managers? To investigate this question, let’s start by asking why not all synchronization problems can be reduced to the single-track train scheduling problem.
For example, multiple software threads adding and removing items from an in-memory queue must avoid tripping over each other. There is a solution to handling the in-memory queue that is a bit like a solution to the single-track train scheduling problem – only allow one process (train) to access the queue (be on the line) at any one time.
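As a rough illustration of the "one train on the line" rule, here is a minimal Python sketch of a queue guarded by a single lock. The class name `SingleTrackQueue` and the `producer` helper are invented for this example; they are not taken from any system discussed in this article.

```python
import threading
from collections import deque


class SingleTrackQueue:
    """Hypothetical in-memory queue: only one thread may be 'on the line' at a time."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()  # plays the role of the track signal

    def put(self, item):
        with self._lock:  # set the lock, do the work, release on exit
            self._items.append(item)

    def get(self):
        # Checking for emptiness and removing an item are two steps;
        # holding the lock keeps other threads from slipping in between them.
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


def producer(queue, count):
    """Add `count` items to the shared queue."""
    for i in range(count):
        queue.put(i)


queue = SingleTrackQueue()
threads = [threading.Thread(target=producer, args=(queue, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(queue))  # 4000: no additions were lost
```

The appeal is that the whole synchronization story fits in three `with` statements; the cost is that every reader and writer queues up behind the same lock.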
But of course software developers, like train operators, are always looking for something better. In the case of trains, if there are two trains going in the same direction, why not let them both on the single track at the same time – just make sure the one behind travels no faster than the one in front. I’m sure you can see that this could get more throughput of passengers and freight, but it’s the start of a much more complicated train signaling system.
Something similar applies to the in-memory queue: why not allow many readers at once? They are harmless; they just want to see what is on the queue. But if a process wants to write to the queue, it waits for all the readers to finish, and is then given exclusive access. In some designs, there will be cases when it would be wrong to allow multiple readers to continue while a writer is waiting in the wings to change the contents of the queue. The point is that software can require itself to solve ever more complex synchronization problems in order to achieve more and better functionality, or performance, or efficiency.
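To give a feel for how quickly the complexity grows, here is one possible sketch of a many-readers, one-writer lock in Python. It is an assumption-laden illustration, not a recommended design: the class `ReadWriteLock` and its method names are made up for this example, and it takes the "writer waiting in the wings" policy, holding back new readers as soon as a writer is waiting.

```python
import threading


class ReadWriteLock:
    """Hypothetical readers-writer lock that gives waiting writers priority."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_readers = 0
        self._writer_active = False
        self._waiting_writers = 0

    def acquire_read(self):
        with self._cond:
            # New readers hold back while a writer is active or waiting.
            while self._writer_active or self._waiting_writers > 0:
                self._cond.wait()
            self._active_readers += 1

    def release_read(self):
        with self._cond:
            self._active_readers -= 1
            if self._active_readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            # Wait for current readers to finish and any active writer to leave.
            while self._writer_active or self._active_readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer_active = True

    def release_write(self):
        with self._cond:
            self._writer_active = False
            self._cond.notify_all()  # wake waiting readers and writers


lock = ReadWriteLock()
shared = []    # the data being protected
observed = []  # queue lengths seen by readers


def writer(value):
    lock.acquire_write()
    try:
        shared.append(value)
    finally:
        lock.release_write()


def reader():
    lock.acquire_read()
    try:
        length = len(shared)
    finally:
        lock.release_read()
    observed.append(length)


threads = [threading.Thread(target=writer, args=(n,)) for n in range(3)]
threads += [threading.Thread(target=reader) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Compared with the single lock, there are now three pieces of state to keep consistent and a policy decision buried in `acquire_read`: with this choice, a steady stream of writers can starve the readers. That is exactly the kind of trade-off that is easy to make and hard to see from outside the code.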
I’m not trying to put forward ways of solving these problems; I just want to illustrate the point that simple approaches to synchronization suffer, like so many other bits of engineering, from a trade-off dilemma. On the plus side, they are low cost to implement, easy to test, and often robust even when other parts of the system change. But there is a cost, and that can be some combination of performance and functional limitations.
So put yourself in the shoes of the development manager – how would you trade off simplicity against efficiency?
One development team I spoke to had taken the simple approach. I was impressed and alarmed in equal measure by their design for a signal processing system that needed to process all input data in time for the next ‘frame’ of the output subsystem. Their approach was to expect the ‘ready-to-run’ queue of tasks for the CPU and DSP to be empty for 80% and preferably 90% of the time. Any load greater than this, and they specified faster hardware.
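The team’s rule of thumb is simple enough to write down in a few lines. The sketch below is my own paraphrase, not their code: it assumes the length of the ready-to-run queue is sampled at regular intervals, and the function names and sample data are invented for illustration.

```python
def idle_fraction(queue_length_samples):
    """Fraction of samples in which the ready-to-run queue was empty."""
    if not queue_length_samples:
        raise ValueError("need at least one sample")
    empty = sum(1 for length in queue_length_samples if length == 0)
    return empty / len(queue_length_samples)


def needs_faster_hardware(queue_length_samples, minimum_idle=0.80):
    """Apply the rule: below the idle target, specify faster hardware."""
    return idle_fraction(queue_length_samples) < minimum_idle


# Hypothetical measurements: the queue was empty in 17 of 20 samples.
samples = [0] * 17 + [1, 2, 1]
print(idle_fraction(samples))                             # 0.85
print(needs_faster_hardware(samples))                     # False: meets the 80% target
print(needs_faster_hardware(samples, minimum_idle=0.90))  # True: misses the preferred 90%
```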
I was impressed because this was beautifully simple. I was alarmed because I suspect there could be failure modes that no amount of processor capacity can handle, and this strategy doesn’t lead the designers towards design elements for graceful degradation and recovery. However, in this system, there was new data every frame, and the specification allowed the occasional frame to be skipped. In practice, glitches in the output stream were so rare it was hard to measure them.
But this team was working on a low-volume, expensive product. The additional cost of faster electronics was trivial compared to the product cost – and each product instance was specified and built to handle a defined load of ‘input’ data.
Contrast this with the situation for volume products. The ability to squeeze the same performance from a lower-cost processor directly improves the profitability of each product sold. The development team is under pressure to use every trick in the book to extract maximum throughput, performance, and functionality from the minimum-cost components.
Of course, quantifiers like mean time between failures can provide an apparently rational basis for a decision on where to draw the line. But it would be reassuring if all the costs of extra software complexity could be built into these quantifiers. Complex software has a habit of passing all the tests, then being implicated in quality incidents in the real world. So a development manager needs to know the degree of certainty that the tests cover all normal and abnormal operating conditions. If the tests don’t achieve this, is this a reason to go with a faster processor and simpler software?
I think the key to this is visibility. The people concerned must be able to recognize and report which way they are moving in the trade-off between complexity and performance. Visibility of their qualitative assessment is a good start (you can be sure there will be questions at the next engineering review meeting). If you (or they) can find a way of quantifying the complexity, then the discussion at the review meeting has an even better chance of guiding the project towards an optimum solution.
Peter Thorne is Managing Director for Cambashi Ltd. He researches and advises on the new product introduction process, e-business, and other industrial applications of information and communication technologies. He has applied information technology to engineering and manufacturing enterprises for more than 20 years, holding development, marketing and management positions with both user and vendor organizations. He has a Master of Arts degree in Natural Sciences and Computer Science from Cambridge University, is a Chartered Engineer, and a member of the British Computer Society.