Windows NT - Predictability vs Guarantee
Introduction
Mission-critical applications that depend strictly on time constraints are rare on most deployed Windows NT platforms, but NT is aiming high at those markets. Many of today's business applications require high performance and scalability, yet they are usually specified and implemented without any real metric for software performance. Nor is there any performance requirement or criterion that a program can request from the NT operating system to support such a metric. The end result is systems with "accepted" performance rather than any kind of guaranteed performance.
For example, it is not possible to ask NT that an API call complete within so many milliseconds, that a disk I/O request deliver so many bytes per second, or that a network connection sustain a given transfer rate. Yet these are typical requirements of online, high-volume transactional systems that serve large numbers of users or handle large volumes of critical, timely data. The usual response is to "beef up" the system by adding more computers, processors, memory and disk, and to optimize the software and any databases involved. This is what we may call a predictable system: one that is expected to work well perhaps 90% of the time, with possible performance problems the other 10% of the time. There are no guarantees here whatsoever.
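The gap between "predictable" and "guaranteed" can be made concrete with a few lines of Python. This is only a sketch: the latency samples and the 20 ms deadline are invented for illustration, and the nearest-rank percentile is one of several common definitions. It shows that a system can look healthy at the 90th percentile while its worst case misses the deadline by a wide margin.

```python
import math

def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # nearest-rank method
    return ordered[rank - 1]

def summarize(latencies_ms, deadline_ms):
    """Compare a statistical view of latency against a hard deadline."""
    met = sum(1 for t in latencies_ms if t <= deadline_ms)
    return {
        "p90_ms": percentile(latencies_ms, 90),
        "worst_ms": max(latencies_ms),
        "fraction_within_deadline": met / len(latencies_ms),
    }

# Hypothetical measurements (ms): nine fast responses and one slow outlier.
samples = [12, 11, 13, 12, 14, 11, 12, 13, 12, 95]
report = summarize(samples, deadline_ms=20)
print(report)
```

A guarantee would be a statement about worst_ms; a predictable system can only make a statement about p90_ms or the fraction of requests that met the deadline.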
NT does offer several mechanisms that improve predictability. One of these is for the network. Though platforms should use ATM technology for any guaranteed bandwidth, many systems are deployed on regular IP networks built over Ethernet. NT 5.0 brings Quality of Service (QOS) programming to the Windows Sockets world: you can create sockets or socket groups and associate QOS parameters with them. Internally, these use network protocols such as RSVP, supported by network elements such as routers, in an attempt to "reserve" bandwidth. NT calls this GQOS, or Generic QOS, because programmers need not worry about the specific protocols involved.
If you are concerned only with better network utilization and faster performance when using a higher-level network programming abstraction such as DCOM, NT 5.0 introduces asynchronous RPC and DCOM, which is essentially a version of overlapped I/O on the network. You can also use I/O completion ports and overlapped I/O for network processing (discussed in another article in this issue). Again, there are no guarantees, only better utilization.
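The benefit of overlapping requests can be sketched without any NT-specific API. The example below uses Python's asyncio as a stand-in, which is an assumption made for illustration and not the NT overlapped I/O or asynchronous RPC interface; fake_request and the delays are hypothetical. The point it demonstrates is the same one: when several waits are issued together, the elapsed time approaches the longest wait rather than the sum of all waits.

```python
import asyncio
import time

async def fake_request(delay_s):
    """Stand-in for one network call; the sleep models time spent waiting."""
    await asyncio.sleep(delay_s)
    return delay_s

async def run_overlapped(delays):
    """Issue every request at once and wait for all of them to complete."""
    return await asyncio.gather(*(fake_request(d) for d in delays))

def timed_overlapped(delays):
    """Run the overlapped requests and report the wall-clock time taken."""
    start = time.perf_counter()
    results = asyncio.run(run_overlapped(delays))
    return results, time.perf_counter() - start

delays = [0.05, 0.05, 0.05, 0.05]
results, elapsed = timed_overlapped(delays)
print(f"sum of waits: {sum(delays):.2f}s, elapsed: {elapsed:.2f}s")
```

Nothing here bounds how long any single request takes, which is why overlapping improves utilization but offers no guarantee.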
To manage processor "guarantees", NT provides four priority classes, the fourth being a real-time priority band. No process or application that ships with NT uses this band, which runs at a higher priority than even the virtual memory management system. So it *is* possible for an application to forcibly monopolize CPU time: such a process will run whenever it is ready to run, for as long as it wants. Several such processes may share the CPU and may have gradations of priority within the real-time band. Even so, a processing rate can be determined only by statistical methods, not in concrete terms from the priorities or operation timings involved. Real-time priorities are rarely recommended, since they can pre-empt even mouse cursor movement across your display, resulting in jitter. Real-time extensions to NT are available from third parties and resolve some of these problems to a good extent, but investing in technologies that have yet to prove themselves or become a standard may not be very attractive or viable.
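Why a real-time thread can starve everything beneath it is easy to see in a toy model. The following Python sketch simulates strict-priority scheduling on one CPU; it is a deliberate simplification, not the NT scheduler (no quanta, priority boosts or multiple processors), and the thread names, priority numbers and readiness patterns are invented for illustration.

```python
def simulate(threads, ticks):
    """Strict-priority scheduling: each tick, run the highest-priority ready thread.

    threads maps a name to (priority, ready_fn), where ready_fn(tick) says
    whether the thread wants the CPU at that tick. Returns ticks run per thread.
    """
    cpu_time = {name: 0 for name in threads}
    for tick in range(ticks):
        ready = [(prio, name) for name, (prio, ready_fn) in threads.items()
                 if ready_fn(tick)]
        if ready:
            _, winner = max(ready)  # highest priority wins; no fairness
            cpu_time[winner] += 1
    return cpu_time

# Hypothetical workload: priority numbers are illustrative, higher wins.
threads = {
    "realtime_worker": (24, lambda t: t % 10 != 0),  # ready 9 ticks out of 10
    "ui_thread": (8, lambda t: True),                # always wants to run
}
usage = simulate(threads, ticks=100)
print(usage)
```

In this model the lower-priority thread runs only in the gaps the real-time thread leaves, which is the mechanism behind the jittery mouse cursor described above.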
NT introduced symmetric multi-processing for systems with up to 32 processors. Considerable speedup can be achieved if these systems are designed well. Symmetric multi-processing means that any thread, including operating system threads, can run on any processor. However, a process can map its own threads to specific processors, which is indeed useful for near-guaranteed performance. But again, this does not preclude operating system threads from running on those processors. Nor does throwing in more processors imply faster performance; software and hardware limitations must both be considered. Software limitations such as data coupling and available parallelism, and hardware limitations such as the amount of processor cache, memory bus speed and the possibility of severe cache sloshing, place heavy demands on both hardware and software design.
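The limit imposed by "available parallelism" can be estimated with Amdahl's law, which the text above does not name but which is the standard way to reason about it. The Python sketch below assumes a workload in which 90% of the work can run in parallel; that fraction is a made-up figure, and the model ignores cache and memory bus effects entirely.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only part of the work can be spread across processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Assumed workload: 90% parallelizable, 10% strictly serial.
for n in (1, 2, 4, 8, 32):
    print(n, round(amdahl_speedup(0.90, n), 2))
```

With a 10% serial portion the speedup can never reach 10x, no matter how many processors are added, and real systems fall short of even this ideal once cache and bus contention are taken into account.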
What about timers? Can't they be used for guaranteed timing events? The default timers in NT have a resolution of 10 ms and suffer from clock drift. NT does, however, provide high-resolution multimedia timers with a resolution of 1 ms at best. These timers come in handy when a process needs fine control over the timing of the events it is interested in. They give an improved level of predictability in scheduling event processing, but still not a guarantee.
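One way to see what timer resolution means in practice is to measure how far a short sleep overruns the interval requested. The Python sketch below does this with the portable time module rather than the NT multimedia timer API, so the numbers it prints reflect whatever platform it runs on; the 1 ms request and the 20 repetitions are arbitrary choices.

```python
import time

def measure_sleep_overshoot(requested_s, repeats):
    """Return how far each sleep overran the requested interval, in ms."""
    overshoots_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        time.sleep(requested_s)
        elapsed = time.perf_counter() - start
        overshoots_ms.append((elapsed - requested_s) * 1000.0)
    return overshoots_ms

overshoots = measure_sleep_overshoot(requested_s=0.001, repeats=20)
print(f"min {min(overshoots):.3f} ms, max {max(overshoots):.3f} ms")
```

The spread between the minimum and maximum overshoot is the jitter an application has to tolerate; a finer timer narrows it but does not remove it.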
For disk I/O, again, there are no guarantees on performance. You might increase your process priority in hopes of getting better access to the disk, use Fast-Wide or Ultra SCSI disks, or you might use striping or RAID 5 on multiple disks to get faster performance. It would take a specially written device driver to manage I/O and provide any semblance of a guarantee on any single disk. When dealing with disks under a volume manager, this function needs to move into the volume manager software.
For memory accesses, one might think that locking pages in memory would avoid swapping. This is indeed possible, but in many cases it leads to degraded performance rather than the hoped-for improvement. The trouble is that, under NT, the operating system does not treat this locking as mandatory. If all the threads in your process block, the system may swap out the whole process. When one of the threads later unblocks, NT will reload ALL the pages that you locked earlier and lock them again before resuming the thread. If you have locked a large number of pages, your application is immediately hit by a drastic performance degradation! Instead of locking, it is possible to increase the working set size of a process. The working set is essentially the set of the most recently used memory pages. Increasing the working set size will help reduce swapping, to an extent.
Another feature that NT provides is clustering on its enterprise server editions. Clustering in NT is currently limited to two machines and is oriented more towards failover, that is, a hot-standby kind of solution. However, it is possible to run the members of the cluster in active/active mode, where each machine runs a separate subset of the deployed processes or applications. This can be tricky and might need multiple shared SCSI disks between the machines. Again, only statistical performance expectations can be computed.
In summary, all these techniques can be used for improving and tuning Windows NT performance, but they offer hardly any way to obtain guaranteed performance other than in a statistical sense. If Windows NT is indeed to move strongly into the mission-critical, high-end, high-volume application domains, it is high time that it incorporated guarantees, or real-time extensions that facilitate guarantees. Predictability alone does not suffice. Many high-end server operating systems based on Unix have native real-time extensions. We believe this is an area for Microsoft to consider seriously.


