What a Process Scheduler Does
To be fair to those who don't know every detail of how an OS works, I'll start by explaining what a process scheduler does and why it matters in the first place. Every operating system has a process scheduler; it's a required function of the kernel, the lowest-level part of the OS. The process scheduler determines which program gets to run next, and how long it gets to run before it's kicked off the processor so that some other process can run for a while (this forced eviction is called preemption). The process scheduler matters a great deal for system performance: it largely determines how responsive the user interface will be, it determines how well the system can multitask, and it can even affect how long jobs take to finish from the time they're started.
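To make this concrete, here's a toy round-robin scheduler in Python. This is a sketch for illustration only; real kernels are vastly more elaborate, and the job names and time-slice value are made up:

```python
from collections import deque

def round_robin(jobs, time_slice):
    """Toy round-robin scheduler. jobs maps name -> units of work left.
    Returns the order in which time slices were handed out."""
    queue = deque(jobs.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)    # run until done or preempted
        schedule.append((name, run))
        if remaining > run:                 # preempted: go to the back of the line
            queue.append((name, remaining - run))
    return schedule

# Three hypothetical jobs of different lengths, 2-unit time slice
print(round_robin({"A": 5, "B": 2, "C": 4}, time_slice=2))
# → [('A', 2), ('B', 2), ('C', 2), ('A', 2), ('C', 2), ('A', 1)]
```

Note how "B" finishes early and drops out, while the longer jobs keep trading the processor back and forth until they're done.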
I think I should also mention what a process scheduler does not do, which is anything else useful. Every processor cycle spent doing scheduling tasks is time that hasn't been spent rendering web pages, processing game mechanics, compiling, decoding music or video files or anything else that the user wants the computer to do. In other words, the less time the scheduler spends scheduling in general, the better.
Performance Metrics for Process Schedulers
There are two meaningful performance metrics for schedulers. Both are difficult to measure with any exactness, although it is possible to get decent estimates and comparisons. The first is latency. Latency is basically how long a process spends waiting, on average, between the time it asks to run and the time it actually gets to run. Latency has a large impact on things like games and movie playback where short response times are important, and lower is better.
The second is turnaround time. Turnaround time measures how long it takes to complete a compute-intensive job (like compiling) from start to finish. Again, smaller times are obviously better, since that means jobs are finishing faster, and we like things to be faster.
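Both metrics are easy to compute for a toy trace. The sketch below is purely illustrative: the slice trace and the helper name `metrics` are my own invention, and it assumes every job arrives at time 0. It takes a list of executed (job, slice length) entries and derives each job's turnaround time, plus its total waiting time as a crude stand-in for latency:

```python
def metrics(schedule):
    """Given (job, slice_length) entries executed back to back, with all
    jobs arriving at time 0, return per-job turnaround time (time of the
    job's last slice finishing) and per-job waiting time (time spent
    runnable but not actually on the processor)."""
    t = 0
    completion = {}
    running_time = {}
    for job, length in schedule:
        t += length
        completion[job] = t                 # end of the job's latest slice
        running_time[job] = running_time.get(job, 0) + length
    waiting = {j: completion[j] - running_time[j] for j in completion}
    return completion, waiting

# A hypothetical round-robin trace of three jobs with 2-unit slices
completion, waiting = metrics([("A", 2), ("B", 2), ("C", 2),
                               ("A", 2), ("C", 2), ("A", 1)])
print(completion)  # → {'A': 11, 'B': 4, 'C': 10}
print(waiting)     # → {'A': 6, 'B': 2, 'C': 6}
```

The short job "B" turns around quickly and waits little; the long jobs pay for sharing the processor with each other.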
"Throughput"
Now, there exists a third metric which has become popular to cite, and has even been the central topic of certain high-profile CS papers, termed "throughput". "Throughput" measures how many processes per second the scheduler runs, but this does not count completed jobs, just processes scheduled. It is sometimes assumed, wrongly, that "throughput" is some kind of average of turnaround time and latency, but in practice there is no correlation between throughput and either measure.
On paper "throughput" sounds reasonable enough: running more processes should result in better multitasking performance, or something like that. In reality, though, the only thing this metric measures is how many times per second the scheduler preempts processes, and remember that each scheduling event is time not spent doing anything really useful, so more is not better. For turnaround times, the best result usually comes from long scheduling "time slices", which is the opposite of "high throughput". For latency, things are more complicated. Shorter time slices generally mean less waiting, but even for latency-sensitive processes turnaround times still matter, so a balance must be struck to achieve good all-around performance. That balance, however, does not in any way correspond to "higher throughput".
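You can see the tradeoff in a simple simulation. This is my own toy model, not anything from a real kernel: the identical jobs, the slice lengths, and the per-switch cost of 0.1 work units are all assumptions. It runs round-robin while charging a fixed cost per scheduling event, then reports a latency proxy (average wait before a job's first slice) alongside average turnaround:

```python
from collections import deque

def simulate(jobs, time_slice, switch_cost):
    """Round-robin over jobs (list of work amounts, all arriving at t=0),
    charging switch_cost for every dispatch. Returns (average wait before
    a job's first slice, average completion time)."""
    queue = deque(enumerate(jobs))
    t = 0.0
    first_run = {}
    finish = {}
    while queue:
        jid, remaining = queue.popleft()
        t += switch_cost                 # every scheduling event costs time
        first_run.setdefault(jid, t)     # latency proxy: wait for first slice
        run = min(time_slice, remaining)
        t += run
        if remaining > run:
            queue.append((jid, remaining - run))
        else:
            finish[jid] = t
    n = len(jobs)
    return sum(first_run.values()) / n, sum(finish.values()) / n

for ts in (1, 10, 100):
    lat, tat = simulate([100] * 4, time_slice=ts, switch_cost=0.1)
    print(f"slice={ts:3}: avg first-run wait={lat:7.2f}, avg turnaround={tat:7.2f}")
```

With these numbers, the shortest slice gives the lowest first-run wait but the worst turnaround, and the longest slice the reverse, which is exactly the tension described above. Neither extreme is "best"; it depends on what the workload cares about.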
Finally, a user cannot feel "throughput". A user can feel latency in the smoothness of the UI, video playback and so on. A user can feel turnaround times directly in how long CPU-heavy applications take to finish jobs (or in how many simultaneous clients their web server will support). Even a modest improvement can be quite noticeable in either case, but there is no case where "higher throughput" results in a direct improvement in the user experience. In fact, the ultimate "high throughput" scheduler would schedule every application for exactly one instruction, and even if the scheduler were as fast as possible, almost all of the CPU time on such a system would be spent scheduling. In practice I don't think a CPU will even allow slices that short, perhaps luckily.
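The arithmetic on that extreme case is brutal. Assuming, purely hypothetically, that a scheduling event costs on the order of 1000 cycles while each slice retires a single instruction:

```python
# Hypothetical order-of-magnitude figures, not measured values
switch_cycles = 1000   # assumed cost of one scheduling event
work_cycles = 1        # one instruction per "slice"
overhead = switch_cycles / (switch_cycles + work_cycles)
print(f"{overhead:.1%} of CPU time spent scheduling")  # → 99.9% of CPU time spent scheduling
```

Under those assumptions the machine spends essentially all of its time on the scheduler and almost none on the user's work.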
It never ceases to amaze me how much nonsense like this gets repeated unquestioningly within supposedly educated academic circles. The linux community has also been big on 'throughput', although they're at least consistent in never knowing what they're talking about. There was even a research project where an AI scheduler was built to 'maximize throughput' for the linux kernel on large server systems, and if you look up that paper you'll find that, predictably, they tuned the time slices down to the smallest possible value. And for the derp award: servers are also much more sensitive to turnaround times than to latency. Great job, here's your PhD.